zulip

mirror of https://github.com/zulip/zulip.git synced 2025-11-01 20:44:04 +00:00

Author	SHA1	Message	Date
Sai	eaa2ece4e1	slack: Fix bugs during import of thread messages. Consider the following messages JSON (taken from real data, redacted): ```json [ { "subtype": "bot_message", "text": "", "attachments": [ { "fallback": "Open Slack to cast your vote in this Simple Poll", "title": "Should we do a thing?", "id": 1, "color": "6ecadc", "fields": [ { "title": "", "value": "1️⃣ Yes 👍\n\n", "short": false }, { "title": "", "value": "2️⃣ No 👎\n\n", "short": false }, { "title": "", "value": "3️⃣ Abstain :spock-hand:\n\n", "short": false } ], "mrkdwn_in": [ "fields" ] }, { "callback_id": "12345678-1234-1234-1234-123456789abc", "fallback": "Open Slack to cast your vote in this Simple Poll", "id": 2, "color": "6ecadc", "actions": [ { "id": "1", "name": "vote", "text": "1️⃣", "type": "button", "value": "1", "style": "" }, { "id": "2", "name": "vote", "text": "2️⃣", "type": "button", "value": "2", "style": "" }, { "id": "3", "name": "vote", "text": "3️⃣", "type": "button", "value": "3", "style": "" }, { "id": "4", "name": "delete-v2", "text": "Delete Poll", "type": "button", "value": "", "style": "danger", "confirm": { "text": "Are you sure you want to delete the Poll?", "title": "Delete Poll?", "ok_text": "Yes", "dismiss_text": "No" } } ] }, { "callback_id": "12345678-1234-1234-1234-123456789abc", "fallback": "Open Slack to cast your vote in this Simple Poll", "footer": "Simple Poll <https:\/\/simplepoll.rocks\/dashboard\/redacted\/settings\/|Edit Settings>", "id": 3, "footer_icon": "https:\/\/simplepoll.rocks\/static\/main\/favicon.png", "color": "6ecadc" } ], "type": "message", "ts": "1234567890.123456", "bot_id": "B1ABCDEF1", "thread_ts": "1234567890.123456", "reply_count": 1, "reply_users_count": 1, "latest_reply": "1234567890.765432", "reply_users": [ "U1ABC1234" ], "replies": [ { "user": "U1ABC1234", "ts": "1234567890.765432" } ], "is_locked": false, "subscribed": false }, { "user": "U1ABC1234", "type": "message", "ts": "1234567890.765432", "text": "Maybe do qux instead", "team": "T1AB23456", "user_team": "T1AB23456", "source_team": "T1AB23456", "user_profile": { "avatar_hash": "a123456789ab", "image_72": "https:\/\/avatars.slack-edge.com\/2017-01-01\/123456789abc_def123456789abcdef12_72.jpg", "first_name": "Alice", "real_name": "Alice Smith", "display_name": "a.smith", "team": "T1AB23456", "name": "a.smith", "is_restricted": false, "is_ultra_restricted": false }, "thread_ts": "1234567890.123456", "blocks": [ { "type": "rich_text", "block_id": "EoBdt", "elements": [ { "type": "rich_text_section", "elements": [ { "type": "text", "text": "Maybe do qux instead" } ] } ] } ] } ] ``` `533f177175/zerver/data_import/slack.py (L922-L924)` fails for the first message, because it lacks a 'user' key. It should fall back to the bot_id. `533f177175/zerver/data_import/slack.py (L925-L926)` fails for the second message, because it lacks a 'parent_user_id' key. However, the thread root will have been processed earlier, so memoization of thread parents fixes this issue. Because the original message may not be in the same file, the memoization needs to be global rather than in `channel_message_to_zerver_message`.	2025-09-16 11:35:53 -07:00
Anders Kaseorg	e216317f88	test_mattermost_importer: Condition html2text crash on Python version. Signed-off-by: Anders Kaseorg <andersk@mit.edu>	2025-08-13 14:11:05 -07:00
Mateusz Mandera	df7bb3cdd4	slack: Handle float "ts" in attachment info. It's rare, but sometimes we see floats in this field.	2025-07-21 09:24:03 -07:00
Mateusz Mandera	342c00cc7b	data_import: Set a default announcements channel for 3rd party imports. For Slack, Mattermost and RocketChat, set the default announcement channels if the sane default channel is available in the import. The next commit will add a mechanism to choose an alternative default if the expected channel name is not present. Addresses the major part of #34984.	2025-07-18 17:45:21 -07:00
PieterCK	9de1b47ffa	slack_import: Improve topic name for Slack threads. This updates the topic name format for Slack threads to include a snippet of the original message. Currently the format looks like this: "{date} Slack thread {n}", which provides little to no context about the thread. Currently we also use the `thread_ts_str` key to keep track of Slack threads we're converting; this key format has a small chance of combining threads with the same timestamp under one topic. This commit updates it to use a new key format, `thread_key`. Fixes #27661.	2025-07-16 14:32:05 -07:00
PieterCK	580845415b	slack_data_import: Refactor `get_user_mentions`. This commit extracts a portion of logic from `get_user_mentions` into a new helper function for reusability. This is a prep commit for #30166.	2025-07-15 15:49:05 -07:00
Mateusz Mandera	a43bdcd166	import: Fix is_realm_imported_from_other_zulip_server logic. This logic was fundamentally broken: ``` is_realm_imported_from_other_zulip_server = RealmAuditLog.objects.filter( realm=realm, event_type=AuditLogEventType.REALM_EXPORTED ).exists() if not is_realm_imported_from_other_zulip_server: send_zulip_update_announcements_to_realm( realm, skip_delay=False, realm_imported_from_other_product=True ) ``` This is because the `REALM_EXPORTED` audit log entry was only created after the export completed, meaning it could never be included in the export data. Thus, every export was incorrectly considered to be "not from Zulip". We get around this issue by explicitly including an import_source in the realm dict in the export data from 3rd party apps. The importer can then rely on this value to determine if it's dealing with a Zulip-originated export or not.	2025-07-14 15:11:43 -07:00
Sahil Batra	b655bd14ea	messages: Use "\x07" as topic for DMs and group DMs. This commit updates code to use "\x07" as value for "subject" field of Message objects for DMs and group DMs, so that we have a unique value for DMs and group DMs which cannot be used for channel messages. This helps in avoiding having an empty string value as topic for DMs, which is also used for "general chat" channel messages, as a large number of DMs in the realm resulted in the PostgreSQL query planner thinking that there are too many "general chat" messages and thus generating bad query plans for operations like fetching "general chat" messages in a stream or moving messages to and from "general chat" topic. This change is also made for ArchivedMessage and ScheduledMessage objects. Note that the clients still get "subject" value as an empty string "". This commit also adds tests for checking that "\x07" cannot be used as topic for channel messages. Fixes #34360.	2025-06-19 10:44:37 -07:00
Mateusz Mandera	0d17f34ff5	slack_import: Import Slackbot as a bot instead of a user. Slackbot should be correctly marked as a bot rather than a regular user - otherwise, it confusingly appears on the Users list and in the realm creation from Slack import flow, in the dropdown for choosing the organization owner account. Closes #34649	2025-06-18 16:55:31 -07:00
Aman Agrawal	4cca5652e3	slack_import: Pipe file processing error message to the user. When the Slack import fails due to an invalid zip file being uploaded, we take the user back to the file upload page with an appropriate error message.	2025-06-16 10:46:25 -07:00
Aman Agrawal	b57b783dd8	slack: Don't show error code to users. We log the error internally and only show invalid token as the error message.	2025-05-28 17:18:07 -07:00
Aman Agrawal	68372f8e03	slack: Change invalid token error message.	2025-05-28 17:18:07 -07:00
PieterCK	0dfb709152	slack_data_import: Support converting integration bot messages. Integration bot messages in Slack may include "blocks" and "attachments," which are Slack's messaging features. Currently, these messages aren't processed when converting Slack export data. This commit adds support for converting integration bot messages, as well as other Slack messages containing "blocks" and "attachments". Message payload with the block type `rich_text` is skipped because all messages sent by users have this format. Fixes #31162. [1]=https://docs.slack.dev/reference/block-kit/blocks/rich-text-block/	2025-05-23 14:27:31 -07:00
PieterCK	201a71b575	mattermost_import: Except error when converting messages HTML. This adds a try-except block around the html2text call that converts raw message HTML to Markdown. convert_html_to_text is added mainly for testing convenience. We don't have any sample of Mattermost's problematic content that could trigger this sort of error yet, so the test mocks convert_html_to_text to raise an error instead.	2025-05-21 16:08:30 -07:00
Mateusz Mandera	a52bc4d71b	slack: Handle integration bots with missing data. We encountered the following two new cases with integration bots in Slack imports: 1. Bots without the image_72 field in their data. Such bots should fall back to gravatar. 2. Bots whose bot_id is the sender of certain messages, but querying the bots.info endpoint returns bot_not_found error. We should create dummy accounts in place of such bots.	2025-05-16 13:06:28 -07:00
Mateusz Mandera	61b6e823b6	slack: Don't create duplicate reactions. Just as the prior commit added similar code in `get_subscription`, here we do the same for reactions. If two Slack accounts, which will be merged into one Zulip account, posted the same emoji reaction to a message, we need to avoid duplication.	2025-05-16 13:06:28 -07:00
Mateusz Mandera	86f17026ac	slack: Don't create duplicate subscriptions. In `1517601e9d` we added merging of accounts with the same email address; but didn't prevent duplication of subscriptions when two Slack accounts, merged into one Zulip account, belong to a channel.	2025-05-16 13:06:28 -07:00
userAdityaa	354a16fb0a	migration: Rename 'populate_db' Client to 'ZulipDataImport'. This commit: * Creates a migration to rename any existing Client with name="populate_db" to "ZulipDataImport". * Updates populate_db.py to use ZulipDataImport for new message creation These changes should make code to identify imported messages considerably more readable. Fixes #33909.	2025-05-08 12:18:34 -07:00
PieterCK	a228699108	slack_incoming: Use Slack text reformatters from data import. This commit updates the Slack incoming webhook endpoint to use the same Slack reformatting functions (`convert_to_zulip_markdown` and `replace_links`) that are used for Slack data import and cleaning up any duplicative functions. This was previously not possible because the Slack reformatting regex in `slack_message_conversion.py` could not handle these cases: - Formatting applied to non-ASCII characters (e.g., emoji). - Formatted strings separated by exactly one character. - Formatted strings appearing immediately after a new line. Fixes part of #31162.	2025-04-28 20:49:10 -07:00
PieterCK	c018911c5b	slack_regex: Fix overlapping capture groups. The Slack text regexes match specific characters before and after the formatted string to ensure that they only match at word boundaries. However, because the capture groups consume each matching character, including the characters used to determine word boundaries, two formatted strings separated by a single matching character result in one string not being matched, as the trailing whitespace is already consumed and cannot also match as the leading pre-match whitespace for the next formatted string. Switch to a look-ahead regex for the trailing word boundary characters. This is zero-width, and as such the next match can still also consume the same characters. This also fixes Slack webhook integration's `test_message_with_complex_formatted_mentions` which was previously expecting false output. Fixes part of #30827.	2025-04-28 20:49:10 -07:00
PieterCK	cfeb4ba731	slack_regex: Simplify Slack regex main capture group. The inner capture group of the Slack text regex is used to capture the formatted text, so basically all characters except formatting characters like *, ~, and _. It currently does this by specifying a range of characters to be captured except the formatting characters. This unintentionally excluded non-ASCII characters like emoji. This commit simplifies the inner capture groups of the Slack text regex to explicitly exclude formatting characters (e.g., *, ~, _) instead of using an allowlist that omits them. This change is part of the effort to make `convert_to_zulip_markdown` compatible with output from `render_blocks` and `render_attachments`, which may contain emoji.	2025-04-28 20:49:10 -07:00
PieterCK	ee34ccb8c2	slack_regex: Remove unnecessary inner capture group. The inner capture groups of the Slack text regex try to capture the formatted string. For an unclear reason, we're currently using two similar capture groups even though the second capture group alone would suffice. This removes the first capture group.	2025-04-28 20:49:10 -07:00
PieterCK	8992435caf	slack_regex: Update Slack regex to handle multiline strings. This prep commit modifies the Slack text regex in `slack_message_conversion.py` to use the `re.MULTILINE` flag to capture formatted strings that are adjacent to a newline or the end of a line. Such strings are likely not uncommon in output from the Slack exporter itself, and our Slack message block and attachment formatters (`render_blocks` and `render_attachments`) also produce them. This also fixes Slack webhook integration's `test_message_with_complex_formatted_texts` which was previously expecting false output. Fixes part of #30827.	2025-04-28 20:49:10 -07:00
PieterCK	6c1818fa46	slack_regex: Avoid replacing all identical matches. In `convert_markdown_syntax`, `re.finditer` is used to iterate over each `match` found in the `text`. In each iteration, we create the Zulip formatting equivalent of the current `match` and do a `text.replace()` to replace all strings in `text` that match the current `match`. Since we're planning to add the `re.MULTILINE` flag to `re.finditer`, it would be problematic if the first and last capture group of `match` are zero-width characters (newlines). This is because `convert_markdown_syntax` would reformat all identical strike-through and bold formatted strings as many times as there are identical `match`es of them. Consider the following: --- original text: "*foo*\n*foo*" 1. <re.Match object; span=(0, 5), match='*foo*'> 1st replace -> "**foo**\n**foo**" 2. <re.Match object; span=(6, 11), match='*foo*'> 2nd replace -> "***foo***\n***foo***" --- This commit uses `re.sub` instead of `re.finditer`, which performs the replacement only on the specific piece of the original text that the `match` comes from.	2025-04-28 20:49:10 -07:00
PieterCK	946e4096fc	slack_regex: Refactor how the capture groups are defined. This updates the patterns to use a more unicode-aware pattern for their first and last capture groups. The new patterns have the same behaviours, but they're expressed in a more coherent way. For example, the existing patterns list what characters to look for, skipping certain characters they don't want to match (e.g., closing brackets & quote are skipped). The new pattern narrows down what it looks for (whitespace, punctuation, symbols) and explicitly lists what it doesn't want to match (closing quote and bracket, etc). This also refactors the `convert_markdown_syntax` to use the `regex` module instead of the `re` module because the `regex` module has full unicode support.	2025-04-28 20:49:10 -07:00
PieterCK	f33c5b9c1e	slack_regex: Fix wrong comments.	2025-04-28 20:27:28 -07:00
PieterCK	d5e28bcd28	slack_import: Fix thread conversion condition. Currently, threads in Slack direct messages will increment the `thread_counter` variable inside the thread conversion logic. Since we don't treat thread messages in Slack DMs differently than any other DM, threads in DMs will only falsely increment the thread topic names in channels. This adds a condition that checks if the Slack message is a DM or not before executing the thread conversion logic.	2025-03-25 16:38:21 -07:00
Elsa Kihlberg Gawell	845f0d40e1	import_data: Make sure converted DMs don't have topic name. Previously, `build_message` set a message's topic name to the given topic name, regardless of whether the message was a direct message (DM) or a group direct message (GDM). This change adds the `is_private` parameter to `build_message`. If `is_private` is `True`, the `topic_name` will be overridden to an empty string (""). Consequently, this also updates the third-party importers to pass this parameter when calling `build_message`. Co-authored-by: Pieter CK <pieterceka123@gmail.com>	2025-03-25 16:38:21 -07:00
Alex Vandiver	47d55c4b6f	messages: Add an is_channel_message flag.	2025-03-18 09:34:11 -07:00
Anders Kaseorg	88a8087243	ruff: Fix PYI019 Use `Self` instead of custom TypeVar. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2025-03-14 17:31:50 -07:00
Tim Abbott	b249830715	rocketchat: Harden parsing of custom emoji exports. Apparently some exports have a weird `ObjectId("idstring")` typed value here, that stringify to the actual string.	2025-02-05 13:41:19 -08:00
Alex Vandiver	9b4b53ef29	slack: Mark content-type of imported attachments.	2025-01-31 14:29:57 -08:00
Alex Vandiver	33539568ae	slack: Ensure a newline before attachment links. The content `look![image](https://example.com)` does not render as a link, nor an image upload (were it to `/user_uploads/...`). The `![...](...)` syntax is intended for inline images, but unsupported in Zulip, and as such does not link or render as _anything_. Ensure a newline between message content and any attachments.	2025-01-31 14:29:57 -08:00
Mateusz Mandera	f81e514d07	slack: Fetch workspace users from /users.list in the correct manner. 1. Fetching from the `/users.list` endpoint is supposed to use pagination. Slack will return at most 1000 results in a single request. This means that our Slack import system hasn't worked properly for workspaces with more than 1000 users. Users after the first 1000 would be considered by our tool as mirror dummies and thus created with is_active=False,is_mirror_dummy=True. Ref https://api.slack.com/methods/users.list 2. Workspaces with a lot of users, and therefore requiring the use of paginated requests to fetch them all, might also get us to run into Slack's rate limits, since we'll be making repeated requests to the endpoint. Therefore, the API fetch needs to also handle rate limiting errors correctly. Per https://api.slack.com/apis/rate-limits#headers, we can just read the retry-after header from the response and wait the indicated number of seconds before repeating the request. This is an easy approach to implement, so that's what we go with here.	2025-01-24 16:41:53 -08:00
Anders Kaseorg	653b0b0436	ruff: Partially reformat Python with Ruff 0.9 (2025 style). These are the changes that are backwards compatible with the 2024 style. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2025-01-14 09:42:16 -08:00
Alex Vandiver	9fa5ab951c	mime_types: Move INLINE_MIME_TYPES to prevent future import loops.	2025-01-09 09:09:13 -08:00
Sahil Batra	b4b571d60c	streams: Remove stream_post_policy field from Stream.	2025-01-07 12:17:37 -08:00
Anders Kaseorg	19b8cde27f	ruff: Fix PLC0206 Extracting value from dictionary without calling `.items()`. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-12-21 21:06:53 -08:00
PieterCK	a746be807f	slack_import: Make `check_token_access` more flexible. Previously, the `check_token_access` function had a hardcoded `required_parameters` variable because it was only used in the Slack data importer. This commit refactors `required_parameters` into a function parameter, enabling the function to check a Slack token’s scope for other purposes, such as Slack webhook integration. Additionally, this commit changes the Slack API call in `check_token_access` from `teams.info` to `api.test`. The endpoint is better suited for this purpose since we're only checking a token’s scope using the response header here.	2024-12-18 16:11:31 -08:00
PieterCK	f988412394	slack_data_import: Support converting integration bot users. Currently, we're unable to convert messages from Slack's integration bots because this message subtype doesn't come from a Slack "user", that is they don't have a Slack user profile. This is a preparatory change to support converting Slack's integration bot messages. This commit artificially creates Slack user data from the integration bot's "profile" so that we can create a corresponding Zulip user for them. Part of #31311.	2024-12-16 13:09:57 -08:00
PieterCK	10946caa3d	slack_data_import: Update how Slack user avatars are processed. Previously, the Slack export converter could only process Slack avatar URLs from Slack's "ca.slack-edge.com" server, which look like this: https://ca.slack-edge.com/T0CDRA6HM3P-U06NABE26M9-1173e04f818e-512 This commit adds support for converting any publicly downloadable image URL. This is done to support importing Slack's integration bots and their messages, which typically have PNG file URLs: https://avatars.slack-edge.com/2024-05-01/7057208497908_a4351f6deb91094eac4c_72.png	2024-12-16 13:09:57 -08:00
PieterCK	6543dadd0f	slack_import: Refactor a `convert_slack_formatting`. This prep commit extracts the logic for converting Slack formatted text into Zulip friendly markdown from `convert_to_zulip_markdown` into a new helper function, `convert_slack_formatting`. This is done to make the reformatting logic reusable elsewhere, such as in the Slack webhook or Slack incoming webhook.	2024-12-06 13:11:44 -08:00
PieterCK	4b472611d0	slack_import: Refactor a `convert_slack_workspace_mentions`. This prep commit extracts the logic for reformatting Slack mentions into Zulip mentions from `convert_to_zulip_markdown` into a new helper function, `convert_slack_workspace_mentions`. This is done to make the reformatting logic reusable elsewhere, such as in the Slack webhook or Slack incoming webhook.	2024-12-06 13:11:44 -08:00
PieterCK	ff4d5daef4	slack_import: Reorder functions used in `convert_to_zulip_markdown`. This prep commit moves `convert_to_zulip_markdown` and `convert_markdown_syntax` so that they appear after the various helper functions used in the main logic for converting Slack messages to Zulip's formatting. This adjustment aligns with code style conventions, improving readability. The change also prepares the codebase for reusing Slack text reformatting logic in the Slack incoming webhook.	2024-12-06 13:11:44 -08:00
PieterCK	0d7199b22e	data_import: Add migration status file to converted exports. This commit updates all third-party importer tools (Slack, Mattermost, and Rocket Chat) in the `zerver/data_import` directory to also output a migration_status.json file in their output tarball. This is required because all importable tarballs will be checked for migration compatibility during import. Fixes #28443.	2024-11-08 15:52:45 -08:00
Mateusz Mandera	420849ff6a	slack: Call the correct resize_* function when importing realm icon. For resizing the icon.png files, we use resize_avatar, not resize_logo. This is pretty confusing - sure, for icons we use the same function as for avatars, but we should have a proper name for the function called in the icon context. So this commit also adds resize_realm_icon, and changes the calls to resize_avatar in icon contexts to resize_realm_icon.	2024-11-08 15:43:18 -08:00
Harsh	e468818d2b	import: Remove skipping of too-long messages during import. This commit eliminates the skipping of messages longer than 10K characters during the import process.	2024-11-07 16:04:14 -08:00
PieterCK	92437b4ab5	integrations: Refactor slack_incoming webhook. This commit refactors `render_attachment` and `render_block` out of slack_incoming.py to promote reusability. The primary motivation for this refactor is to add support for converting integration bot messages in Slack exports, which could use the same functions. Part of #31311.	2024-11-07 10:47:08 -08:00
Harsh	858fdeee39	import: Add function to normalize messages to import. This adds `normalize_body_for_import` to normalize messages from third-party importers by removing NUL bytes and also updates import test files data to test this. Fixes #31930.	2024-11-04 12:22:30 -08:00
Anders Kaseorg	42e1517255	email_notifications: Prevent html2text from mangling Unicode. html2text mangles Unicode by default, with a --unicode-snob option to disable it. If I have to get called a “snob” for wanting to correctly support non-English languages, then uh, I’ll take one for the team. https://github.com/Alir3z4/html2text/blob/2024.2.26/html2text/config.py#L111-L150 Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-10-30 09:49:23 -07:00


399 Commits
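Commit eaa2ece4e1 describes two fallbacks. The first uses `bot_id` when a message has no `user` key. The second remembers thread roots globally, because a reply may lack `parent_user_id` and may live in a different export file than its root. The following is a minimal Python sketch of that idea, not the code in `zerver/data_import/slack.py`. The helper names and the module-level `thread_parent_senders` dict are hypothetical.

```python
from typing import Any, Optional

# Hypothetical module-level memo (thread_ts -> sender ID of the thread root).
# It is global rather than per-file because a thread's root and its replies
# may appear in different export files.
thread_parent_senders: dict[str, str] = {}


def get_sender_id(message: dict[str, Any]) -> str:
    # Bot messages (e.g. subtype "bot_message") can lack "user"; fall back to "bot_id".
    return message.get("user") or message["bot_id"]


def get_thread_parent_sender(message: dict[str, Any]) -> Optional[str]:
    thread_ts = message.get("thread_ts")
    if thread_ts is None:
        return None  # not part of a thread
    if message["ts"] == thread_ts:
        # Thread root: remember who sent it so that later replies can find it.
        thread_parent_senders[thread_ts] = get_sender_id(message)
    # Prefer "parent_user_id" when present; otherwise use the memoized root sender.
    return message.get("parent_user_id") or thread_parent_senders.get(thread_ts)
```

As the commit message notes, this relies on the thread root being processed before its replies. With the redacted poll example, the reply from `U1ABC1234` resolves its parent sender to the bot ID `B1ABCDEF1`.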
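Commit c018911c5b explains that a consumed trailing boundary character stops the next formatted string from matching. The minimal Python sketch below reproduces this with hypothetical, simplified bold patterns rather than Zulip's actual regexes. A trailing capture group finds only the first string. A zero-width look-ahead leaves the separating space available, so the next match can use it as its leading boundary.

```python
import re

# Hypothetical simplified Slack bold patterns; Zulip's real regexes are more complex.
# Trailing boundary consumed by a capture group: the space after "*a*" is used up.
CONSUMING = re.compile(r"(^|\s)\*([^*]+)\*(\s|$)")
# Trailing boundary checked with a zero-width look-ahead: the space stays available.
LOOKAHEAD = re.compile(r"(^|\s)\*([^*]+)\*(?=\s|$)")


def bold_words(pattern: re.Pattern[str], text: str) -> list[str]:
    """Return the text inside each bold match that `pattern` finds in `text`."""
    return [match.group(2) for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(bold_words(CONSUMING, "*a* *b*"))  # ['a']
    print(bold_words(LOOKAHEAD, "*a* *b*"))  # ['a', 'b']
```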
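The double-replacement problem described in commit 6c1818fa46 can be reproduced with a minimal Python sketch. The pattern is a hypothetical simplification of a Slack bold regex, not the one in `slack_message_conversion.py`. Both of its boundaries are zero-width under `re.MULTILINE`, and on `"*foo*\n*foo*"` it matches at spans (0, 5) and (6, 11). Calling `str.replace` once per match rewrites every identical occurrence on each iteration. `re.sub` rewrites each match only where it was found.

```python
import re

# Hypothetical simplified Slack bold pattern (*text*), not Zulip's actual regex.
# Both boundaries are zero-width: "^" (start of any line, via re.MULTILINE) or a
# preceding-whitespace look-behind, and a trailing whitespace/end-of-line look-ahead.
BOLD = re.compile(r"(?:^|(?<=\s))\*([^*\n]+)\*(?=\s|$)", re.MULTILINE)


def convert_with_replace(text: str) -> str:
    # Problematic approach: each match triggers a global str.replace, so
    # identical matches are rewritten once per occurrence.
    for match in BOLD.finditer(text):  # iterates over the original string
        text = text.replace(match.group(0), f"**{match.group(1)}**")
    return text


def convert_with_sub(text: str) -> str:
    # re.sub rewrites each match only at the position where it was found.
    return BOLD.sub(lambda match: f"**{match.group(1)}**", text)


if __name__ == "__main__":
    print(repr(convert_with_replace("*foo*\n*foo*")))  # '***foo***\n***foo***'
    print(repr(convert_with_sub("*foo*\n*foo*")))      # '**foo**\n**foo**'
```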
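Commit f81e514d07 describes paginating `/users.list` and honoring the retry-after header on rate-limited responses. The Python sketch below shows such a loop under two assumptions. `fetch_page` is a hypothetical callable standing in for the HTTP request; it returns the status code, headers and decoded JSON body for a given cursor. The next cursor is read from `response_metadata.next_cursor`, as in Slack's cursor-based pagination. This is not Zulip's implementation, and injecting `fetch_page` and `sleep` keeps it testable without network access.

```python
import time
from collections.abc import Callable
from typing import Any

# Hypothetical signature: cursor -> (HTTP status, response headers, JSON body).
# A real version would request users.list with `limit` and `cursor` parameters.
FetchPage = Callable[[str], tuple[int, dict[str, str], dict[str, Any]]]


def fetch_all_members(
    fetch_page: FetchPage, sleep: Callable[[float], None] = time.sleep
) -> list[dict[str, Any]]:
    """Collect members across all pages, waiting out rate limits (HTTP 429)."""
    members: list[dict[str, Any]] = []
    cursor = ""  # an empty cursor requests the first page
    while True:
        status, headers, body = fetch_page(cursor)
        if status == 429:
            # Rate limited: wait as long as Retry-After asks, then retry the same page.
            # (Plain dict lookup here; real HTTP headers are case-insensitive.)
            sleep(float(headers.get("Retry-After", "1")))
            continue
        members.extend(body["members"])
        cursor = body.get("response_metadata", {}).get("next_cursor", "")
        if not cursor:  # an empty next_cursor means there are no more pages
            return members
```

Retrying the same cursor after a 429 means no page is skipped. Other error statuses are deliberately left out of this sketch.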