For Slack, Mattermost and RocketChat, set the default announcement
channels if the sane default channel is available in the import.
The next commit will add a mechanism to choose an alternative default if
the expected channel name is not present.
Addresses the major part of #34984.
This updates the topic name format for Slack threads to include a
snippet of the original message. Currently the format looks like this:
"{date} Slack thread {n}", which provides little to no context about the
thread.
Currently we also use the `thread_ts_str` key to keep track of the Slack
threads we're converting. With this key format, there is a small chance
of combining threads with the same timestamp under one topic. This
commit updates it to use a new key format, `thread_key`.
Fixes #27661.
This logic was fundamentally broken:
```
is_realm_imported_from_other_zulip_server = RealmAuditLog.objects.filter(
    realm=realm, event_type=AuditLogEventType.REALM_EXPORTED
).exists()
if not is_realm_imported_from_other_zulip_server:
    send_zulip_update_announcements_to_realm(
        realm, skip_delay=False, realm_imported_from_other_product=True
    )
```
The `REALM_EXPORTED` audit log entry was only created after the export
completed, meaning it couldn't be included in the export data at all.
As a result, exports from Zulip were incorrectly considered to be "not
from Zulip".
We get around this issue by explicitly including an import_source in the
realm dict in the export data from 3rd party apps. The importer can then
rely on this value to determine if it's dealing with a Zulip-originated
export or not.
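As a rough illustration of the importer-side check, here is a minimal
sketch. The helper name `is_imported_from_other_product`, the example
value "slack", and the assumption that exports produced by Zulip itself
simply omit the key are hypothetical and not taken from the actual
implementation:

```python
from typing import Any, Dict


def is_imported_from_other_product(realm_dict: Dict[str, Any]) -> bool:
    # Assumption: third-party converters set "import_source" (e.g. "slack"),
    # while exports produced by a Zulip server leave the key out.
    return realm_dict.get("import_source") is not None


# A converted Slack export announces where it came from.
slack_realm = {"string_id": "acme", "import_source": "slack"}
# A Zulip-originated export carries no such marker in this sketch.
zulip_realm = {"string_id": "acme"}

print(is_imported_from_other_product(slack_realm))
print(is_imported_from_other_product(zulip_realm))
```

Unlike the audit log lookup, this value is written when the export data
is built, so it is always present in the data the importer reads.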
This commit updates code to use "\x07" as value for
"subject" field of Message objects for DMs and group
DMs, so that we have a unique value for DMs and group
DMs which cannot be used for channel messages.
This avoids using an empty string as the topic for DMs,
since the empty string is also the topic of "general chat"
channel messages. A large number of DMs in the realm led
the PostgreSQL query planner to think that there are too
many "general chat" messages, and thus to generate bad
query plans for operations like fetching "general chat"
messages in a stream or moving messages to and from the
"general chat" topic.
This change was done for ArchivedMessage and
ScheduledMessage objects as well.
Note that the clients still get "subject" value as
an empty string "".
This commit also adds tests for checking that "\x07"
cannot be used as topic for channel messages.
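The storage and client-facing behavior described above can be
sketched as follows. This is only an illustration: the helper
names are hypothetical, and the real validation of channel
topics may be stricter than the equality check shown here.

```python
DM_TOPIC_SENTINEL = "\x07"


def subject_for_storage(topic_name: str, is_direct_message: bool) -> str:
    # DMs and group DMs always store the sentinel, never "".
    if is_direct_message:
        return DM_TOPIC_SENTINEL
    # Channel messages must not be able to claim the sentinel value.
    if topic_name == DM_TOPIC_SENTINEL:
        raise ValueError("Invalid topic for a channel message")
    return topic_name


def subject_for_client(stored_subject: str) -> str:
    # Clients keep seeing an empty string for DMs.
    return "" if stored_subject == DM_TOPIC_SENTINEL else stored_subject


stored = subject_for_storage("ignored", is_direct_message=True)
print(repr(stored), repr(subject_for_client(stored)))
```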
Fixes #34360.
Slackbot should be correctly marked as a bot rather than a regular user
- otherwise, it confusingly appears on the Users list and in the realm
creation from Slack import flow, in the dropdown for choosing the
organization owner account.
Closes #34649.
Integration bot messages in Slack may include "blocks" and
"attachments," which are Slack's messaging features.
Currently, these messages aren't processed when converting
Slack export data.
This commit adds support for converting integration bot
messages, as well as other Slack messages containing "blocks"
and "attachments".
Message payloads with the block type `rich_text` [1] are skipped because
all messages sent by users have this format.
Fixes #31162.
[1]: https://docs.slack.dev/reference/block-kit/blocks/rich-text-block/
This adds a try-except block when running html2text when processing raw
messages from HTML to markdown.
convert_html_to_text is added mainly for testing convenience. We don't
have any sample of problematic Mattermost content that could trigger
this sort of error yet, so the test mocks convert_html_to_text to raise
an error instead.
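A minimal sketch of the pattern, under stated assumptions: the wrapper
name `safe_convert_html_to_text` is hypothetical, the converter is passed
in as a callable so the example does not depend on html2text, and falling
back to the raw HTML is just one possible way to handle the failure.

```python
import logging
from typing import Callable


def safe_convert_html_to_text(html: str, convert: Callable[[str], str]) -> str:
    try:
        return convert(html)
    except Exception:
        # Assumption: keep the import going and fall back to the raw content.
        logging.warning("Could not convert HTML to markdown; keeping raw content")
        return html


def failing_converter(html: str) -> str:
    # Stands in for a mocked convert_html_to_text that raises.
    raise ValueError("malformed input")


print(safe_convert_html_to_text("<b>hi</b>", failing_converter))
```

Wrapping the conversion in its own function is what makes it easy for a
test to substitute a failing converter, as done here with
`failing_converter`.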
We encountered the following two new cases with integration bots in
Slack imports:
1. Bots without the image_72 field in their data. Such bots should fall
back to gravatar.
2. Bots whose bot_id is the sender of certain messages, but for which
querying the bots.info endpoint returns a bot_not_found error. We
should create dummy accounts in place of such bots.
Just as the prior commit added deduplication code in `get_subscription`,
here we do the same for reactions. If two Slack accounts, which will be
merged into one Zulip account, posted the same emoji reaction to a
message, we need to avoid duplication.
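One way to picture the deduplication is the sketch below. The function
name, the dict keys, and the Slack-to-Zulip user ID mapping are
illustrative assumptions and do not mirror the importer's real data
structures.

```python
from typing import Any, Dict, List, Set, Tuple


def dedupe_reactions(
    reactions: List[Dict[str, Any]], slack_to_zulip_user_id: Dict[str, int]
) -> List[Dict[str, Any]]:
    seen: Set[Tuple[int, int, str]] = set()
    result = []
    for reaction in reactions:
        zulip_user_id = slack_to_zulip_user_id[reaction["slack_user_id"]]
        key = (reaction["message_id"], zulip_user_id, reaction["emoji_name"])
        if key in seen:
            # Same Zulip user, same emoji, same message: skip the duplicate.
            continue
        seen.add(key)
        result.append(
            {
                "message_id": reaction["message_id"],
                "user_profile_id": zulip_user_id,
                "emoji_name": reaction["emoji_name"],
            }
        )
    return result


# Two Slack accounts (U1, U2) that were merged into Zulip user 7.
merged = dedupe_reactions(
    [
        {"message_id": 1, "slack_user_id": "U1", "emoji_name": "tada"},
        {"message_id": 1, "slack_user_id": "U2", "emoji_name": "tada"},
        {"message_id": 1, "slack_user_id": "U2", "emoji_name": "heart"},
    ],
    {"U1": 7, "U2": 7},
)
print(merged)
```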
In 1517601e9d we added merging of accounts
with the same email address, but didn't prevent duplication of
subscriptions when two Slack accounts, merged into one Zulip account,
belong to the same channel.
This commit:
* Creates a migration to rename any existing Client with
name="populate_db" to "ZulipDataImport".
* Updates populate_db.py to use ZulipDataImport for new
message creation.
These changes should make code to identify imported messages
considerably more readable.
Fixes #33909.
This commit updates the Slack incoming webhook endpoint to use the same
Slack reformatting functions (`convert_to_zulip_markdown` and
`replace_links`) that are used for Slack data import, and cleans up any
duplicate functions.
This was previously not possible because the Slack reformatting regex in
`slack_message_conversion.py` could not handle these cases:
- Formatting applied to non-ASCII characters (e.g., emoji).
- Formatted strings separated by exactly one character.
- Formatted strings appearing immediately after a new line.
Fixes part of #31162.
The Slack text regexes match specific characters before and after the
formatted string to ensure that they only match at word boundaries.
However, because the capture groups consume each matching character,
including the characters used to determine word boundaries, two
formatted strings separated by a single matching character result in one
string not being matched, as the trailing whitespace is already
consumed and cannot also match as the leading pre-match whitespace for
the next formatted string.
Switch to a look-ahead regex for the trailing word boundary
characters. This is zero-width, and as such the next match can still
also consume the same characters.
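The difference can be seen with a deliberately simplified bold pattern.
The two regexes below are stand-ins written for this illustration, not
the actual patterns from `slack_message_conversion.py`:

```python
import re

# Trailing boundary is captured, so it is consumed by the match.
CONSUMING = re.compile(r"(^|\s)\*([^*\s]+)\*(\s|$)")
# Trailing boundary is a zero-width look-ahead, so it stays available.
LOOKAHEAD = re.compile(r"(^|\s)\*([^*\s]+)\*(?=\s|$)")

text = "*foo* *bar*"

# The space after "*foo*" is consumed, so "*bar*" has no leading boundary left.
with_consuming = CONSUMING.sub(r"\1**\2**\3", text)
# The space is only peeked at, so it can start the next match.
with_lookahead = LOOKAHEAD.sub(r"\1**\2**", text)

print(with_consuming)  # **foo** *bar*
print(with_lookahead)  # **foo** **bar**
```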
This also fixes Slack webhook integrations'
`test_message_with_complex_formatted_mentions` which was previously
expecting incorrect output.
Fixes part of #30827.
The inner capture group of Slack text regex is used to capture the
formatted text, so basically all characters but the formatting
characters like *, ~, and _.
It currently does this by specifying a range of characters to be
captured except the formatting characters. This unintentionally excluded
non-ASCII characters like emoji.
This commit simplifies the inner capture groups of the Slack text regex
to explicitly exclude formatting characters (e.g., *, ~, _) instead of
using an allowlist to not include them.
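To illustrate the idea, here is a sketch using simplified stand-in
patterns for bold text only. They are assumptions made for this example
and are much smaller than the real Slack text regexes:

```python
import re

# Allowlist style: printable ASCII except "*". Anything outside the
# listed ranges (such as emoji) silently fails to match.
ALLOWLIST = re.compile(r"\*([\x20-\x29\x2b-\x7e]+)\*")
# Exclusion style: anything that is not the formatting character.
EXCLUSION = re.compile(r"\*([^*]+)\*")

plain = "*done*"
with_emoji = "*done \N{PARTY POPPER}*"

print(ALLOWLIST.search(plain).group(1))
print(ALLOWLIST.search(with_emoji))
print(EXCLUSION.search(with_emoji).group(1))
```

With the allowlist, the emoji prevents the whole string from being
recognized as bold; with the exclusion-based class it is captured like
any other character.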
This change is part of the effort to make `convert_to_zulip_markdown`
compatible with output from `render_blocks` and `render_attachments`,
which may contain emoji.
The inner capture groups of the Slack text regex try to capture the
formatted string. For an unclear reason, we're currently using two
similar capture groups, even though only the second of them would
suffice.
This removes the first of the two capture groups.
This prep commit modifies the Slack text regex in
`slack_message_conversion.py` to use the `re.MULTILINE` flag, so that it
captures formatted strings that are adjacent to a newline or the end of
a line.
Such strings are likely to be generated fairly often by the Slack
exporter itself, and our Slack message block and attachment formatters
(`render_blocks` and `render_attachments`) also produce them.
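The effect of the flag can be shown with a simplified stand-in pattern
(an assumption for this example, not the real regex) whose boundaries
are a literal space or a `^`/`$` anchor:

```python
import re

PATTERN = r"(^| )\*([^*\n]+)\*(?= |$)"
text = "*foo*\n*bar*"

# Without re.MULTILINE, "^" and "$" only anchor at the ends of the whole
# string, so neither line's formatted string has a valid boundary.
single_line = re.sub(PATTERN, r"\1**\2**", text)
# With re.MULTILINE, "^" and "$" also anchor at every line boundary.
multi_line = re.sub(PATTERN, r"\1**\2**", text, flags=re.MULTILINE)

print(repr(single_line))  # '*foo*\n*bar*'
print(repr(multi_line))   # '**foo**\n**bar**'
```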
This also fixes Slack webhook integration's
`test_message_with_complex_formatted_texts` which was previously
expecting incorrect output.
Fixes part of #30827.
In `convert_markdown_syntax`, `re.finditer` is used to iterate over each
`match` found in the `text`. In each iteration, we create the Zulip
formatting equivalent of the current `match` and do a `text.replace()`
to replace all strings in `text` that match the current `match`.
Since we're planning to add the `re.MULTILINE` flag to `re.finditer`, it
would be problematic if the first and last capture groups of `match` are
zero-width (for example, matching at a line boundary next to a newline).
This is because `convert_markdown_syntax` would reformat all identical
strike-through and bold formatted strings as many times as there are
identical `match`es of them.
Consider the following:
---
original text: "*foo*\n*foo*"
1. <re.Match object; span=(0, 5), match='*foo*'>
1st replace -> "**foo**\n**foo**"
2. <re.Match object; span=(6, 11), match='*foo*'>
2nd replace -> "***foo***\n***foo***"
---
This commit uses `re.sub` instead of `re.finditer`. `re.sub` performs
the replacement only on the specific piece of the original text that
the `match` came from.
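The example above can be reproduced with a small sketch. The pattern and
function names here are simplified assumptions for illustration; only
the contrast between `finditer` plus `str.replace` and `re.sub` is the
point:

```python
import re

# Zero-width boundaries: not preceded and not followed by a non-space.
BOLD = re.compile(r"(?<!\S)\*([^*\s]+)\*(?!\S)")


def convert_with_replace(text: str) -> str:
    # finditer walks the original string; every iteration then rewrites
    # *all* occurrences of the matched string in the evolving text.
    for match in BOLD.finditer(text):
        text = text.replace(match.group(0), f"**{match.group(1)}**")
    return text


def convert_with_sub(text: str) -> str:
    # re.sub rewrites each match exactly once, at its own position.
    return BOLD.sub(r"**\1**", text)


original = "*foo*\n*foo*"
print(repr(convert_with_replace(original)))  # '***foo***\n***foo***'
print(repr(convert_with_sub(original)))      # '**foo**\n**foo**'
```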
This updates the patterns to use a more unicode-aware pattern for their
first and last capture groups. The new patterns have the same
behaviours, but they're expressed in a more coherent way.
For example, the existing patterns list which characters to look for,
skipping certain characters they don't want to match (e.g., closing
brackets and quotes are skipped). The new patterns narrow down what
they look for (whitespace, punctuation, symbols) and explicitly list
what they don't want to match (closing quotes and brackets, etc.).
This also refactors the `convert_markdown_syntax` to use the `regex`
module instead of the `re` module because the `regex` module has full
unicode support.
Currently, threads in Slack direct messages will increment the
`thread_counter` variable inside the thread conversion logic. Since we
don't treat thread messages in Slack DMs differently than any other DM,
threads in DMs only serve to spuriously increment the thread numbers
used in channel topic names.
This adds a condition that checks if the Slack message is a DM or not
before executing the thread conversion logic.
Previously, `build_message` sets a message's topic name to the given
topic name, regardless of whether the message was a direct message (DM)
or a group direct message (GDM).
This change adds the `is_private` parameter to `build_message`. If
`is_private` is `True`, the `topic_name` will be overridden to an empty
string (""). Consequently, this also updates the third-party importers
to pass this parameter when calling `build_message`.
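In heavily simplified form, the new behavior looks roughly like the
sketch below. The function name `build_message_sketch` and its reduced
parameter list are assumptions for illustration; the real
`build_message` takes many more arguments.

```python
from typing import Any, Dict


def build_message_sketch(
    topic_name: str, content: str, is_private: bool = False
) -> Dict[str, Any]:
    # DMs and group DMs never carry a topic, whatever the caller passed.
    if is_private:
        topic_name = ""
    return {"subject": topic_name, "content": content}


channel_message = build_message_sketch("imported from Slack", "hello")
direct_message = build_message_sketch("imported from Slack", "hello", is_private=True)
print(channel_message["subject"], repr(direct_message["subject"]))
```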
Co-authored-by: Pieter CK <pieterceka123@gmail.com>
The content `look` does not render as a
link, nor an image upload (were it to `/user_uploads/...`). The
`![]()` syntax is intended for inline images, but is unsupported in
Zulip, and as such does not link or render as _anything_.
Ensure a newline between message content and any attachments.
1. Fetching from the `/users.list` endpoint is supposed to use
pagination. Slack will return at most 1000 results in a single
request. This means that our Slack import system hasn't worked
properly for workspaces with more than 1000 users. Users after the
first 1000 would be considered by our tool as mirror dummies and thus
created with is_active=False,is_mirror_dummy=True.
Ref https://api.slack.com/methods/users.list
2. Workspaces with a lot of users, and therefore requiring the use of
paginated requests to fetch them all, might also cause us to run into
Slack's rate limits, since we'll be making repeated requests to the
endpoint.
Therefore, the API fetch needs to also handle rate limiting errors
correctly.
Per https://api.slack.com/apis/rate-limits#headers, we can just read
the retry-after header from the response and wait the indicated number
of seconds before repeating the request. This is an easy approach to
implement, so that's what we go with here.
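The two points above can be sketched together as a single loop. This is
an illustration under stated assumptions: `fetch_page` is a hypothetical
callable standing in for the HTTP request (returning status code,
headers with lower-cased keys, and the decoded JSON body), and the
function name is not the importer's.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Response = Tuple[int, Dict[str, str], Dict[str, Any]]


def fetch_all_users(
    fetch_page: Callable[[Optional[str]], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    members: List[Dict[str, Any]] = []
    cursor: Optional[str] = None
    while True:
        status, headers, body = fetch_page(cursor)
        if status == 429:
            # Rate limited: wait as instructed, then retry the same cursor.
            sleep(int(headers.get("retry-after", "1")))
            continue
        members.extend(body.get("members", []))
        # Slack signals the last page with an empty next_cursor.
        cursor = body.get("response_metadata", {}).get("next_cursor") or None
        if cursor is None:
            return members


# Fake transport: page 1, a rate-limit response, then the final page.
fake_responses = iter(
    [
        (200, {}, {"members": [{"id": "U1"}], "response_metadata": {"next_cursor": "abc"}}),
        (429, {"retry-after": "3"}, {}),
        (200, {}, {"members": [{"id": "U2"}], "response_metadata": {"next_cursor": ""}}),
    ]
)
requested_cursors: List[Optional[str]] = []
waits: List[float] = []


def fake_fetch_page(cursor: Optional[str]) -> Response:
    requested_cursors.append(cursor)
    return next(fake_responses)


users = fetch_all_users(fake_fetch_page, sleep=waits.append)
print([user["id"] for user in users], waits)
```

Passing `sleep` in as a parameter keeps the sketch testable without
actually waiting.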
Previously, the `check_token_access` function had a hardcoded
`required_parameters` variable because it was only used in the Slack
data importer. This commit refactors `required_parameters` into a
function parameter, enabling the function to check a Slack token’s scope
for other purposes, such as Slack webhook integration.
Additionally, this commit changes the Slack API call in
`check_token_access` from `team.info` to `api.test`. The latter endpoint
is better suited for this purpose since we're only checking a token’s
scope using the response header here.
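As a hedged sketch of the scope check itself: the helper name
`missing_scopes` and its parameters are hypothetical, and the code
assumes the granted scopes arrive as a comma-separated `x-oauth-scopes`
response header whose key has been lower-cased.

```python
from typing import Iterable, Mapping, Set


def missing_scopes(
    response_headers: Mapping[str, str], required_scopes: Iterable[str]
) -> Set[str]:
    raw = response_headers.get("x-oauth-scopes", "")
    granted = {scope.strip() for scope in raw.split(",") if scope.strip()}
    # Anything required but not granted means the token is insufficient.
    return set(required_scopes) - granted


headers = {"x-oauth-scopes": "users:read, emoji:read"}
print(missing_scopes(headers, ["users:read", "team:read"]))
```

Taking the required scopes as an argument is what lets callers other
than the data importer reuse the same check with their own list.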
Currently, we're unable to convert messages from Slack's integration
bots because this message subtype doesn't come from a Slack "user"; that
is, the bots don't have a Slack user profile.
This is a preparatory change to support converting Slack's integration
bot messages. This commit artificially creates Slack user data from the
integration bot's "profile" so that we can create a corresponding Zulip
user for them.
Part of #31311.
This prep commit extracts the logic for converting Slack formatted text
into Zulip friendly markdown from `convert_to_zulip_markdown` into a new
helper function, `convert_slack_formatting`.
This is done to make the reformatting logic reusable elsewhere, such as
in the Slack webhook or Slack incoming webhook.
This prep commit extracts the logic for reformatting Slack mentions into
Zulip mentions from `convert_to_zulip_markdown` into a new helper
function, `convert_slack_workspace_mentions`.
This is done to make the reformatting logic reusable elsewhere, such
as in the Slack webhook or Slack incoming webhook.
This prep commit moves `convert_to_zulip_markdown` and
`convert_markdown_syntax` so that they appear after the various
helper functions used in the main logic for converting Slack
messages to Zulip's formatting. This adjustment aligns with code
style conventions, improving readability.
The change also prepares the codebase for reusing Slack text
reformatting logic in the Slack incoming webhook.
This commit updates all third-party importer tools (Slack, Mattermost,
and Rocket Chat) in the `zerver/data_import` directory to also output a
migration_status.json file in their output tarball.
This is required because every importable tarball is checked for
migration compatibility during import.
Fixes #28443.
For resizing the icon.png files, we use resize_avatar, not resize_logo.
This is pretty confusing - sure, for icons we use the same function as
for avatars, but we should have a proper name for the function called in
the icon context. So this commit also adds resize_realm_icon, and
changes the calls to resize_avatar in icon contexts to
resize_realm_icon.
This commit refactors `render_attachment` and `render_block`
out of slack_incoming.py to promote reusability.
The primary motivation for this refactor is to add support for
converting integration bots messages in Slack exports, which could use
the same functions.
Part of #31311.
This adds `normalize_body_for_import` to normalize messages from
third-party importers by removing NUL bytes, and also updates the import
test data files to test this.
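A minimal sketch of the idea follows. The body shown here is an
assumption covering only the NUL-byte removal described above; the
actual function may normalize more than this.

```python
def normalize_body_for_import(body: str) -> str:
    # PostgreSQL text columns cannot store NUL (\x00) characters,
    # so strip them before the message content is saved.
    return body.replace("\x00", "")


print(repr(normalize_body_for_import("hello\x00 world\x00")))
```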
Fixes #31930.