36 Commits

Author SHA1 Message Date
Mateusz Mandera
e65fb2d051 slack: Requote image_url in render_attachment.
Slack attachment urls with white spaces,
e.g. `https://example.com/some file.png`,
were rejected by `check_url`. We want to call `requote_url` to deal with
any url-quoting jankiness that may be present in the exported data.
2025-10-10 09:52:50 -07:00
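The requoting idea in e65fb2d051 can be sketched with the standard library alone. This is a minimal illustration, not the helper the commit calls: `requote_url_sketch` and `SAFE_CHARS` are hypothetical names, and the real `requote_url` and `check_url` may behave differently.

```python
from urllib.parse import quote

# Assumption: URL delimiters stay as-is, and "%" is kept safe so that
# escapes already present in the exported data are not double-encoded.
SAFE_CHARS = "!#$%&'()*+,/:;=?@[]~"


def requote_url_sketch(url: str) -> str:
    """Percent-encode characters that are not valid in a URL, such as spaces."""
    return quote(url, safe=SAFE_CHARS)


fixed = requote_url_sketch("https://example.com/some file.png")
already_quoted = requote_url_sketch("https://example.com/some%20file.png")
```

Both calls yield the same URL with the space written as `%20`, so a later validity check sees a well-formed URL either way.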
Mateusz Mandera
d484fd95d8 slack: Add missing block types.
1. `call` is a block type we've observed in the wild in a Slack export,
   despite not being documented in
   https://docs.slack.dev/reference/block-kit/blocks/

2. We already have the logic for converting `condition` block type below
   in the function, but it was erroneously missing from the list of valid
   types.
2025-10-10 09:52:44 -07:00
Mateusz Mandera
df7bb3cdd4 slack: Handle float "ts" in attachment info.
It's rare, but sometimes we see floats in this field.
2025-07-21 09:24:03 -07:00
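One way to tolerate the float "ts" values mentioned in df7bb3cdd4 is to normalize whatever arrives to whole seconds. This is only a sketch: `normalize_ts` is a hypothetical name, and how the import code actually consumes the field is not shown in this history.

```python
from typing import Union


def normalize_ts(ts: Union[str, int, float]) -> int:
    """Return the timestamp as whole seconds, whatever type it arrived as."""
    # Going through float() accepts "1752600000", "1752600000.123",
    # 1752600000 and 1752600000.5 alike; int() then drops the fraction.
    return int(float(ts))
```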
PieterCK
580845415b slack_data_import: Refactor get_user_mentions.
This commit extracts a portion of logic from `get_user_mentions` into a
new helper function for reusability. This is a prep commit for #30166.
2025-07-15 15:49:05 -07:00
PieterCK
0dfb709152 slack_data_import: Support converting integration bot messages.
Integration bot messages in Slack may include "blocks" and
"attachments," which are Slack's messaging features.
Currently, these messages aren't processed when converting
Slack export data.

This commit adds support for converting integration bot
messages, as well as other Slack messages containing "blocks"
and "attachments".

Message payloads with the block type `rich_text` [1] are skipped because
all messages sent by users have this format.

Fixes #31162.

[1]: https://docs.slack.dev/reference/block-kit/blocks/rich-text-block/
2025-05-23 14:27:31 -07:00
PieterCK
a228699108 slack_incoming: Use Slack text reformatters from data import.
This commit updates the Slack incoming webhook endpoint to use the same
Slack reformatting functions (`convert_to_zulip_markdown` and
`replace_links`) that are used for Slack data import, and cleans up any
duplicate functions.

This was previously not possible because the Slack reformatting regex in
`slack_message_conversion.py` could not handle these cases:

- Formatting applied to non-ASCII characters (e.g., emoji).

- Formatted strings separated by exactly one character.

- Formatted strings appearing immediately after a new line.

Fixes part of #31162.
2025-04-28 20:49:10 -07:00
PieterCK
c018911c5b slack_regex: Fix overlapping capture groups.
The Slack text regexes match specific characters before and after the
formatted string to ensure that they only match at word boundaries.

However, because the capture groups consume each matching character,
including the characters used to determine word boundaries, two
formatted strings separated by a single matching character result in one
string not being matched: the trailing whitespace is already consumed
and cannot also match as the leading pre-match whitespace for the next
formatted string.

Switch to a look-ahead regex for the trailing word boundary
characters. This is zero-width, and as such the next match can still
also consume the same characters.

This also fixes the Slack webhook integration's
`test_message_with_complex_formatted_mentions`, which was previously
expecting incorrect output.

Fixes part of #30827.
2025-04-28 20:49:10 -07:00
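The effect described in c018911c5b can be reproduced with two simplified patterns. These are illustrative stand-ins, not the patterns in `slack_message_conversion.py`: the boundary set is reduced to whitespace and only bold (`*`) is handled.

```python
import re

# Trailing boundary is a consuming group: the space after "*foo*" is used up.
CONSUMING = re.compile(r"(^|\s)\*([^*\s]+)\*(\s|$)")
# Trailing boundary is a zero-width look-ahead: the space stays available.
LOOKAHEAD = re.compile(r"(^|\s)\*([^*\s]+)\*(?=\s|$)")

text = "*foo* *bar*"

consumed = CONSUMING.sub(r"\1**\2**\3", text)
looked_ahead = LOOKAHEAD.sub(r"\1**\2**", text)
```

With the consuming pattern only the first string is converted, because the single space cannot serve as both the end of one match and the start of the next. The look-ahead version converts both.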
PieterCK
cfeb4ba731 slack_regex: Simplify Slack regex main capture group.
The inner capture group of Slack text regex is used to capture the
formatted text, so basically all characters but the formatting
characters like *, ~, and _.

It currently does this by specifying a range of characters to be
captured except the formatting characters. This unintentionally excluded
non-ASCII characters like emoji.

This commit simplifies the inner capture groups of the Slack text regex
to explicitly exclude formatting characters (e.g., *, ~, _) instead of
relying on an allowlist that omits them.

This change is part of the effort to make `convert_to_zulip_markdown`
compatible with output from `render_blocks` and `render_attachments`,
which may contain emoji.
2025-04-28 20:49:10 -07:00
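The difference between the two approaches in cfeb4ba731 shows up as soon as the formatted text contains an emoji. The allowlist below is invented for illustration, since the original character range is not shown in this history; the exclusion class follows the formatting characters named in the message.

```python
import re

# Hypothetical allowlist: only ASCII letters, digits and spaces may be bold.
ALLOWLIST = re.compile(r"\*([a-zA-Z0-9 ]+)\*")
# Exclusion class: anything except the formatting characters themselves.
EXCLUSION = re.compile(r"\*([^*~_]+)\*")

text = "*hello \U0001F389*"  # bold text ending in a party popper emoji

allowlist_match = ALLOWLIST.search(text)
exclusion_match = EXCLUSION.search(text)
```

The allowlist finds nothing because the emoji falls outside its range, while the exclusion class captures the whole string.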
PieterCK
ee34ccb8c2 slack_regex: Remove unnecessary inner capture group.
The inner capture groups of the Slack text regex try to capture the
formatted string. For an unclear reason, we're currently using two
similar capture groups even though the second capture group alone
would suffice.

This removes the first capture group.
2025-04-28 20:49:10 -07:00
PieterCK
8992435caf slack_regex: Update Slack regex handle multiline strings.
This prep commit modifies the Slack text regex in
`slack_message_conversion.py` to use the `re.MULTILINE` flag to capture
formatted strings that are adjacent to a newline or the end of a line.

Such strings are likely to be generated by the Slack exporter itself,
and our Slack message block and attachment formatters (`render_blocks`
and `render_attachments`) also produce them.

This also fixes the Slack webhook integration's
`test_message_with_complex_formatted_texts`, which was previously
expecting incorrect output.

Fixes part of #30827.
2025-04-28 20:49:10 -07:00
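What `re.MULTILINE` changes in 8992435caf can be seen with a reduced pattern. This is a sketch under assumptions: the boundary set here is only a space or a line anchor, which is narrower than the real pattern, and only bold (`*`) is handled.

```python
import re

# Boundaries are "start of line or a space" and "a space or end of line".
BOLD = r"(^|[ ])\*([^*\n]+)\*(?=[ ]|$)"

text = "*foo*\n*bar*"

# Without the flag, ^ and $ only anchor at the ends of the whole string,
# so neither line has both of its boundaries satisfied.
single_line = re.sub(BOLD, r"\1**\2**", text)
# With the flag, ^ and $ also anchor around every newline.
multi_line = re.sub(BOLD, r"\1**\2**", text, flags=re.MULTILINE)
```

Without the flag the text comes back unchanged; with it, both lines are converted.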
PieterCK
6c1818fa46 slack_regex: Avoid replacing all identical matches.
In `convert_markdown_syntax`, `re.finditer` is used to iterate over each
`match` found in the `text`. In each iteration, we create the Zulip
formatting equivalent of the current `match` and do a `text.replace()`
to replace all strings in `text` that match the current `match`.

Since we're planning to add the `re.MULTILINE` flag to `re.finditer`, it
would be problematic if the first and last capture groups of `match` are
zero-width characters (newlines). This is because
`convert_markdown_syntax` would reformat all identical strike-through
and bold formatted strings as many times as there are identical `match`es
of them.

Consider the following:
---
 original text: "*foo*\n*foo*"

 1. <re.Match object; span=(0, 5), match='*foo*'>
    1st replace -> "**foo**\n**foo**"

 2. <re.Match object; span=(6, 11), match='*foo*'>
    2nd replace -> "***foo***\n***foo***"
---

This commit replaces `re.finditer` with `re.sub`, which performs each
replacement only on the specific piece of the original text that the
`match` came from.
2025-04-28 20:49:10 -07:00
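The double replacement described in 6c1818fa46 can be reproduced in a few lines. Both functions are hypothetical reductions of `convert_markdown_syntax` that handle bold only, with a deliberately simple pattern.

```python
import re

BOLD = re.compile(r"\*([^*\n]+)\*")


def convert_with_finditer(text: str) -> str:
    """Old approach: each match triggers a replace of ALL identical strings."""
    for match in BOLD.finditer(text):  # iterates over the original text
        text = text.replace(match.group(0), f"**{match.group(1)}**")
    return text


def convert_with_sub(text: str) -> str:
    """New approach: each match is replaced exactly where it was found."""
    return BOLD.sub(r"**\1**", text)


buggy = convert_with_finditer("*foo*\n*foo*")
fixed = convert_with_sub("*foo*\n*foo*")
```

The `finditer` version ends up with three asterisks on each side, matching the trace in the commit message, while the `re.sub` version produces ordinary bold on both lines.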
PieterCK
946e4096fc slack_regex: Refactor how the capture groups are defined.
This updates the patterns to use a more unicode-aware pattern for their
first and last capture groups. The new patterns have the same
behaviours, but they're expressed in a more coherent way.

For example, the existing patterns list what characters to look for,
skipping certain characters they don't want to match (e.g., closing
brackets and quotes are skipped). The new patterns narrow down what they
look for (whitespace, punctuation, symbols) and explicitly list what
they don't want to match (closing quotes and brackets, etc.).

This also refactors `convert_markdown_syntax` to use the `regex`
module instead of the `re` module because the `regex` module has full
Unicode support.
2025-04-28 20:49:10 -07:00
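The "narrow down, then exclude" shape described in 946e4096fc can be approximated without the third-party `regex` module by using Unicode general categories from the standard library. This is a stand-in for illustration only: `is_boundary_char` and the `CLOSING` set are hypothetical, and the commit itself expresses the rule inside a regex pattern.

```python
import unicodedata

# Hypothetical exclusion list: closing brackets and closing quotes.
CLOSING = {")", "]", "}", "\u201d", "\u2019"}


def is_boundary_char(ch: str) -> bool:
    """True for whitespace, punctuation or symbols, minus the exclusions."""
    if ch in CLOSING:
        return False
    # General categories: Z* = separators, P* = punctuation, S* = symbols.
    return unicodedata.category(ch)[0] in {"Z", "P", "S"}
```

Because the check is category-based, non-ASCII punctuation such as the ideographic comma (U+3001) qualifies without being listed by hand.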
PieterCK
f33c5b9c1e slack_regex: Fix wrong comments. 2025-04-28 20:27:28 -07:00
PieterCK
6543dadd0f slack_import: Refactor a convert_slack_formatting.
This prep commit extracts the logic for converting Slack formatted text
into Zulip friendly markdown from `convert_to_zulip_markdown` into a new
helper function, `convert_slack_formatting`.

This is done to make the reformatting logic reusable elsewhere, such as
in the Slack webhook or Slack incoming webhook.
2024-12-06 13:11:44 -08:00
PieterCK
4b472611d0 slack_import: Refactor a convert_slack_workspace_mentions.
This prep commit extracts the logic for reformatting Slack mentions into
Zulip mentions from `convert_to_zulip_markdown` into a new helper
function, `convert_slack_workspace_mentions`.

This is done to make the reformatting logic reusable elsewhere, such
as in the Slack webhook or Slack incoming webhook.
2024-12-06 13:11:44 -08:00
PieterCK
ff4d5daef4 slack_import: Reorder functions used in convert_to_zulip_markdown.
This prep commit moves `convert_to_zulip_markdown` and
`convert_markdown_syntax` so that they appear after the various
helper functions used in the main logic for converting Slack
messages to Zulip's formatting. This adjustment aligns with code
style conventions, improving readability.

The change also prepares the codebase for reusing Slack text
reformatting logic in the Slack incoming webhook.
2024-12-06 13:11:44 -08:00
PieterCK
92437b4ab5 integrations: Refactor slack_incoming webhook.
This commit refactors `render_attachment` and `render_block`
out of slack_incoming.py to promote reusability.

The primary motivation for this refactor is to add support for
converting integration bot messages in Slack exports, which could use
the same functions.

Part of #31311.
2024-11-07 10:47:08 -08:00
PieterCK
fc50736f4e slack_data_import: Fix incorrect hyperlink conversion.
Currently, Slack messages containing hyperlinks
(e.g., <http://foo.com|Foo!>) are converted like
normal links. This commit reformats Slack
hyperlinks into Zulip-friendly Markdown
(e.g., [Foo!](http://foo.com)).

Part of #32165.
2024-10-30 09:48:32 -07:00
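The hyperlink conversion in fc50736f4e amounts to swapping the two halves of Slack's `<url|label>` form. A minimal sketch, assuming http(s) URLs only; `SLACK_HYPERLINK` and `convert_hyperlinks` are hypothetical names, not the ones used in the import code.

```python
import re

# <URL|LABEL>: the URL runs up to the "|" and the label up to the closing ">".
SLACK_HYPERLINK = re.compile(r"<(https?://[^|>\s]+)\|([^>]+)>")


def convert_hyperlinks(text: str) -> str:
    """Rewrite Slack <url|label> hyperlinks as Markdown [label](url)."""
    return SLACK_HYPERLINK.sub(r"[\2](\1)", text)


converted = convert_hyperlinks("See <http://foo.com|Foo!> now")
untouched = convert_hyperlinks("See <http://foo.com> now")
```

Links without a label do not match the pattern and are left for the ordinary link handling.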
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
c2c96eb0cf python: Annotate type aliases with TypeAlias.
This is not strictly necessary but it’s clearer and improves mypy’s
error messages.

https://docs.python.org/3/library/typing.html#typing.TypeAlias
https://mypy.readthedocs.io/en/stable/kinds_of_types.html#type-aliases

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-07 10:02:49 -07:00
Anders Kaseorg
df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Anders Kaseorg
e5d671bf2b ruff: Fix SIM210 Use bool(…) instead of True if … else False.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Anders Kaseorg
544bbd5398 docs: Fix capitalization mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-10 09:57:26 -07:00
Anders Kaseorg
6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg
11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg
72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Anders Kaseorg
768f9f93cd docs: Capitalize Markdown consistently.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-11 10:23:06 -07:00
Anders Kaseorg
365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg
67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
Vishnu KS
9d51a1b527 slack: Rename added_users to slack_user_id_to_zulip_user_id. 2019-08-26 14:10:19 -07:00
Vishnu Ks
1e5c49ad82 slack: Support importing shared channels. 2019-08-26 14:10:19 -07:00
Anders Kaseorg
e417d3a040 slack_message_conversion: Clean up type ignores.
Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2019-08-09 16:39:16 -07:00
Pragati Agrawal
e1772b3b8f tools: Upgrade Pycodestyle and fix new linter errors.
Here, we are upgrading pycodestyle version from 2.4.0 to 2.5.0.

Fixes: #11396.
2019-01-31 12:21:41 -08:00
Rhea Parekh
b8e1e8b31d import: Add slack import files in zerver/data_import directory. 2018-08-01 11:52:14 -07:00