Slack attachment urls with white spaces,
e.g. `https://example.com/some file.png`,
were rejected by `check_url`. We want to call `requote_url` to deal with
any url-quoting jankiness that may be present in the exported data.
1. `call` is a block type we've observed in the wild in a Slack export,
despite not being documented in
https://docs.slack.dev/reference/block-kit/blocks/
2. We already have the logic for converting `condition` block type below
in the function, but it was erroneously missing from the list of valid
types.
Integration bot messages in Slack may include "blocks" and
"attachments," which are Slack's messaging features.
Currently, these messages aren't processed when converting
Slack export data.
This commit adds support for converting integration bot
messages, as well as other Slack messages containing "blocks"
and "attachments".
Message payload with the block type `rich_text` is skipped because all
messages sent by users have this format.
Fixes#31162.
[1]=https://docs.slack.dev/reference/block-kit/blocks/rich-text-block/
This commit updates the Slack incoming webhook endpoint to use the same
Slack reformatting functions (`convert_to_zulip_markdown` and
`replace_links`) that are used for Slack data import and cleaning up any
duplicative functions.
This was previously not possible because the Slack reformatting regex in
`slack_message_conversion.py` could not handle these cases:
- Formatting applied to non-ASCII characters (e.g., emoji).
- Formatted strings separated by exactly one character.
- Formatted strings appearing immediately after a new line.
Fixes part of #31162.
The Slack text regexes match specific characters before and after the
formatted string to ensure that they only match at word boundaries.
However, because the capture groups consume each matching character,
including the characters used to determine word boundaries, two
formatted strings separated by a single matching character result in one
string not being matched, as the trailing whitespace is already
consumed and cannot also match as the leading pre-match whitespace for
the next character.
Switch to a look-ahead regex for the trailing word boundary
characters. This is zero-width, and as such the next match can still
also consume the same characters.
This also fixes Slack webhook integrations'
`test_message_with_complex_formatted_mentions` which was previously
expecting false output.
Fixes part of #30827.
The inner capture group of Slack text regex is used to capture the
formatted text, so basically all characters but the formatting
characters like *, ~, and _.
It currently does this by specifying a range of characters to be
captured except the formatting characters. This unintentionally excluded
non-ASCII characters like emoji.
This commit simplifies the inner capture groups of the Slack text regex
to explicitly exclude formatting characters (e.g., *, ~, _) instead of
using an allowlist to not include them.
This change is part of the effort to make `convert_to_zulip_markdown`
compatible with output from `render_blocks` and `render_attachments`,
which may contain emoji.
The inner capture groups of the Slack text regex tries to captures the
formatted string. For an unclear reason, we're currently using two
similar capture groups despite the fact that only using the second
capture groups would suffice.
This removes the first capture groups.
This prep commit modifies the Slack text regex in
`slack_message_conversion.py` to use the `re.MULTILINE` flag capture
formatted strings that are adjacent to newline or end of line.
These kinds of strings are likely not uncommon to be generated by Slack
exporter itself and our Slack message block and attachment formatters
(`render_blocks` and `render_attachments`) also produces them.
This also fixes Slack webhook integration's
`test_message_with_complex_formatted_texts` which was previously
expecting false output.
Fixes part of #30827.
In `convert_markdown_syntax`, `re.finditer` is used to iterate over each
`match` found in the `text`. In each iteration, we create the Zulip
formatting equivalent of the current `match` and do a `text.replace()`
to replace all strings in `text` that matches the current `match`.
Since we're planning to add the `re.MULTILINE` flag to `re.finditer`, it
would be problematic if the first and last capture group of `match` are
zero-width characters (newlines). This is because
`convert_markdown_syntax` would reformat all identical strike-through
and bold formatted strings as many time as there are identical `match`es
of them.
Consider the following:
---
original text: "*foo*\n*foo*"
1. <re.Match object; span=(0, 5), match='*foo*'>
1st replace -> "**foo**\n**foo**"
2. <re.Match object; span=(6, 11), match='*foo*'>
2nd replace -> "***foo***\n***foo***"
---
This commit uses `re.sub` instead of `re.finditer`, which does replace
operation only on the specific piece of the original text the `match` is
from.
This updates the patterns to use a more unicode-aware pattern for their
first and last capture groups. The new patterns have the same
behaviours, but they're expressed in a more coherent way.
For example, the existing patterns lists what characters to look for,
skipping ceratin characters it don't want to match (e.g, closing
brackets & quote are skipped). The new pattern narrows down what it
looks for (whitespace, punctuation, symbols) and explicitly list what it
don't want to match (closing quote and bracket, etc).
This also refactors the `convert_markdown_syntax` to use the `regex`
module instead of the `re` module because the `regex` module has full
unicode support.
This prep commit extracts the logic for converting Slack formatted text
into Zulip friendly markdown from `convert_to_zulip_markdown` into a new
helper function, `convert_slack_formatting`.
This is done to make reformating logic be reusable else where such as in
the Slack webhook or Slack incoming webhook.
This prep commit extracts the logic for reformatting Slack mentions into
Zulip mentions from `convert_to_zulip_markdown` into a new helper
function, `convert_slack_workspace_mentions`.
This is done to make the reformatting logic be reusable else where such
as in the Slack webhook or Slack incoming webhook.
This prep commit moves `convert_to_zulip_markdown` and
`convert_markdown_syntax` so that they appear after the various
helper functions used in the main logic for converting Slack
messages to Zulip's formatting. This adjustment aligns with code
style conventions, improving readability.
The change also prepares the codebase for reusing Slack text
reformatting logic in the Slack incoming webhook.
This commit refactors `render_attachment` and `render_block`
out of slack_incoming.py to promote reusability.
The primary motivation for this refactor is to add support for
converting integration bots messages in Slack exports, which could use
the same functions.
Part of #31311.
Currently, Slack messages containing hyperlinks
(e.g.,<http://foo.com|Foo!>) are converted like
normal links. This commit reformats Slack
hyperlinks into Zulip-friendly markdown
(e.g., [Foo!](http://foo.com)).
Part of #32165.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.
(This does not actually upgrade our Python environment to Black 23
yet.)
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Fixes#2665.
Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.
Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start. I expect this change will increase pressure for us to split
those files, which isn't a bad thing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>