Commit Graph

362 Commits

Author SHA1 Message Date
Anders Kaseorg
19b8cde27f ruff: Fix PLC0206 Extracting value from dictionary without calling .items().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-21 21:06:53 -08:00
PieterCK
a746be807f slack_import: Make check_token_access more flexible.
Previously, the `check_token_access` function had a hardcoded
`required_parameters` variable because it was only used in the Slack
data importer. This commit refactors `required_parameters` into a
function parameter, enabling the function to check a Slack token’s scope
for other purposes, such as Slack webhook integration.

Additionally, this commit changes the Slack API call in
`check_token_access` from `teams.info` to `api.test`. The endpoint is
better suited for this purpose since we're only checking a token’s scope
using the response header here.
2024-12-18 16:11:31 -08:00
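The scope check described in a746be807f can be illustrated with a small sketch. It assumes the granted scopes arrive as a comma-separated string in a response header (Slack documents an `X-OAuth-Scopes` header for this); the function name `missing_scopes` and the parameter `required_scopes` are hypothetical and are not taken from the commit.

```python
def missing_scopes(scopes_header: str, required_scopes: set[str]) -> set[str]:
    """Return the required scopes that are absent from the header value.

    scopes_header is assumed to be a comma-separated list such as
    "users:read, emoji:read", as read from an API response header.
    """
    granted = {scope.strip() for scope in scopes_header.split(",") if scope.strip()}
    return required_scopes - granted
```

Passing the required scopes in as an argument, rather than hardcoding them, is what lets one check serve both the importer and a webhook integration.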
PieterCK
f988412394 slack_data_import: Support converting integration bot users.
Currently, we're unable to convert messages from Slack's integration
bots because this message subtype doesn't come from a Slack "user"; that
is, the bots don't have a Slack user profile.

This is a preparatory change to support converting Slack's integration
bot messages. This commit artificially creates Slack user data from the
integration bot's "profile" so that we can create a corresponding Zulip
user for them.

Part of #31311.
2024-12-16 13:09:57 -08:00
PieterCK
10946caa3d slack_data_import: Update how Slack user avatars are processed.
Previously, the Slack export converter could only process Slack avatar
URLs from Slack's "ca.slack-edge.com" server, which look like this:

https://ca.slack-edge.com/T0CDRA6HM3P-U06NABE26M9-1173e04f818e-512

This commit adds support for converting any public downloadable image
URLs.

This is done to support importing Slack's integration bots and their
messages, which typically have PNG file URLs like this:

https://avatars.slack-edge.com/2024-05-01/7057208497908_a4351f6deb91094eac4c_72.png
2024-12-16 13:09:57 -08:00
PieterCK
6543dadd0f slack_import: Refactor a convert_slack_formatting.
This prep commit extracts the logic for converting Slack formatted text
into Zulip friendly markdown from `convert_to_zulip_markdown` into a new
helper function, `convert_slack_formatting`.

This is done to make the reformatting logic reusable elsewhere, such as
in the Slack webhook or Slack incoming webhook.
2024-12-06 13:11:44 -08:00
PieterCK
4b472611d0 slack_import: Refactor a convert_slack_workspace_mentions.
This prep commit extracts the logic for reformatting Slack mentions into
Zulip mentions from `convert_to_zulip_markdown` into a new helper
function, `convert_slack_workspace_mentions`.

This is done to make the reformatting logic reusable elsewhere, such
as in the Slack webhook or Slack incoming webhook.
2024-12-06 13:11:44 -08:00
PieterCK
ff4d5daef4 slack_import: Reorder functions used in convert_to_zulip_markdown.
This prep commit moves `convert_to_zulip_markdown` and
`convert_markdown_syntax` so that they appear after the various
helper functions used in the main logic for converting Slack
messages to Zulip's formatting. This adjustment aligns with code
style conventions, improving readability.

The change also prepares the codebase for reusing Slack text
reformatting logic in the Slack incoming webhook.
2024-12-06 13:11:44 -08:00
PieterCK
0d7199b22e data_import: Add migration status file to converted exports.
This commit updates all third-party importer tools (Slack, Mattermost,
and Rocket Chat) in the `zerver/data_import` directory to also output a
migration_status.json file in their output tarball.

This is required because every importable tarball is checked for
migration compatibility during import.

Fixes #28443.
2024-11-08 15:52:45 -08:00
Mateusz Mandera
420849ff6a slack: Call the correct resize_* function when importing realm icon.
For resizing the icon.png files, we use resize_avatar, not resize_logo.
This is pretty confusing - sure, for icons we use the same function as
for avatars, but we should have a proper name for the function called in
the icon context. So this commit also adds resize_realm_icon, and
changes the calls to resize_avatar in icon contexts to
resize_realm_icon.
2024-11-08 15:43:18 -08:00
Harsh
e468818d2b import: Remove skipping of too-long messages during import.
This commit eliminates the skipping of messages longer than 10K characters during the import process.
2024-11-07 16:04:14 -08:00
PieterCK
92437b4ab5 integrations: Refactor slack_incoming webhook.
This commit refactors `render_attachment` and `render_block`
out of slack_incoming.py to promote reusability.

The primary motivation for this refactor is to add support for
converting integration bot messages in Slack exports, which could use
the same functions.

Part of #31311.
2024-11-07 10:47:08 -08:00
Harsh
858fdeee39 import: Add function to normalize messages to import.
This adds `normalize_body_for_import` to normalize messages from
third-party importers by removing NUL bytes and also updates import
test files data to test this.

Fixes #31930.
2024-11-04 12:22:30 -08:00
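A minimal sketch of the normalization described in 858fdeee39. The function name comes from the commit message; the body below is an assumption about the simplest way to strip NUL bytes and may differ from the real implementation.

```python
def normalize_body_for_import(body: str) -> str:
    # Remove NUL characters, which some databases (PostgreSQL among them)
    # reject in text fields. Everything else is left untouched.
    return body.replace("\x00", "")
```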
Anders Kaseorg
42e1517255 email_notifications: Prevent html2text from mangling Unicode.
html2text mangles Unicode by default, with a --unicode-snob option to
disable it.  If I have to get called a “snob” for wanting to correctly
support non-English languages, then uh, I’ll take one for the team.

https://github.com/Alir3z4/html2text/blob/2024.2.26/html2text/config.py#L111-L150

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-10-30 09:49:23 -07:00
PieterCK
fc50736f4e slack_data_import: Fix incorrect hyperlink conversion.
Currently, Slack messages containing hyperlinks
(e.g., <http://foo.com|Foo!>) are converted like
normal links. This commit reformats Slack
hyperlinks into Zulip-friendly markdown
(e.g., [Foo!](http://foo.com)).

Part of #32165.
2024-10-30 09:48:32 -07:00
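The hyperlink conversion in fc50736f4e can be sketched with a single regular expression. The pattern and the function name `convert_slack_hyperlinks` are illustrative assumptions; the importer's actual regex is likely to handle more cases.

```python
import re

# Hypothetical pattern for Slack's <url|label> hyperlink syntax.
SLACK_HYPERLINK = re.compile(r"<(?P<url>https?://[^|>\s]+)\|(?P<label>[^>]+)>")


def convert_slack_hyperlinks(text: str) -> str:
    """Rewrite <url|label> as Markdown [label](url); leave other text alone."""
    return SLACK_HYPERLINK.sub(
        lambda match: f"[{match.group('label')}]({match.group('url')})", text
    )
```

Bare links such as `<http://foo.com>` have no `|label` part, so this pattern leaves them unchanged.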
Tim Abbott
010410c849 rocketchat: Validate custom emoji before larger data sets.
This is a data set that's relatively likely to have weird failures,
and also likely to be fairly small.
2024-10-17 12:25:18 -07:00
Tim Abbott
6e4da50577 rocketchat: Complete metadata verification before importing uploads.
This is not the best factored version of this, but it saves effort
changing the tests, and importantly should make failures involving
metadata only take a couple seconds rather than first doing a giant
BSON read before learning about them.
2024-10-17 12:25:18 -07:00
Tim Abbott
79b6f43d0e rocketchat: Move bson_code_options to a global variable.
This will make it a lot easier to only read files in when we actually
need them.
2024-10-17 12:25:18 -07:00
PieterCK
6289a551aa data_import: Add email validation to third-party data converters.
This commit makes the third-party data converters check for invalid user
emails. If it finds any, it’ll raise an Exception and show an error
message with all the bad emails listed out.

Fixes: #31783
2024-10-15 16:04:43 -07:00
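The behavior described in 6289a551aa (collect every invalid email, then fail once with the full list) might look roughly like this. The regex is a deliberately simple stand-in for a real validator, and the function name, error text, and choice of `ValueError` are assumptions.

```python
import re

# Simplified check: one "@", no whitespace, a dot in the domain, no port.
SIMPLE_EMAIL = re.compile(r"^[^@\s]+@[^@\s:]+\.[^@\s:]+$")


def validate_user_emails(emails: list[str]) -> None:
    """Raise a single error naming every email that fails the check."""
    bad_emails = [email for email in emails if not SIMPLE_EMAIL.match(email)]
    if bad_emails:
        raise ValueError("Invalid user emails: " + ", ".join(bad_emails))
```

Reporting all bad addresses at once saves the operator from fixing them one failed run at a time.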
Alex Vandiver
2c51824b7d slack_import: Strip port from "domain_name".
This lets slack conversions be done on development hosts, which have a
trailing :9991 on their EXTERNAL_HOST; otherwise, we generate fake
emails like `imported-slack-bot@host.name:9991` which fail to
validate.
2024-09-26 12:01:11 -07:00
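One way to strip a trailing port, as 2c51824b7d describes, is sketched below. The helper name `strip_port` is hypothetical, and the sketch does not handle bare (unbracketed) IPv6 addresses.

```python
def strip_port(external_host: str) -> str:
    """Drop a trailing ":<digits>" port, e.g. "host.name:9991" -> "host.name"."""
    host, separator, port = external_host.rpartition(":")
    if separator and port.isdigit():
        return host
    # No port present: return the input unchanged.
    return external_host
```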
Alex Vandiver
e68096c907 slack: Protect against zip bombs.
A file which unpacks to more than 10x its original size is suspect,
particularly if that results in an uncompressed size > 1GB.
2024-09-26 12:01:11 -07:00
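The heuristic in e68096c907 can be approximated with the standard `zipfile` module. The thresholds come from the commit message; requiring both conditions together, and the function names, are assumptions made for this sketch.

```python
import zipfile

MAX_RATIO = 10                     # uncompressed / compressed size
MAX_UNCOMPRESSED_BYTES = 1024**3   # 1 GB


def is_suspicious(compressed: int, uncompressed: int) -> bool:
    """Flag archives that expand more than 10x and exceed 1 GB unpacked."""
    return (
        uncompressed > MAX_UNCOMPRESSED_BYTES
        and uncompressed > MAX_RATIO * compressed
    )


def looks_like_zip_bomb(archive) -> bool:
    """Check an archive (path or file object) using only its metadata."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
    compressed = sum(info.compress_size for info in infos)
    uncompressed = sum(info.file_size for info in infos)
    return is_suspicious(compressed, uncompressed)
```

Because the sizes are read from the archive's central directory, the check runs before anything is extracted to disk.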
Alex Vandiver
6f7c14c9ec slack: Check that the archive is shaped the way we expect.
This is some minor protection against malicious zipfiles (e.g. many
very deep directories to chew up inodes), in addition to validation.
2024-09-26 12:01:11 -07:00
Alex Vandiver
d9f868a163 slack: Clean up expanded zipfiles more consistently. 2024-09-26 12:01:11 -07:00
Alex Vandiver
c8c480baef mattermost: Handle duplicate dm-groups.
Observed in the wild, cause unknown.

Partially fixes: #24131.

Co-authored-by: Mateusz Mandera <mateusz.mandera@zulip.com>
2024-09-09 12:05:28 -07:00
roanster007
c6a06d4684 direct_message_group: Add new group_size field.
This commit adds a new `group_size` field to the `DirectMessageGroup`
model, and backfills its value to each of the existing direct message
groups.

Fixes part of #25713
2024-08-23 11:09:41 -07:00
Mateusz Mandera
9cad9644e7 import_util: Fix file extensions of emoji files downloaded from Slack.
When returning emoji records, the Slack API returns the record for
its thumbsup_all emoji with a URL ending in .png, even though the
file is a GIF.

For that reason, the code has to correct file extensions based on
the response content-type. Emojis are the smallest set of images to
download, so for simplicity of implementation, we remove the
parallelization of the downloads in favor of just processing them
serially.
2024-08-21 16:30:19 -07:00
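A sketch of correcting a file extension from the response content-type, as 9cad9644e7 describes. The mapping table and the function name `correct_extension` are assumptions for illustration; the real code may cover more image types.

```python
import os

# Hypothetical mapping from content-type to the extension we want on disk.
EXTENSION_FOR_CONTENT_TYPE = {
    "image/png": ".png",
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
}


def correct_extension(file_name: str, content_type: str) -> str:
    """Replace the extension of file_name based on the served content-type."""
    # Ignore parameters such as "; charset=binary".
    mime_type = content_type.split(";", 1)[0].strip().lower()
    extension = EXTENSION_FOR_CONTENT_TYPE.get(mime_type)
    if extension is None:
        # Unknown type: keep whatever name the API reported.
        return file_name
    root, _ = os.path.splitext(file_name)
    return root + extension
```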
Prakhar Pratyush
19d56f77b5 rocketchat: Fix "OSError: [Errno 36] File name too long" error.
Previously, we replaced an overly long attachment name with a random
UUID only when the file name was longer than 255 characters.

This resulted in an "OSError: [Errno 36] File name too long" error in
the few cases where the file name has fewer than 255 characters but
more than 255 bytes (a file name with non-ASCII characters).

This commit updates the code to check the file name's byte size
instead of its character count.
2024-08-14 18:18:31 -07:00
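The difference between character count and byte size in 19d56f77b5 is easy to show. In this sketch the 255-byte limit matches the commit message; the helper name and the choice to return a bare UUID hex string are assumptions.

```python
import uuid

MAX_FILENAME_BYTES = 255  # common filesystem limit, counted in bytes


def safe_attachment_name(name: str) -> str:
    """Return name unchanged unless its UTF-8 encoding exceeds the limit."""
    if len(name.encode("utf-8")) > MAX_FILENAME_BYTES:
        # Too long on disk even if the character count looks fine.
        return uuid.uuid4().hex
    return name
```

For example, a name made of 200 copies of "é" has only 200 characters but 400 bytes in UTF-8, so a character-based check would let it through.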
roanster007
7b3e163d55 refactor: Rename huddle to direct_message_group in non api files.
This commit completes the rename of "huddle" to "direct_message_group"
in all the non-API files.

Part of #28640
2024-07-31 23:25:56 -07:00
Anders Kaseorg
722842a0aa rocketchat: Remove unnecessary SHA-1 hashing of direct message groups.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg
541699a1c6 mattermost: Remove unnecessary MD5 hashing of direct message groups.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg
90cc88c88e mattermost: Split get_recipient_id_from_receiver_name to 3 functions.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg
27b0618704 data_import: Fix IdMapper typing.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg
1fd3f983a5 data_import: Remove int detection from IdMapper.
This seems to have been used only for HipChat.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-17 15:56:00 -07:00
Anders Kaseorg
e3a191b99b ruff: Fix FURB154 Use of repeated consecutive global, nonlocal.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-14 13:53:18 -07:00
Anders Kaseorg
6412c2d630 ruff: Fix FURB142 Use of set.add() in a for loop.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-14 13:52:59 -07:00
Anders Kaseorg
b96feb34f6 ruff: Fix SIM117 Use a single with statement with multiple contexts.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-14 13:48:32 -07:00
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver
0442e95276 emoji: Use a non-predictable filename.
We use a truncated SHA256 of the id and a server-side secret to make
emoji have non-guessable filenames, while also making collisions
unlikely.

We also adjust the Slack import to use the same SHA-based name,
instead of taking the same name as it had in Slack.
2024-07-12 13:26:47 -07:00
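A rough sketch of the naming scheme in 0442e95276: hash the emoji id together with a server-side secret, then truncate. The input layout, the truncation length, and the function name are all assumptions here; only "truncated SHA256 of the id and a server-side secret" comes from the commit message.

```python
import hashlib


def emoji_file_name(emoji_id: int, secret: str, extension: str, length: int = 8) -> str:
    """Build a non-guessable file name from an id and a server-side secret."""
    digest = hashlib.sha256(f"{secret}:{emoji_id}".encode()).hexdigest()
    # Truncating keeps names short; a longer prefix lowers collision odds.
    return f"{digest[:length]}{extension}"
```

Without the secret, an outsider who knows the sequential ids could compute the names; mixing it in is what makes them non-predictable.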
Alex Vandiver
5ae34dc42b slack: Store the content-type of realm icons. 2024-07-11 07:31:39 -07:00
Alex Vandiver
b050c644e3 data_import: Mark downloaded png avatars as image/png.
This means that, when imported, they have the right content-type set
in S3.
2024-07-11 07:31:39 -07:00
Alex Vandiver
51766271a1 data_import: Remove a reference to removed code.
The only use of the code was removed in c685d36821, and the code
itself was removed in 4206e5f00b.
2024-07-11 07:31:39 -07:00
roanster007
02d0566dc5 refactor: Rename Huddle Django model class to DirectMessageGroup.
This commit renames the "Huddle" Django model class to
"DirectMessageGroup", while maintaining the same table --
"zerver_huddle".

Fixes part of #28640.
2024-07-07 21:31:30 -07:00
Alex Vandiver
e29a455b2d avatars: Encode version into the filename.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable.  Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.

This requires plumbing the current (or future) avatar version through
various parts of the upload process.

Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).

We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.

Fixes: #12852.
2024-07-07 14:40:07 -07:00
roanster007
52692a6448 refactor: Rename huddle to direct_message_group in non API.
This commit performs a sweep on the first batch of non-API
files to rename "huddle" to "direct_message_group".

It also renames variables and methods of the form
"huddle_message" to "group_direct_message".

This is a part of #28640
2024-07-04 07:56:31 -07:00
Alex Vandiver
17fb23746f upload: Move methods into zerver.lib.upload from .base. 2024-06-26 16:43:11 -07:00
Alex Vandiver
0153d6dbcd thumbnailing: Move resizing functions into zerver.lib.thumbnail. 2024-06-20 23:06:08 -04:00
Anders Kaseorg
b545abe1e2 typos: Fix typos caught by mwic.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-05-20 13:55:00 -07:00
Lauryn Menard
b714bd9eec help: Rename and redirect set-default-streams-for-new-users for channel. 2024-05-03 13:02:20 -07:00
Lauryn Menard
c931966e1b help: Rename and redirect stream-sending-policy for channel. 2024-05-03 13:02:20 -07:00