135 Commits

Author SHA1 Message Date
Anders Kaseorg
41a6511924 typos: Fix typos caught by typos.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit 4cb2eded68)
2023-11-15 10:13:24 -08:00
Anders Kaseorg
8fdcadb08a ruff: Fix UP032 Use f-string instead of format call.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit 50e6cba1af)
2023-08-10 17:01:52 -05:00
Alex Vandiver
da4c4f74f2 slack: Handle the special case of permissions denied on team.info call.
This is a follow-up to 4c8915c8e4, for
the case when the `team:read` permission is missing, which causes the
`team.info` call itself to fail.  The error message supplies
information about the provided and missing permissions -- but it also
still sends the `X-OAuth-Scopes` header which we normall read, so we can
use that as normal.

(cherry picked from commit 21aeb4a040)
2023-07-03 18:51:15 +00:00
Alex Vandiver
8878fee6d6 slack: Provide more information when a Slack token fails to validate.
(cherry picked from commit 4c8915c8e4)
2023-07-03 18:48:08 +00:00
rht
1c84f02f57 slack import: Convert threads to nicely named Zulip topics.
Fixes #9006.
2023-05-30 16:35:19 -07:00
Anders Kaseorg
9797de52a0 ruff: Fix RUF010 Use conversion in f-string.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-05-26 22:09:18 -07:00
Alex Vandiver
567d1d54e7 upload: Rename upload_message_file to use word "attachment".
For consistency with the table, which is named Attachment.
2023-03-02 16:36:19 -08:00
Alex Vandiver
fe654b76b7 data_import: Stop tar'ing up converted data.
`./manage.py import` does not take a tarball; it takes a directory.
Making a separate tarball is a waste of CPU time and disk, as it is
never used.

This was included in the commit of the initial Slack conversion code
in 5b37c5562b and propagated from there into every conversion tool.

Remove the unnecessary tarball creation.
2023-02-26 17:42:01 -08:00
Alex Vandiver
92c8c17190 import: Add the UTF-8 flag on file entries in zipfiles from Slack.
Fixes: #22533.
2023-01-31 16:07:48 -08:00
Anders Kaseorg
e5d671bf2b ruff: Fix SIM210 Use bool(…) instead of True if … else False.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-23 11:18:36 -08:00
Anders Kaseorg
8f7a7877fe python: Clean up janky URL matching code with urlsplit.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-18 17:25:46 -05:00
Alex Vandiver
7c0d414aff uploads: Split out S3 and local file backends into separate files.
The uploads file is large, and conceptually the S3 and local-file
backends are separable.
2023-01-09 18:23:58 -05:00
Anders Kaseorg
e1ed44907b ruff: Fix SIM118 Use key in dict instead of key in dict.keys().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Anders Kaseorg
872f4b41c1 ci: Check that non-scripts aren’t marked executable.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-12-07 09:54:01 -08:00
M@
280523ad48 slack import: Merge and dedupe same-base emoji reactions and userlists.
The naive solution #23465 creates situations where the same user can have
multiple reactions as the base emojis are not unique, e.g. +1::skin2
and +1::skin4 would both reduce to +1 but the userlists are separate.
This solution handles the reduction, merges the same-base reactions,
and deduplicates the userlist.

Co-authored-by: Alex Vandiver <alexmv@zulip.com>
Co-authored-by: rht <rhtbot@protonmail.com>
2022-11-16 11:11:43 -08:00
rht
5ce2103b87 Slack import: Cache emoji.json in static/generated/emoji.
Previously, emoji.json was read from
"$ZULIP_PATH/node_modules/emoji-datasource-google/emoji.json".
This path doesn't exist in production when installing from scratch from
a release tarball. And so, we ensure emoji.json exists by copying it to
`static/generated/emoji`.

With tweaks to comments by tabbott.

Fixes: #23469
2022-11-15 10:43:11 -08:00
Matt Keller
958b58f174 slack: Skip reactions for deleted users.
Fixes #23552.
2022-11-14 13:08:15 -08:00
Matt Keller
8aa7ff4bbb slack: Parse emoji skin tone variants.
Fixes part of #23276.
2022-11-07 14:25:49 -08:00
Matt Keller
4d87bf291c slack: Skip files where file_access: file_not_found. 2022-10-25 12:18:20 -07:00
Matt Keller
c5f106ce1b slack: Skip files where file_access: access_denied.
These stubs are incomplete and should be treated akin to tombstones.
2022-10-11 10:53:16 -07:00
Mateusz Mandera
00b3546c9f models: Add denormalized .realm column to Message.
This commit adds the OPTIONAL .realm attribute to Message
(and ArchivedMessage), with the server changes for making new Messages
have this set. Old Messages still have to be migrated to backfill this,
before it can be non-nullable.

Appropriate test changes to correctly set .realm for Messages the tests
manually create are included here as well.
2022-10-07 10:09:38 -07:00
Mateusz Mandera
2811a1228f import_util: Make build_message only take kwargs.
build_message has a lot of arguments, so it's hard to verify correctness
of callers that just try to get the order right. It's much clearer to be
explicit via kwargs. mattermost.py and rocketchat.py already do this, so
let's bring slack.py and gitter.py up to par.
2022-09-27 15:04:48 -07:00
Matt Keller
fd996c286e slack: Filter out non-.json files for processing. 2022-09-23 09:59:34 -07:00
rht
a7cff0f091 Slack import: Translate to emoji name to codepoint using iamcal data.
Because Slack emoji naming is different from Zulip's.
According to https://emojipedia.org/slack/, Slack's emoji shortcodes are
derived from https://github.com/iamcal/emoji-data.
There are probably some deviations from that dataset, but this PR should
at least catch the ones that are identical to iamcal's.
2022-09-17 12:04:07 -07:00
Mateusz Mandera
eed8800573 long_term_idle_helper: Change all_user_ids arg to an Iterator. 2022-08-29 11:03:27 -07:00
Mateusz Mandera
75f26bb8ff long_term_idle_helper: Take list of user_ids as arg instead of dicts.
Only ["id"] is accessed on the dicts (representing the external tool
users). Given that for some tools the id may be under a different name
etc. due to different user dicts format, it's best to just pass those
ids to the function so that it can stay generalized and not reliant
on a specific user dict format.
2022-08-29 11:03:27 -07:00
Mateusz Mandera
c4c270380a slack: Use get_timestamp_from_message helper function where relevant.
get_timestamp_from_message was extracted in the previous commit. We can
deduplicate and the code a bit cleaner by using it where appropriate
instead of message["ts"].
2022-08-29 11:03:27 -07:00
Mateusz Mandera
9e56e71afe long_term_idle_helper: Take timestamp_from_message callable arg.
message["ts"] is slack-specific. For this to be a general util function
it needs to take a callable that will grab a timestamp from the message
dict (which has varying formats depending on what we're importing from).
2022-08-29 11:03:27 -07:00
Alex Vandiver
1b1faa3907 import_util: Factor out long_term_idle_helper. 2022-08-29 11:03:27 -07:00
Anders Kaseorg
b945aa3443 python: Use a real parser for email addresses.
Now that we can assume Python 3.6+, we can use the
email.headerregistry module to replace hacky manual email address
parsing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-29 15:47:33 -07:00
Alex Vandiver
2e50ead9d1 data_import: Fix bot email address de-duplication.
4815f6e28b tried to de-duplicate bot
email addresses, but instead caused duplicates to crash:

```
Traceback (most recent call last):
  File "./manage.py", line 157, in <module>
    execute_from_command_line(sys.argv)
  File "./manage.py", line 122, in execute_from_command_line
    utility.execute()
  File "/srv/zulip-venv-cache/56ac6adf406011a100282dd526d03537be84d23e/zulip-py3-venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/srv/zulip-venv-cache/56ac6adf406011a100282dd526d03537be84d23e/zulip-py3-venv/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/srv/zulip-venv-cache/56ac6adf406011a100282dd526d03537be84d23e/zulip-py3-venv/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/home/zulip/deployments/2022-03-16-22-25-42/zerver/management/commands/convert_slack_data.py", line 59, in handle
    do_convert_data(path, output_dir, token, threads=num_threads)
  File "/home/zulip/deployments/2022-03-16-22-25-42/zerver/data_import/slack.py", line 1320, in do_convert_data
    ) = slack_workspace_to_realm(
  File "/home/zulip/deployments/2022-03-16-22-25-42/zerver/data_import/slack.py", line 141, in slack_workspace_to_realm
    ) = users_to_zerver_userprofile(slack_data_dir, user_list, realm_id, int(NOW), domain_name)
  File "/home/zulip/deployments/2022-03-16-22-25-42/zerver/data_import/slack.py", line 248, in users_to_zerver_userprofile
    email = get_user_email(user, domain_name)
  File "/home/zulip/deployments/2022-03-16-22-25-42/zerver/data_import/slack.py", line 406, in get_user_email
    return SlackBotEmail.get_email(user["profile"], domain_name)
  File "/home/zulip/deployments/2022-03-16-22-25-42/zerver/data_import/slack.py", line 85, in get_email
    email_prefix += cls.duplicate_email_count[email]
TypeError: can only concatenate str (not "int") to str
```

Fix the stringification, make it case-insensitive, append with a dash
for readability, and add tests for all of the above.
2022-03-31 11:10:18 -07:00
Anders Kaseorg
b0ce4f1bce docs: Fix many spelling mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-07 18:51:06 -08:00
rht
58b19761b8 slack import: Fix requests.get usage of get_slack_api_data.
We also rewrite the tests using the `responses` module to avoid the
problematic mocking that made this bug possible.

Fixes #19833.
2021-10-07 11:46:23 -07:00
rht
d8e1409fe5 Slack import: Use Python ZipFile to unzip.
This should handle the case when non-ASCII Unicode folder names are
created on Windows.

Fixes #19899.
2021-10-07 09:24:19 -07:00
Priyansh Garg
4815f6e28b data_import: Make slack bot emails unique.
Slack bot emails generated by us can be duplicate for two bots.
If such a case occur, append a counter to the email to make it
unique.

For maintaining the counter of duplicate emails and the final
email assigned to each bot, a class based approach is used with
static variables and static (class) methods. This keeps all the
data related to slack bot emails at the same place and easily
accessible from anywhere inside the module (without defining any
class object and passing it around).

Fixes: #16793
2021-08-03 16:18:14 -07:00
Anders Kaseorg
5483ebae37 python: Convert "".format to Python 3.6 f-strings.
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
Anders Kaseorg
3665deb93a python: Remove unnecessary intermediate lists.
Generated automatically by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
rht
1bbd36d181 slack_import: Remove obsolete SlackImportAttachment placeholder.
This was introduced in f4ad464d82, and
incompletely removed in e037c2f93e649c28a71c02559b5ae7a3333f42a8; here
we finish removing it.
2021-08-02 13:13:28 -07:00
Alex Vandiver
ff9126ac1e data_import: Protect better against bad Slack tokens.
An invalid token would be treated the same as a token with no scopes;
differentiate these better.
2021-05-27 22:46:58 -07:00
Alex Vandiver
94e4f33b29 data_import: Support importing from Slack conversions in a directory.
Sometimes the Slack import zip file we get isn't quite the canonical
form that Slack produces -- often because the user has unzip'd it,
looked at it, and re-zip'd it, resulting in extra nested directories
and the like.

For such cases, support passing in a path to an unpacked Slack export
tree.
2021-05-27 22:46:58 -07:00
Alex Vandiver
8228ea2a17 import_data: Do some quick verification of Slack import formats. 2021-05-27 22:46:58 -07:00
Anders Kaseorg
544bbd5398 docs: Fix capitalization mistakes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-10 09:57:26 -07:00
Cyril Pletinckx
ba7da6d5c0 import/export: Fix deprecated authentication method for Slack.
The query string parameter authentication method is now deprecated for
newly created Slack applications since the 24th of February[1].  This
causes Slack imports to fail, claiming that the token has none of the
required scopes.

Two methods can be used to solve this problem: either include the
authentication token in the header of an HTTP GET request, or include
it in the body of an HTTP POST request. The former is preferred, as
the code was already written to use HTTP GET requests.

Change the way the parameters are passed to the "requests.get" method
calls, to pass the token via the `Authorization` header.

[1] https://api.slack.com/changelog/2020-11-no-more-tokens-in-querystrings-for-newly-created-apps

Fixes: #17408.
2021-03-08 12:56:37 -08:00
Anders Kaseorg
6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg
11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Alex Vandiver
7c849fa940 slack: Check token access scopes before importing.
The Slack API always (even for failed requests) puts the access scopes
of the token passed in, into "X-OAuth-Scopes"[1], which can be used to
determine if any are missing -- and if so, which.

[1] https://api.slack.com/legacy/oauth-scopes#working-with-scopes
2020-12-15 11:33:15 -08:00
Tim Abbott
067cd3a97a docs: Remove incorrect references to chat.zulip.org.
Most of these are Help Center links that should be pointing to the
production Help Center.
2020-10-29 16:46:40 -07:00
Anders Kaseorg
4e9d587535 python: Pass query parameters as a dict when making GET requests.
This provides automatic URL-encoding.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-27 13:47:02 -07:00
Anders Kaseorg
72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Anders Kaseorg
b7b7475672 python: Use standard secrets module to generate random tokens.
There are three functional side effects:

• Correct an insignificant but mathematically offensive bias toward
repeated characters in generate_api_key introduced in commit
47b4283c4b4c70ecde4d3c8de871c90ee2506d87; its entropy is increased
from 190.52864 bits to 190.53428 bits.

• Use the base32 alphabet in confirmation.models.generate_key; its
entropy is reduced from 124.07820 bits to the documented 120 bits, but
now it uses 1 syscall instead of 24.

• Use the base32 alphabet in get_bigbluebutton_url; its entropy is
reduced from 51.69925 bits to 50 bits, but now it uses 1 syscall
instead of 10.

(The base32 alphabet is A-Z 2-7.  We could probably replace all of
these with plain secrets.token_urlsafe, since I expect most callers
can handle the full urlsafe_b64 alphabet A-Z a-z 0-9 - _ without
problems.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-09 15:52:57 -07:00