Commit Graph

297 Commits

Author SHA1 Message Date
Steve Howell
f810833df5 export: Improve export_usermessages_batch.
We no longer jankily read our input file into
an "output" variable.  Instead, we do things
in a type-safe way.
2021-12-09 08:36:40 -08:00
Steve Howell
5c1e8cb8dc mypy: Add MessagePartial TypedDict. 2021-12-09 08:36:40 -08:00
Steve Howell
09c57a3f9f export: Log more consistently and sort ids.
Now all file writes go through our three
helper functions, and we consistently
write a single log message after the file
gets written.

I killed off write_message_exports, since
all but one of its callers can call
write_table_data, which automatically
sorts data.  In particular, our Message
and UserMessage data will now be sorted
by ids.
2021-12-09 08:36:40 -08:00
Steve Howell
6ec49951c6 minor: Avoid creating intermediate list for message_ids.
This probably just postpones the list creation until
Django builds the "IN" query, but semantically it's
good to work in sets where we don't have any
meaningful ordering of the list that gets used.
2021-12-08 16:12:54 -08:00
Steve Howell
f8ed099d3c export: Sort table data for most tables.
This affects most of our tables, but it excludes
table(s) like Message that go through kind of
unique codepaths.
2021-12-08 16:12:54 -08:00
Steve Howell
a1d3f12e53 refactor: Extract write_table_data().
The immediate benefit of this is stronger mypy
checks (avoiding the ugly union caused by message
files).

The subsequent commit will add sorting.

We have test coverage on all these lines insofar
as if you comment out the lines, tests will
explode (i.e. more than superficial line
coverage).
2021-12-08 16:12:54 -08:00
Steve Howell
c76ca2d0df export: Sort records.json files by path. 2021-12-08 16:12:54 -08:00
Steve Howell
2ef38e3d48 refactor: Extract write_records_json_file. 2021-12-08 16:12:54 -08:00
Steve Howell
b79cfc19ab user export: Broaden query for RealmAuditLog.
We now check acting_user as well as modified_user
to see if a row pertains to our exported user.
2021-12-08 16:01:38 -08:00
Steve Howell
927b04368e minor: Use virtual_parent for custom fetchers.
The distinction here wasn't super meaningful
due to the way we order our "elif" statements,
but we want to reserver "normal_parent" for the
majority of use cases, where you simply tell
the Config what the "foreign_key" is.
2021-12-08 15:58:07 -08:00
Steve Howell
50120a9387 export: Remove config parameter for custom fetchers. 2021-12-08 15:58:07 -08:00
Steve Howell
54a3a423e5 mypy: Fix CustomFetch=Any hack. 2021-12-08 15:58:07 -08:00
Steve Howell
4128b52ac5 export: Rename custom fetchers. 2021-12-08 15:58:07 -08:00
Steve Howell
a2c4931316 exports: Use realm for RealmAuditLog in realm exports.
For realm-wide exports, there is no reason to query
inefficiently against a list of modified users.

We move the Config out of the common child configs.
2021-12-08 15:58:07 -08:00
Steve Howell
8dd3c1038f exports: Rename parent_key to include_rows.
Even though Django usually treats foo__in
and foo_id__in identically for filters where
foo is a ForeignKey type, we want to insist
on somewhat more consistent syntax, because
we have the odd combo of type and type_id
in Recipient, where type_id is kinda like a
foreign key, but not a ForeignKey.

So we assert for now that all our include_rows
values end in "_id__in".
2021-12-08 15:58:07 -08:00
Steve Howell
02207f47d5 minor: Move code blocks to be alphabetical. 2021-12-08 15:58:07 -08:00
Steve Howell
aae9f1b6f5 export: Make Config errors more clear. 2021-12-08 15:58:07 -08:00
Steve Howell
6d09eab285 export: Export file images for single users.
We don't have automated test coverage on this yet,
but below are the results from manual testing.

Note that we include the realm icon and logo even
though they were not created by Cordelia.

    ./manage.py export_single_user cordelia@zulip.com

    $ (cd /tmp/zulip-export-4v3mo802/ && find .)
    .
    ./emoji
    ./emoji/2
    ./emoji/2/emoji
    ./emoji/2/emoji/images
    ./emoji/2/emoji/images/3.jpg
    ./emoji/records.json
    ./messages-000001.json
    ./realm_icons
    ./realm_icons/2
    ./realm_icons/2/night_logo.original
    ./realm_icons/2/night_logo.png
    ./realm_icons/2/icon.png
    ./realm_icons/2/icon.original
    ./realm_icons/records.json
    ./avatars
    ./avatars/2
    ./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.original
    ./avatars/2/c5125af0447f4d66ce34c1b32eac75ac27ebe0e7.png
    ./avatars/records.json
    ./uploads
    ./uploads/2
    ./uploads/2/68
    ./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt
    ./uploads/2/68/xyEkC5dTIp8m42_6HJ3kBfdt/denver.jpg
    ./uploads/2/96
    ./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim
    ./uploads/2/96/ol5WE6RTUntvuPDSpJUrYTim/denver.jpg
    ./uploads/records.json
    ./user.json
2021-12-07 11:16:52 -08:00
Steve Howell
b8d9143318 export: Validate emoji paths.
(We lift the RealmEmoji query to be used by
both local and S3 storage helpers.)
2021-12-07 11:16:52 -08:00
Steve Howell
ef6d9b10d2 refactor: Extract get_emoji_path. 2021-12-07 11:16:52 -08:00
Steve Howell
5a41904201 export: Add handle_system_bots flag.
We will set this to False for single-user exports.
2021-12-07 11:16:52 -08:00
Steve Howell
0e19deb558 exports: Limit s3 upload exports with path_id checks. 2021-12-07 11:16:52 -08:00
Steve Howell
f6cbf931ae refactor: Pass attachments to export_uploads_from_local.
The next commit will use attachments in the s3 path.
2021-12-07 11:16:52 -08:00
Steve Howell
03f40a64d4 refactor: Pass valid_hashes to export_files_from_s3. 2021-12-07 11:16:52 -08:00
Steve Howell
15bc677f35 export: Pass users to export_avatars_from_local. 2021-12-07 11:16:52 -08:00
Steve Howell
42ecabe967 export: Add check_metadata flag. 2021-12-06 15:09:37 -08:00
Steve Howell
0166f13d83 s3 exports: Validate user metadata for all assets.
This preps us to download assets for just a single
user.
2021-12-06 15:09:37 -08:00
Steve Howell
946ab22bba refactor: Lift users query to caller.
This preps us to reuse this code for single users
(after a few more subsequent changes).
2021-12-06 15:09:37 -08:00
Steve Howell
b0e5c1d3b9 export: Remove paranoid assertion.
There are tactical reasons to remove this assertion.

Basically, the reason it's safe to remove is that it's
been around a long time and we would have seen this
operationally. Also, the check to make sure that the
S3 filename thingy matches the avatar hash is a much
stronger check.

We will soon restore a stronger version of this check
that applies to all of our asset types (emojis/avatars/etc.).
2021-12-06 15:09:37 -08:00
Steve Howell
59951ae52b refactor: Move metadata checks for s3 export.
This technically broadens the check for user_profile_id,
but we write that metadata on every record.
2021-12-06 15:09:37 -08:00
Steve Howell
a27a6a4548 refactor: Inline _check_key_metadata.
This was only called in one place.
2021-12-06 15:09:37 -08:00
Steve Howell
db39948be5 refactor: Pass in flavor to export_files_from_s3. 2021-12-06 15:09:37 -08:00
Steve Howell
f4354c896b refactor: Pass object_key into export_files_from_s3.
This makes it easier to read the calling code and see
the big picture of how the four asset types are
organized.

I also handle uploads first, to be similar to the local
code.

This code is well tested--you can modify any of the callers
to pass in a wrong value of `object_key` and get a failing
test.
2021-12-06 15:09:37 -08:00
Steve Howell
4088be6017 import/export: Add UserStatus table.
(We support both realm and single-user exports.)
2021-12-06 13:27:25 -08:00
Steve Howell
45addcd506 cosmetic: Sort DATE_FIELDS in export.py source. 2021-12-06 13:27:25 -08:00
Steve Howell
f83907d3bb export: Add MutedUser table.
Note that the import was already implemented, but
its test was flawed.
2021-12-06 13:27:25 -08:00
Steve Howell
946e8064a3 user exports: Add several tables to the tarball.
We add the following tables to the user export:

    AlertWord
    CustomProfileFieldValue
    RealmAuditLog
    Service
    UserActivity
    UserActivityInterval
    UserCount
    UserGroup
    UserHotspot
    UserPresence
    UserTopic

Except for UserCount, we achieve this by sharing
code with the realm export via
add_user_profile_child_configs.

UserCount is handled slightly differently than realm
exports due to which key we trigger off.

It's possible that RealmAuditLog is incomplete for
single users, since we may also want rows where they
are the acting_user. This commit finds rows where
they are the modified_user. For non-admins I believe
it's rarely the case that they are the actor, and
they will tend to be the modified user if the two
fields are different at all.  For admins it's
arguable we want to see both changes they enacted
as well as changes that affected them.
2021-11-25 08:36:43 -08:00
Shlok Patel
893c9bc896 export: Remove --delete-after-upload flag in realm export.
For export realm following changes have been made:
- `./manage.py export --upload` would delete `.tar.gz` and unpacked dir
- `./manage.py export` would only delete `unpacked dir`

Besides, we have removed `--delete-after-upload` as we have set it as
the default.

Fixes #20081
2021-11-03 11:14:02 -07:00
Mateusz Mandera
73a6f2a1a7 auth: Add support for using SCIM for account management. 2021-10-14 12:29:10 -07:00
Anders Kaseorg
1e5157b66c user_groups: Add a recursive group membership model.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-10-13 10:08:06 -07:00
Abhijeet Prasad Bodas
4455dac424 models: Use default db table name for UserTopic.
Also update the realm import/export code and tests accordingly.
2021-09-17 12:14:28 -07:00
Abhijeet Prasad Bodas
2aea944a7e models: Rename UserTopic.date_muted to last_updated.
This is a follow-up to #19388.

We will in the future allow patch requests to change the visibility
of an existing topic, so `last_updated` is better name for this field.

This commit does not affect the API or events in any way, but only the
database.
2021-09-17 12:14:28 -07:00
sahil839
7d64a9053b models: Ensure every realm has a RealmUserDefault object.
Because we create all realms with do_create_user (including in the
test suite), we just need to change that function, add a migration for
existing realms, and ensure the data import code path correctly
creates these objects.

Note that the import code path will create a RealmUserDefault row with
default values if it is not present in the import data, which is
important for importing data from other tools like Slack.
2021-09-09 10:28:44 -07:00
Anders Kaseorg
4206e5f00b python: Remove locally dead code.
These changes are all independent of each other; I just didn’t feel
like making dozens of commits for them.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-19 01:51:37 -07:00
Anders Kaseorg
1bdb7b1141 mypy: Add boto3-stubs.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-09 20:32:19 -07:00
Anders Kaseorg
bfdb2f4628 export: Fix error message generation in _check_key_metadata.
There is no key.name.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-09 20:32:19 -07:00
Abhijeet Prasad Bodas
683c8507e4 models: Remove MutedTopic alias for UserTopic.
Part of #19272
2021-07-28 10:25:54 -07:00
Mateusz Mandera
43329b6a34 bots: Pass realm to get_system_bot calls in export/import. 2021-07-26 15:33:13 -07:00
Tim Abbott
09b5bb7930 export: Improve error message for missing registrations. 2021-07-24 17:36:15 -07:00
PIG208
66b1a4e7ca backend: Add None-checks with assertions and if-elses.
This fixes a batch of mypy errors of the following format:
'Item "None" of "Optional[Something]" has no attribute "abc"'
2021-07-24 15:00:21 -07:00