Commit Graph

54 Commits

Author SHA1 Message Date
Steve Howell
ffec98d85c Annotate zerver/lib/export.py.
This adds the remaining annotations to lib/export.py.
2016-09-12 08:21:46 -07:00
Steve Howell
62dd86bcce mypy: Set Path to str instead of text_type in export.py.
Using text_type for Path just breaks a lot of calls to
the core Python libraries that still want strings.
2016-09-12 08:21:46 -07:00
Tim Abbott
f0a65fe52a export_messages_single_user: Fix missing order_by.
This fixes a nasty bug where exporting messages sent by a single user
might only contain some of the messages in the event that the
unspecified sort order by the database didn't happen to be sorted by
message ID.
2016-08-17 22:18:58 -07:00
Steve Howell
c12bd853f7 export: Add basic export tests. (fixes #1584) 2016-08-16 13:38:37 -07:00
Steve Howell
ab053845aa export: Fetch messages with two queries. 2016-08-16 13:38:37 -07:00
Steve Howell
a5f651b81a export: Add some user_id filtering to do_export_realm().
This commit only addresses tables that currently derive from
user_profile_config in get_realm_config:

   zerver_userpresence
   zerver_useractivity
   zerver_useractivityinterval
   zerver_subscription
   zerver_recipient
   zerver_stream
   zerver_huddle

It also introduces an entry in realm.json for a virtual
table called "zerver_userprofile_mirrordummy" for dummy users,
which include prior dummy users and users excluded from the call
to do_export_realm().

Note that this feature is not yet exposed in the management command.
2016-08-16 13:38:37 -07:00
Tim Abbott
4a46b879ee import_uploads_s3: Fix setting of content-type. 2016-08-13 11:26:53 -07:00
Tim Abbott
8edc5110cd export: Set re_map_foreign_keys verbose default to False.
Otherwise, this is super spammy when doing a large import.
2016-08-13 11:26:16 -07:00
Tim Abbott
8b170665e4 export: Add assertion that ./manage.py exists in current directory.
Otherwise we'll fail starting the UserMessage export, later on.
2016-08-13 10:13:11 -07:00
Tim Abbott
856ab48ec6 export: Fix stream export sanity check. 2016-08-13 10:02:34 -07:00
Steve Howell
e932b03999 export: Clean up path.join/makedirs for avatars/uploads. 2016-08-13 09:31:49 -07:00
Steve Howell
12c25e7d5c export: Filter attachments by message id. 2016-08-13 09:31:49 -07:00
Steve Howell
0f493d5000 export: Return msg ids from export_partial_message_files(). 2016-08-13 09:31:49 -07:00
Steve Howell
0c3e98fa91 export: Introduce attachment.json file.
Now attachment data gets written to its own json file.  We are
splitting this out so that will be easier for us to cross-check
attachments against messages without holding up writing a lot
of the other realm data.  (message cross-checking is coming soon)
2016-08-12 18:59:14 -07:00
Steve Howell
ea0a7d87c8 export: Refactor how we fetch attachment data.
This commit doesn't change any behavior; it just moves fetching
attachments out of the Config scheme and into its own method.
This prepares us to start writing attachment data to its own
file and cross-checking against message ids (coming soon).
2016-08-12 18:59:14 -07:00
Steve Howell
fba7a9ca21 export: Unify top-down export configuration.
We now just have a single configuration get_realm_config() that
handles most of the top-down realm export tables.  (It basically
does everything not related to messages or uploads/avatars.)

Unifying the configs allows us to be more strict in our
configuration about checking for anomalies.  In the future
we may need to loosen up some of those restrictions again,
but for now we are picky and paranoid.
2016-08-12 15:27:23 -07:00
Steve Howell
5a5353b846 export: Fetch stream data only for stream recipients.
Fetch stream data only for stream recipients, instead of
getting streams via realm_id.

(This change is kind of moot for now, since our stream recipients
include all possible stream recipients in the realm, but this
sets us up for when we start restricting users that we export
within the realm.)
2016-08-12 15:27:22 -07:00
Steve Howell
7a429d1e30 export: Add sanity_check_stream_data(). 2016-08-12 15:27:22 -07:00
Steve Howell
ec86e475b4 export: Add Config.post_process_data 2016-08-12 15:12:01 -07:00
Steve Howell
0c2c331905 export: Flip how we fetch stream subscription data.
We now get stream subscriptions BEFORE stream recipients.
2016-08-12 15:12:01 -07:00
Steve Howell
70a916aae3 export: Flip how we fetch user subscription data.
We now get user subscriptions BEFORE user recipients.
2016-08-12 15:12:01 -07:00
Steve Howell
2a2ce6ada1 export: Remove hard-to-maintain code comment.
Subsequent changes are gonna make the top-down/bottom-up comment
no longer valid.
2016-08-12 15:12:01 -07:00
Steve Howell
6fdd42c08b export: Create convenient soft links. 2016-08-12 10:48:33 -07:00
Steve Howell
70b68ddcc3 export: Use a config for export_single_user(). 2016-08-12 10:37:41 -07:00
Steve Howell
c69a5bdec3 export: Handle more tables via export_from_config().
This commit introduces the ability to do custom fetches
and to essentially use temp tables for intermediate results.

(The temp table stuff deals with recipients/subscriptions
having three different flavors--user, stream, and huddle.)
2016-08-12 10:37:35 -07:00
Steve Howell
f471a1779e export: Handle simple exports with export_from_config().
This handles the simple tables that don't need custom fetches.
2016-08-12 09:54:57 -07:00
Steve Howell
682155778d export: Add export_with_config().
Subsequent commits will start to use this.
2016-08-12 09:54:57 -07:00
Steve Howell
b0e6d20321 export: Write stats.txt for ./manage.py export <realm>. 2016-08-12 09:06:10 -07:00
Steve Howell
df3aa39be3 export: Extract write_data_to_file(). 2016-08-11 15:51:22 -07:00
Steve Howell
f29b32bbb2 export: Clarify message exporting code.
The function to create the message partial files has been
renamed to export_partial_message_files().  It now gets its own
list of user profile ids and recipient ids from the response,
so that we can de-clutter do_export_realm().
2016-08-11 15:51:22 -07:00
Steve Howell
5cd915694a export: Extract launch_user_message_subprocesses().
This is the last in a series of commits that makes it
so that do_export_realm() mostly delegates work out
to other functions.
2016-08-11 15:21:30 -07:00
Steve Howell
b383f5ca5d export: Extract fetch_user_profile_cross_realm(). 2016-08-11 15:21:30 -07:00
Steve Howell
fee2106c6f export: Extract fetch_huddle_objects().
This also removes the dead codepath for include_private=False.
2016-08-11 15:21:30 -07:00
Steve Howell
a6235f6a60 export: Add comments to export_single_user().
(This is a bit of a prefactoring to hopefully create a nice
diff in a subsequent commit.)
2016-08-11 15:21:30 -07:00
Steve Howell
6e7fe76cf4 export: s/avatar_bucket/processing_avatars
The name avatar_bucket was confusing for a boolean, and
in some places it was used for non-S3 paths.

I considered the more concise 'is_avatar', but that
was still confusing when you are processing multiple
files, because you think it's a calculated property
on one file instead of an overall codepath switch.

I also considered splitting up some functions, but
there is a lot of common logic between handling
file uploads and avatars that's not trivial to extract
into helpers, especially on the S3 side.
2016-08-11 15:21:30 -07:00
Steve Howell
3dab366733 export: Clean up names of upload/avatar export functions.
I did some minor moving around of code that made us have
one fewer function without any additional conditional
logic. The names are more explicit about saying
"from_local" and "from_s3".  Also, there is less clutter
now in do_export_realm(), which is evolving into more of
a dispatcher and less of a worker.
2016-08-11 15:21:30 -07:00
Steve Howell
d62a351107 export: Add sanity_check_output(). 2016-08-11 15:21:30 -07:00
Steve Howell
06b0df5efc export: Remove spurious select_related() call for Client. 2016-08-10 14:16:17 -07:00
Steve Howell
cb59a11f0a export: Extract get_primary_ids(). 2016-08-10 14:16:17 -07:00
Steve Howell
90e9083b81 export: Extract filter_by_realm(). 2016-08-10 14:16:17 -07:00
Steve Howell
4b6b1b8ad4 export: Extract filter_by_users(). 2016-08-10 14:16:17 -07:00
Steve Howell
db9edfce34 export: Use DATE_FIELDS in fix_datetime_fields().
Now we only call this once per table and use DATE_FIELDS to
look up the data fields.
2016-08-10 14:16:17 -07:00
Steve Howell
35c59fc4d7 export: Clean up export_messages().
This is pretty minor cleanup, but it makes it a little more
explicit what we're writing to the shard file, and it allows
us to use a more specific mypy type when calling
floatify_datetime_fields.
2016-08-10 14:16:17 -07:00
Steve Howell
1d1f36c0b8 export: Always use subprocesses to export UserMessage.
We no longer have an in-process code path to export
UserMessage rows.  We want to only maintain the
subprocess code, which we'll always use in production,
and which will work fine in dev.
2016-08-10 14:16:17 -07:00
Steve Howell
78bbefbf94 export: Create import_attachments. 2016-08-10 14:16:17 -07:00
Steve Howell
7ec6a394fe export: Filter Attachment objects by realm. 2016-08-09 16:47:14 -07:00
Steve Howell
cecfaa7761 export: Extract import_message_data(). 2016-08-09 16:47:14 -07:00
Steve Howell
5386ed280e export: Extract update_id_map().
We also use a vanilla dictionary instead of a defaultdict, so
that we explicitly initialize what tables are being re-mapped.
2016-08-09 16:47:14 -07:00
Steve Howell
217ef8a4d2 export: Split fix_foreign_keys() into two functions.
We now have convert_to_id_fields for the simple case, and
re_map_foreign_keys for the more complex case. I also
renamed some parameters and variables.
2016-08-09 16:47:14 -07:00
Steve Howell
dd88ffccfd export: Extract make_raw() in lib/export.py. 2016-08-09 15:58:27 -07:00