As explained in the comment added to the function, in terms of privacy
concerns, it is fine to export all data for these accounts. And it is
important to do - so that exporting an organization which was originally
imported e.g. from Slack doesn't result in excessively limited data for
accounts that were mirror dummies and never "activated" themselves.
Now that we severely limited the way that non-consenting users get
exported, we need to start to consider bots as consenting when
appropriate - otherwise the exported bot accounts will be unusable after
importing.
An administrator shouldn't be able to bypass a user's setting to hide
their email address from everyone, including admins.
Therefore, we should overwrite the delivery_email for such users during
export - unless the user consented to have their private data exported.
The notable consequence of this is that such user accounts will become
completely inaccessible after importing this data to a new server, due
to not having a functional email address on record.
These accounts will only be possible to reclaim via a manual
intervention to change the email address on the `UserProfile` by server
administrators.
This allows us to get rid of the call to `get_consented_user_ids` in
`fetch_usermessages`. Now it's only called at the beginning of the
export, eliminating the redundant db query and also resolving the
potential for data consistency issues, if some users change their
consent setting after the export starts.
Now the full export process operates with a single snapshot of these
consenting user ids.
These ids need to be plumbed through via a file rather than normal arg
passing, because this is a separate management command, run in
subprocesses during the export.
These users didn't consent to having their private data exported.
Therefore, correct handling of these users should involve scrubbing
their settings to just match the realm defaults.
Instead of making repeated calls to get_consented_user_ids, we can just
fetch it (mostly) once and put it in
`context["exportable_user_ids"]`. This is essentially what the
(unused until now) exportable_user_ids logic was added for after all.
The added, intended, effect of this is that non-consenting users will
now get exported as mirror dummy accounts, due to the handling of
non-exportable users in `custom_fetch_user_profile`.
The remaining additional call to `get_consented_user_ids` is in
`fetch_usermessages`. This one is tricky as this function gets called
in subprocesses via
`zerver/management/commands/export_usermessage_batch.py` management
command invoked by the export process.
It requires passing the `exportable_user_ids` in some other way. This
can be dealt with in upcoming commits.
We shouldn't export the entire Client table - it includes Clients for
all the realms on the server, completely unrelated to the realm we're
exporting. Since these contain parts of the UserAgents used by the
users, we should treat these as private data and only export the Clients
that the specific data we're exporting "knows" about.
This refactors `parse_migration_status` to copy the algorithm of
Django's `showmigrations` command instead of parsing its output. This is
done so that the code is not susceptible to breaking changes if Django
modifies showmigrations's implementation.
The previous `parse_migration_status` logic has been repurposed into a
test utility function (`prase_showmigrations`). It is used to verify
that the new `parse_migration_status` generates output identical to the
actual `showmigrations` command.
The `test_clean_up_migration_status_json` is removed because
`test_parse_migration_status` has covered that behavior.
This table's export and import weren't working:
1. It didn't have a Config in export.py, so it wasn't exported at all.
2. Its `date_created` wasn't registered in `DATE_FIELDS`.
3. It wasn't registered in `ID_MAPS` in import_realm.py, so having any
SavedSnippets in the export would cause the import to fail with an
exception.
4. It was missing a `fix_datetime_fields` call in its import codepath.
Without this change, the child tables of UserProfile didn't get their
objects exported if those objects were tied to a mirror dummy user.
For example, a `Recipient` of type `PERSONAL`, or the associated
`Subscription` would not get exported. Same for other tables with
foreign keys to `UserProfile` - such as `UserPresence`.
This happened because the Configs for the export are defined as follows:
```python
user_profile_config = Config(
custom_tables=[
"zerver_userprofile",
"zerver_userprofile_mirrordummy",
],
# set table for children who treat us as normal parent
table="zerver_userprofile",
virtual_parent=realm_config,
custom_fetch=custom_fetch_user_profile,
)
user_subscription_config = Config(
table="_user_subscription",
model=Subscription,
normal_parent=user_profile_config,
filter_args={"recipient__type": Recipient.PERSONAL},
include_rows="user_profile_id__in",
)
Config(
table="_user_recipient",
model=Recipient,
virtual_parent=user_subscription_config,
id_source=("_user_subscription", "recipient"),
)
```
while in `export_from_config` we have:
```python
elif config.normal_parent:
# In this mode, our current model is figuratively Article,
# and normal_parent is figuratively Blog, and
# now we just need to get all the articles
# contained by the blogs.
model = config.model
assert parent is not None
assert parent.table is not None
assert config.include_rows is not None
parent_ids = {r["id"] for r in response[parent.table]}
```
This meant that when processing a table with
`normal_parent=user_profile_config`, the `parent_ids` above would only
have the ids of `UserProfile` objects under the `zerver_userprofile` key in
the exported data - completely missing those in
`zerver_userprofile_mirrordummy`.
The current `get_migration_by_app` has a rather naive approach to
compiling the migration status of a realm, which has led to issues like
#32826. Specifically, those flaws are:
- it does not report the complete state of the migration status of the
exporting servers, only the applied migration.
- it shows both the replaced and the squashed migrations. This would be
a problem if we decide to clean up old migration files we've
squashed(replaced) and import a slightly older realm with those still in
disk. `check_migration_status` would complain of incompatibility even
though those migration files don't matter (they are replaced, after
all).
- it does not clean up ancient/stale applied migrations (for reference,
see how `check-database-compatibility` cleans those)
This commit attempts to write a better `migration_status.json` by
parsing the output of `showmigrations` instead.
This is because Django's `showmigrations` has a lot more logic and
validations baked into it than previously thought. Ones that we care
about are:
- it does validations to make sure app names are valid
- it doesn't list replaced migrations and only squashed one
- it takes into account migrations in disk(`MigrationsLoader`) vs
applied migrations (`MigrationsRecorder`)
Which would resolve the first two points highlighted above.
This commit adds `parse_migration_status`, which takes in the string
output of `showmigrations` and parse it into key-value pair of installed
apps and a list of its migration status.
This is a prep commit to rework the check migrations function of
import/export which will parse the output of `showmigrations` to write
the `migration_status.json` file.
This commit allows users to be assigned to custom groups when
inviting them to join Zulip, similar to how channels are handled.
The implementation follows a similar pattern for adding pills,
ensuring consistency, as user groups and channels are parallel
in nature.
Fixes#24365.
This commit removes the 'email_token' field from the
'Stream' model, introduces a new model 'ChannelEmailAddress',
and backfills it.
This is a prep work towards our plan to generate unique
channel emails for different users, which can be used
to check post permissions at message send time.
This commit updates the export process to write the migration status of
the realm as a JSON file to be included in the export tarball.
This is a preparatory step for adding an assertion to ensure that the
importing and exporting realms have a compatible set of applied
migrations.
We didn't have thumbnailing for images coming from data import and this
commit adds the functionality.
There are a few fundamental issues that the implementation needs to
solve.
1. The images come from an untrusted source and therefore we don't want
to just pass them through to thumbnailing without checking. For that
reason, we cannot just import ImageAttachment rows from the export
data, even for zulip=>zulip imports.
The right way to process images is to pass them to maybe_thumbail(),
which runs libvips_check_image() on them to verify we're okay with
thumbnailing, creates ImageAttachment rows for them and sends them
to the thumbnailing queue worker. This approach lets us handle both
zulip=>zulip and 3rd party=>zulip imports in the same way,
2. There is a somewhat circular dependency between the Message,
Attachment and ImageAttachment import process:
- ImageAttachments would ideally be created after importing
Attachments, but they need to already exist at the time of Message
import. Otherwise, the markdown processor doesn't know it has to add
HTML for image previews to messages that reference images. This would
mean that messages imported from 3rd party tools don't get image
previews.
- Attachments only get created after Message import however, due to the
many-to-many relationship between Message and Attachment.
This is solved by fixing up some data of Attachments pre-emptively, such
as the path_ids. This gives us the necessary information for creating
ImageAttachments before importing Messages.
While we generate ImageAttachment rows synchronously, the actual
thumbnailing job is sent to the queue worker. Theoretically, the worker
could be very backlogged and not process the thumbnails anytime soon.
This is fine - if the app is loaded and tries to display a message with
such a not-yet-generated thumbnail, the code in `serve_file` will
generate the thumbnails synchronously on the fly and the user will see
the image preview displayed normally. See:
1b47134d0d/zerver/views/upload.py (L333-L342)
Earlier, only public data export was possible via `POST /export/realm`
endpoint. This commit adds support to create full data export with
member consent via that endpoint.
Also, this adds a 'export_type' parameter to the dictionaries
in `realm_export` event type and `GET /export/realm` response.
Fixes part of #31201.
Earlier, we used to store the key data related to realm exports
in RealmAuditLog. This commit adds a separate table to store
those data.
It includes the code to migrate the concerned existing data in
RealmAuditLog to RealmExport.
Fixes part of #31201.
This commit updates code to store the realm export stats in
json format instead of plain text.
This will help in storing the stats as JsonField in RealmExport table.
For exporting full with consent:
* Earlier, a message advertising users to react with thumbs up
was sent and later used to determine the users who consented.
* Now, we no longer need to send such a message. This commit
updates the logic to use `allow_private_data_export` user-setting
to determine users who consented.
Fixes part of #31201.
This allows finer-grained access control and auditing. The links
generated also expire after one week, and the suggested configuration
is that the underlying data does as well.
Co-authored-by: Prakhar Pratyush <prakhar@zulip.com>
This commit introduced 'creator' and 'date_created'
fields in user groups, allowing users to view who
created the groups and when.
Both fields can be null for groups without creator data.
The export tool was only exporting the already-thumbnailed emoji file,
omitting the original one. Now we make sure to export the .original file
too, like we do for avatars, and make the import tool process it
directly, to thumbnail it directly and generate a still in the case of
animated emojis.
Otherwise, the imported realm wouldn't have the <emoji>.png.original
file that we generally expect to have accessible, and stills for
animated emojis were completely missing.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.
A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
Earlier, the export tool was logging a warning:
"??? NO DATA EXPORTED FOR TABLE zerver_onboardingusermessage!!!"
This bug was due to not configuring a Config object for
'OnboardingUserMessage' in 'get_realm_config()'.
This commit fixes the bug to export the table properly.
When using the sub-expression purely for filtering, and not for
accessing the value in the resultset, .alias() is potentially faster
since it does not pull the value in as well.
This commit renames the "Huddle" Django model class to
"DirectMessageGroup", while maintaining the same table --
"zerver_huddle".
Fixes part of #28640.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable. Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.
This requires plumbing the current (or future) avatar version through
various parts of the upload process.
Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).
We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.
Fixes: #12852.
This prep commit adds a new OnboardingUserMessage model
that will be used to mark the new onboarding messages
for new users as unread and the first message of each
onboarding topic as starred.
This table won't include the old onboarding messages.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.
It also renames variables and methods of type -
"huddle_message" to "group_direct_message".
This is a part of #28640
Earlier, we were creating RealmAuditLog with REALM_EXPORTED
event_type when export of public data took place via organization
settings panel.
We were not creating the audit log when the export was executed
via shell i.e './manage.py export'.
This commit creates the audit log in that case too. It will
help during import to distinguish readily between imports
from another Zulip server vs imports from another product.
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.
Fixes part of #28640.
Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>