Commit Graph

409 Commits

Author SHA1 Message Date
Mateusz Mandera
ff876d2df4 export: Treat is_mirror_dummy=True users as consenting.
As explained in the comment added to the function, in terms of privacy
concerns, it is fine to export all data for these accounts. And it is
important to do - so that exporting an organization which was originally
imported e.g. from Slack doesn't result in excessively limited data for
accounts that were mirror dummies and never "activated" themselves.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
3c43603607 export: Treat deactivated user with consent enabled as consenting.
Prior to this, deactivated user were presumed to be non-consenting to
private data export, regardless of their setting.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
3c1fae1707 export: Fix get_consented_user_ids to also account for bots.
Now that we severely limited the way that non-consenting users get
exported, we need to start to consider bots as consenting when
appropriate - otherwise the exported bot accounts will be unusable after
importing.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
e57b6719fa export: Scrub RealmAuditLog rows where modified_user is non-consenting. 2025-03-28 16:52:44 -07:00
Mateusz Mandera
9da4eeaa94 export: Don't export real email of users unless accessible to admins.
An administrator shouldn't be able to bypass a user's setting to hide
their email address from everyone, including admins.
Therefore, we should overwrite the delivery_email for such users during
export - unless the user consented to have their private data exported.

The notable consequence of this is that such user accounts will become
completely inaccessible after importing this data to a new server, due
to not having a functional email address on record.

These accounts will only be possible to reclaim via a manual
intervention to change the email address on the `UserProfile` by server
administrators.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
13303fd916 export: Plumb consented_user_ids to export_usermessage_batch in a file.
This allows us to get rid of the call to `get_consented_user_ids` in
`fetch_usermessages`. Now it's only called at the beginning of the
export, eliminating the redundant db query and also resolving the
potential for data consistency issues, if some users change their
consent setting after the export starts.

Now the full export process operates with a single snapshot of these
consenting user ids.

These ids need to be plumbed through via a file rather than normal arg
passing, because this is a separate management command, run in
subprocesses during the export.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
747e73470e export: Reset settings to default for users not in exportable_user_ids.
These users didn't consent to having their private data exported.
Therefore, correct handling of these users should involve scrubbing
their settings to just match the realm defaults.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
ceb32a7285 export: Use exportable_user_ids arg to plumb through consenting users.
Instead of making repeated calls to get_consented_user_ids, we can just
fetch it (mostly) once and put it in
`context["exportable_user_ids"]`. This is essentially what the
(unused until now) exportable_user_ids logic was added for after all.

The added, intended, effect of this is that non-consenting users will
now get exported as mirror dummy accounts, due to the handling of
non-exportable users in `custom_fetch_user_profile`.

The remaining additional call to `get_consented_user_ids` is in
`fetch_usermessages`. This one is tricky as this function gets called
in subprocesses via
`zerver/management/commands/export_usermessage_batch.py` management
command invoked by the export process.
It requires passing the `exportable_user_ids` in some other way. This
can be dealt with in upcoming commits.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
8b9516fb0b export: Only export Client objects needed by the data being exported.
We shouldn't export the entire Client table - it includes Clients for
all the realms on the server, completely unrelated to the realm we're
exporting. Since these contain parts of the UserAgents used by the
users, we should treat these as private data and only export the Clients
that the specific data we're exporting "knows" about.
2025-03-28 16:52:44 -07:00
Mateusz Mandera
3a0de29f5d export: Don't export miscellaneous private data of non-consenting users. 2025-03-28 16:52:43 -07:00
PieterCK
719f8db654 migration_status: Refactor parse_migration_status.
This refactors `parse_migration_status` to copy the algorithm of
Django's `showmigrations` command instead of parsing its output. This is
done so that the code is not susceptible to breaking changes if Django
modifies showmigrations's implementation.

The previous `parse_migration_status` logic has been repurposed into a
test utility function (`prase_showmigrations`). It is used to verify
that the new `parse_migration_status` generates output identical to the
actual `showmigrations` command.

The `test_clean_up_migration_status_json` is removed because
`test_parse_migration_status` has covered that behavior.
2025-03-20 10:57:54 -07:00
Mateusz Mandera
c031cf9275 import: Fix export/import of SavedSnippet.
This table's export and import weren't working:
1. It didn't have a Config in export.py, so it wasn't exported at all.
2. Its `date_created` wasn't registered in `DATE_FIELDS`.
3. It wasn't registered in `ID_MAPS` in import_realm.py, so having any
   SavedSnippets in the export would cause the import to fail with an
   exception.
4. It was missing a `fix_datetime_fields` call in its import codepath.
2025-03-10 13:07:56 -07:00
Mateusz Mandera
acb1731bf9 export: Fix export of child tables when exporting mirror dummy users.
Without this change, the child tables of UserProfile didn't get their
objects exported if those objects were tied to a mirror dummy user.

For example, a `Recipient` of type `PERSONAL`, or the associated
`Subscription` would not get exported. Same for other tables with
foreign keys to `UserProfile` - such as `UserPresence`.

This happened because the Configs for the export are defined as follows:

```python
user_profile_config = Config(
    custom_tables=[
        "zerver_userprofile",
        "zerver_userprofile_mirrordummy",
    ],
    # set table for children who treat us as normal parent
    table="zerver_userprofile",
    virtual_parent=realm_config,
    custom_fetch=custom_fetch_user_profile,
)

user_subscription_config = Config(
    table="_user_subscription",
    model=Subscription,
    normal_parent=user_profile_config,
    filter_args={"recipient__type": Recipient.PERSONAL},
    include_rows="user_profile_id__in",
)

Config(
    table="_user_recipient",
    model=Recipient,
    virtual_parent=user_subscription_config,
    id_source=("_user_subscription", "recipient"),
)
```

while in `export_from_config` we have:
```python
    elif config.normal_parent:
        # In this mode, our current model is figuratively Article,
        # and normal_parent is figuratively Blog, and
        # now we just need to get all the articles
        # contained by the blogs.
        model = config.model
        assert parent is not None
        assert parent.table is not None
        assert config.include_rows is not None
        parent_ids = {r["id"] for r in response[parent.table]}
```

This meant that when processing a table with
`normal_parent=user_profile_config`, the `parent_ids` above would only
have the ids of `UserProfile` objects under the `zerver_userprofile` key in
the exported data - completely missing those in
`zerver_userprofile_mirrordummy`.
2025-03-10 13:07:56 -07:00
PieterCK
2d6426100f import-export: Rework how we write migration_status.json.
The current `get_migration_by_app` has a rather naive approach to
compiling the migration status of a realm, which has led to issues like
#32826. Specifically, those flaws are:

- it does not report the complete state of the migration status of the
exporting servers, only the applied migration.
- it shows both the replaced and the squashed migrations. This would be
a problem if we decide to clean up old migration files we've
squashed(replaced) and import a slightly older realm with those still in
disk. `check_migration_status` would complain of incompatibility even
though those migration files don't matter (they are replaced, after
all).
- it does not clean up ancient/stale applied migrations (for reference,
see how `check-database-compatibility` cleans those)

This commit attempts to write a better `migration_status.json` by
parsing the output of `showmigrations` instead.

This is because Django's `showmigrations` has a lot more logic and
validations baked into it than previously thought. Ones that we care
about are:

- it does validations to make sure app names are valid
- it doesn't list replaced migrations and only squashed one
- it takes into account migrations in disk(`MigrationsLoader`) vs
applied migrations (`MigrationsRecorder`)

Which would resolve the first two points highlighted above.
2025-01-24 17:08:37 -08:00
PieterCK
4db7ea2296 migration_status: Add parse_migration_status.
This commit adds `parse_migration_status`, which takes in the string
output of `showmigrations` and parse it into key-value pair of installed
apps and a list of its migration status.

This is a prep commit to rework the check migrations function of
import/export which will parse the output of `showmigrations` to write
the `migration_status.json` file.
2025-01-24 17:08:37 -08:00
sujal shah
771d3b1434 invites: Enable adding users to user groups during invitations.
This commit allows users to be assigned to custom groups when
inviting them to join Zulip, similar to how channels are handled.
The implementation follows a similar pattern for adding pills,
ensuring consistency, as user groups and channels are parallel
in nature.

Fixes #24365.
2024-11-26 11:26:34 -08:00
Prakhar Pratyush
175104ea01 streams: Add 'ChannelEmailAddress' model.
This commit removes the 'email_token' field from the
'Stream' model, introduces a new model 'ChannelEmailAddress',
and backfills it.

This is a prep work towards our plan to generate unique
channel emails for different users, which can be used
to check post permissions at message send time.
2024-11-21 14:53:28 -08:00
PieterCK
40bcb4b42b export: Add migration status file to export tarball.
This commit updates the export process to write the migration status of
the realm as a JSON file to be included in the export tarball.

This is a preparatory step for adding an assertion to ensure that the
importing and exporting realms have a compatible set of applied
migrations.
2024-11-08 15:52:45 -08:00
Mateusz Mandera
da4443f392 thumbnail: Make thumbnailing work with data import.
We didn't have thumbnailing for images coming from data import and this
commit adds the functionality.

There are a few fundamental issues that the implementation needs to
solve.

1. The images come from an untrusted source and therefore we don't want
   to just pass them through to thumbnailing without checking. For that
   reason, we cannot just import ImageAttachment rows from the export
   data, even for zulip=>zulip imports.
   The right way to process images is to pass them to maybe_thumbail(),
   which runs libvips_check_image() on them to verify we're okay with
   thumbnailing, creates ImageAttachment rows for them and sends them
   to the thumbnailing queue worker. This approach lets us handle both
   zulip=>zulip and 3rd party=>zulip imports in the same way,

2. There is a somewhat circular dependency between the Message,
   Attachment and ImageAttachment import process:

- ImageAttachments would ideally be created after importing
  Attachments, but they need to already exist at the time of Message
  import. Otherwise, the markdown processor doesn't know it has to add
  HTML for image previews to messages that reference images. This would
  mean that messages imported from 3rd party tools don't get image
  previews.
- Attachments only get created after Message import however, due to the
  many-to-many relationship between Message and Attachment.

This is solved by fixing up some data of Attachments pre-emptively, such
as the path_ids. This gives us the necessary information for creating
ImageAttachments before importing Messages.

While we generate ImageAttachment rows synchronously, the actual
thumbnailing job is sent to the queue worker. Theoretically, the worker
could be very backlogged and not process the thumbnails anytime soon.
This is fine - if the app is loaded and tries to display a message with
such a not-yet-generated thumbnail, the code in `serve_file` will
generate the thumbnails synchronously on the fly and the user will see
the image preview displayed normally. See:

1b47134d0d/zerver/views/upload.py (L333-L342)
2024-10-24 10:32:51 -07:00
Prakhar Pratyush
55f97cd06f realm_export: Add support to create full data export via /export/realm.
Earlier, only public data export was possible via `POST /export/realm`
endpoint. This commit adds support to create full data export with
member consent via that endpoint.

Also, this adds a 'export_type' parameter to the dictionaries
in `realm_export` event type and `GET /export/realm` response.

Fixes part of #31201.
2024-10-11 13:20:42 -07:00
Prakhar Pratyush
07dcee36b2 export_realm: Add RealmExport model.
Earlier, we used to store the key data related to realm exports
in RealmAuditLog. This commit adds a separate table to store
those data.

It includes the code to migrate the concerned existing data in
RealmAuditLog to RealmExport.

Fixes part of #31201.
2024-10-04 12:06:35 -07:00
Prakhar Pratyush
5d9eb4e358 realm_export: Save stats in '.json' format instead of '.txt'.
This commit updates code to store the realm export stats in
json format instead of plain text.

This will help in storing the stats as JsonField in RealmExport table.
2024-10-04 12:06:35 -07:00
Anders Kaseorg
184c0203f3 upload: Lazily import boto3.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-24 16:38:37 -07:00
Vector73
9e4e85e140 saved_snippets: Add backend for saved snippets.
Part of #31227.
2024-09-24 15:27:58 -07:00
Prakhar Pratyush
65f465562f export_realm: Remove the 'react on consent message' approach.
For exporting full with consent:

* Earlier, a message advertising users to react with thumbs up
  was sent and later used to determine the users who consented.

* Now, we no longer need to send such a message. This commit
  updates the logic to use `allow_private_data_export` user-setting
  to determine users who consented.

Fixes part of #31201.
2024-09-24 14:32:42 -07:00
Alex Vandiver
ce0df00e44 export: Notify all realm admins on realm export. 2024-09-23 10:02:43 -07:00
Alex Vandiver
7afe6800f7 export: Use relative paths, include more data. 2024-09-23 10:02:43 -07:00
Alex Vandiver
e125ad823d exports: Add a separate bucket for realm exports.
This allows finer-grained access control and auditing.  The links
generated also expire after one week, and the suggested configuration
is that the underlying data does as well.

Co-authored-by: Prakhar Pratyush <prakhar@zulip.com>
2024-09-20 15:43:49 -07:00
Alex Vandiver
91ac5c3c8b export: Log before the compression step, which can be slow. 2024-09-20 15:43:49 -07:00
sujal shah
614caf111e user_groups: Add creator and date_created field in user groups.
This commit introduced 'creator' and 'date_created'
fields in user groups, allowing users to view who
created the groups and when.

Both fields can be null for groups without creator data.
2024-09-13 18:44:58 -07:00
Lauryn Menard
d2c32f23db audit-log: Move realm event types to AuditLogEventType enum.
Event types moved: REALM_DEACTIVATED, REALM_REACTIVATED, REALM_SCRUBBED
REALM_PLAN_TYPE_CHANGED, REALM_LOGO_CHANGED, REALM_EXPORTED
REALM_PROPERTY_CHANGED, REALM_ICON_SOURCE_CHANGED, REALM_DISCOUNT_CHANGED
REALM_SPONSORSHIP_APPROVED, REALM_BILLING_MODALITY_CHANGED
REALM_REACTIVATION_EMAIL_SENT, REALM_SPONSORSHIP_PENDING_STATUS_CHANGED
REALM_SUBDOMAIN_CHANGED
2024-09-09 11:50:13 -07:00
Mateusz Mandera
5476340b52 import: Export and import .original emoji files correctly.
The export tool was only exporting the already-thumbnailed emoji file,
omitting the original one. Now we make sure to export the .original file
too, like we do for avatars, and make the import tool process it
directly, to thumbnail it directly and generate a still in the case of
animated emojis.

Otherwise, the imported realm wouldn't have the <emoji>.png.original
file that we generally expect to have accessible, and stills for
animated emojis were completely missing.
2024-08-21 16:30:19 -07:00
roanster007
7b3e163d55 refactor: Rename huddle to direct_message_group in non api files.
This commit completes rename of "huddle" to "direct_message_group"
in all the non API files.

Part of #28640
2024-07-31 23:25:56 -07:00
Alex Vandiver
2e38f426f4 upload: Generate thumbnails when images are uploaded.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.

A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
2024-07-16 13:22:15 -07:00
Prakhar Pratyush
7d379e00b0 export: Fix 'OnboardingUserMessage' table not being exported.
Earlier, the export tool was logging a warning:
"??? NO DATA EXPORTED FOR TABLE zerver_onboardingusermessage!!!"

This bug was due to not configuring a Config object for
'OnboardingUserMessage' in 'get_realm_config()'.

This commit fixes the bug to export the table properly.
2024-07-16 09:36:02 -07:00
Anders Kaseorg
1e9b6445a9 ruff: Fix PLR6104 Use += to perform an augmented assignment directly.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-14 13:49:51 -07:00
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver
d5a4941691 django: Switch to .alias() instead .annotate() where possible.
When using the sub-expression purely for filtering, and not for
accessing the value in the resultset, .alias() is potentially faster
since it does not pull the value in as well.
2024-07-11 09:26:23 -07:00
roanster007
02d0566dc5 refactor: Rename Huddle Django model class to DirectMessageGroup.
This commit renames the "Huddle" Django model class to
"DirectMessageGroup", while maintaining the same table --
"zerver_huddle".

Fixes part of #28640.
2024-07-07 21:31:30 -07:00
Alex Vandiver
e29a455b2d avatars: Encode version into the filename.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable.  Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.

This requires plumbing the current (or future) avatar version through
various parts of the upload process.

Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).

We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.

Fixes: #12852.
2024-07-07 14:40:07 -07:00
Prakhar Pratyush
fb836a4f0a onboarding: Add 'OnboardingUserMessage' model.
This prep commit adds a new OnboardingUserMessage model
that will be used to mark the new onboarding messages
for new users as unread and the first message of each
onboarding topic as starred.

This table won't include the old onboarding messages.
2024-07-05 15:39:32 -07:00
roanster007
52692a6448 refactor: Rename huddle to direct_message_group in non API.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.

It also renames variables and methods of type -
"huddle_message" to "group_direct_message".

This is a part of #28640
2024-07-04 07:56:31 -07:00
Mateusz Mandera
b1d50b511c presence: Handle PresenceSequence in the export/import system. 2024-06-02 22:08:28 -07:00
Prakhar Pratyush
cc793612f0 export: Create REALM_EXPORTED audit log for exports via shell.
Earlier, we were creating RealmAuditLog with REALM_EXPORTED
event_type when export of public data took place via organization
settings panel.

We were not creating the audit log when the export was executed
via shell i.e './manage.py export'.

This commit creates the audit log in that case too. It will
help during import to distinguish readily between imports
from another Zulip server vs imports from another product.
2024-05-08 16:16:37 -07:00
Sahil Batra
c9a7c13ea7 import: Add code to support import and export of NamedUserGroups. 2024-04-26 17:03:09 -07:00
Sahil Batra
71b601cf5a groups: Create NamedUserGroup table. 2024-04-26 17:03:09 -07:00
Anders Kaseorg
a82a3eb4d7 ruff: Fix UP033 Use @functools.cache.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-04-01 18:32:52 -07:00
John Lu
a5cf0ec526 refactor: Replace HUDDLE with DIRECT_MESSAGE_GROUP.
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.

Fixes part of #28640.

Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
2024-03-21 16:39:33 -07:00