The only callsite of do_export_realm calls `rmtree` on the output
path, which means this symlink is always dangling. Since realms can
also be exported by end-users, following it would always be a race
condition, anyways.
Remove it.
For get and filter queries of NamedUserGroup, realm_for_sharding
field is used instead of realm field, as directly using
realm_for_sharding field on NamedUserGroup makes the query faster
than using realm present on the base UserGroup table.
The import code wasn't setting subscriber_count at all, resulting in a
value of 0 when dealing with imports from 3rd party apps.
We run this unconditionally, also for imports from Zulip, since this
ensures we won't inherit incorrect values, if the data we're import has
them - e.g. due to some bugs that affected the server the data came from.
A prep PR to 34308.
Explicitly pass the fields to be updated,
This increases performance but most importantly
prevents overwriting the db-saved value of
"subscriber_count" field (added in an upcoming PR)
with the in-memory default value of 0,
since "subscriber_count" will only be updted
via the db.
Migrate some tests to use do_ functions instead of
direclty modifying the state.
Adds detailed tests for the work in the prior commits fixing the
treatment of private data in various tables in exports with consent and
public exports.
This refactors `parse_migration_status` to copy the algorithm of
Django's `showmigrations` command instead of parsing its output. This is
done so that the code is not susceptible to breaking changes if Django
modifies showmigrations's implementation.
The previous `parse_migration_status` logic has been repurposed into a
test utility function (`prase_showmigrations`). It is used to verify
that the new `parse_migration_status` generates output identical to the
actual `showmigrations` command.
The `test_clean_up_migration_status_json` is removed because
`test_parse_migration_status` has covered that behavior.
This commit updates populate_db to mark all onboarding steps as seen
for existing users to avoid unnecessary popups during development.
Developers should create a new user in development environment
to test the onboarding steps.
This table's export and import weren't working:
1. It didn't have a Config in export.py, so it wasn't exported at all.
2. Its `date_created` wasn't registered in `DATE_FIELDS`.
3. It wasn't registered in `ID_MAPS` in import_realm.py, so having any
SavedSnippets in the export would cause the import to fail with an
exception.
4. It was missing a `fix_datetime_fields` call in its import codepath.
Without this change, the child tables of UserProfile didn't get their
objects exported if those objects were tied to a mirror dummy user.
For example, a `Recipient` of type `PERSONAL`, or the associated
`Subscription` would not get exported. Same for other tables with
foreign keys to `UserProfile` - such as `UserPresence`.
This happened because the Configs for the export are defined as follows:
```python
user_profile_config = Config(
custom_tables=[
"zerver_userprofile",
"zerver_userprofile_mirrordummy",
],
# set table for children who treat us as normal parent
table="zerver_userprofile",
virtual_parent=realm_config,
custom_fetch=custom_fetch_user_profile,
)
user_subscription_config = Config(
table="_user_subscription",
model=Subscription,
normal_parent=user_profile_config,
filter_args={"recipient__type": Recipient.PERSONAL},
include_rows="user_profile_id__in",
)
Config(
table="_user_recipient",
model=Recipient,
virtual_parent=user_subscription_config,
id_source=("_user_subscription", "recipient"),
)
```
while in `export_from_config` we have:
```python
elif config.normal_parent:
# In this mode, our current model is figuratively Article,
# and normal_parent is figuratively Blog, and
# now we just need to get all the articles
# contained by the blogs.
model = config.model
assert parent is not None
assert parent.table is not None
assert config.include_rows is not None
parent_ids = {r["id"] for r in response[parent.table]}
```
This meant that when processing a table with
`normal_parent=user_profile_config`, the `parent_ids` above would only
have the ids of `UserProfile` objects under the `zerver_userprofile` key in
the exported data - completely missing those in
`zerver_userprofile_mirrordummy`.
The current `get_migration_by_app` has a rather naive approach to
compiling the migration status of a realm, which has led to issues like
#32826. Specifically, those flaws are:
- it does not report the complete state of the migration status of the
exporting servers, only the applied migration.
- it shows both the replaced and the squashed migrations. This would be
a problem if we decide to clean up old migration files we've
squashed(replaced) and import a slightly older realm with those still in
disk. `check_migration_status` would complain of incompatibility even
though those migration files don't matter (they are replaced, after
all).
- it does not clean up ancient/stale applied migrations (for reference,
see how `check-database-compatibility` cleans those)
This commit attempts to write a better `migration_status.json` by
parsing the output of `showmigrations` instead.
This is because Django's `showmigrations` has a lot more logic and
validations baked into it than previously thought. Ones that we care
about are:
- it does validations to make sure app names are valid
- it doesn't list replaced migrations and only squashed one
- it takes into account migrations in disk(`MigrationsLoader`) vs
applied migrations (`MigrationsRecorder`)
Which would resolve the first two points highlighted above.
This commit adds `parse_migration_status`, which takes in the string
output of `showmigrations` and parse it into key-value pair of installed
apps and a list of its migration status.
This is a prep commit to rework the check migrations function of
import/export which will parse the output of `showmigrations` to write
the `migration_status.json` file.
Currently if for what ever reason one decided to change how
`migration_status.json` is written, the check migrations tests in
`test_import_export.py` will happily just use the old and potentially
stale migration status test fixtures in
`fixtures/applied_migrations_fixtures` against other stale fixtures and
run the check migrations tests in a bubble.
This adds an assertion in `verify_migration_status_json` that makes sure
all the migration status fixtures we use in the tests resembles the
actual `migration_status.json` file our export tool will write.
If the content-type of the image is not in INLINE_MIME_TYPES, then we
do not expect browsers to be able to display it. This behaviour is
particularly confusing because the thumbnail will render properly,
since that will be in the more widely-supported WebP format, but the
lightbox will show a broken image.
In these cases, generate a high-resolution (4032x3024) "thumbnail"
which clients can choose to use instead. This thumbnail format is not
in the listed in the server's advertised thumbnail size list, because
it is not reliably generated for every image.
The transcoded thumbnail format is set on the `img` tag if it is
generated, and the original content-type is always passed to the
client, so it can decide how or if to render the original image. This
content-type is as the _original uploader_ specified it, so may be
incorrect.
The transcoded image is not animated, even if the original was. HEIC
files can nominally be animated, but in testing libvips was not able
to correctly recognize them as such. TIFF files are parsed as being
"animated," with one page per frame; this is of dubious utility, so
we merely transcode the first page. Always generating a static
transcoded image serves to also limit the computational time spent.
THUMBNAIL_OUTPUT_FORMATS is switched to be a tuple to ensure that it
is not accidentally mutated.
This commit sorts the list of migrations to ensure the same
sets of migrations don't somehow ordered differently. This
has been reported to happen when importing Zulip Cloud to
self-host (zulip-cloud-current).
The order of migrations listed in `migration_status.json`
don't actually matter, migrations specify which other
migrations they depend on.
Fixes#32826.
This change improves error handling in `do_import_realm` by replacing
the use of a generic Exception with CommandError. The updated approach
provides clearer, user-friendly error messages when there is a version
mismatch between the exported data and the Zulip server.
Fixes#32292.
Otherwise, these tests fail if ZULIP_VERSION is different locally from
what's hard-coded. Use a placeholder instead and replace dynamically in
a helper function.
When transferring a realm to a server that has a different set of
applied migrations (different Zulip versions), there is a chance that
the imported data formats appear to be compatible but data invariants
could still be violated.
This commit adds an assertion during the import process to verify
that both the exported realm and the importing server have matching
Zulip versions and have a compatible set of migrations.
This commit updates the export process to write the migration status of
the realm as a JSON file to be included in the export tarball.
This is a preparatory step for adding an assertion to ensure that the
importing and exporting realms have a compatible set of applied
migrations.
We didn't have thumbnailing for images coming from data import and this
commit adds the functionality.
There are a few fundamental issues that the implementation needs to
solve.
1. The images come from an untrusted source and therefore we don't want
to just pass them through to thumbnailing without checking. For that
reason, we cannot just import ImageAttachment rows from the export
data, even for zulip=>zulip imports.
The right way to process images is to pass them to maybe_thumbail(),
which runs libvips_check_image() on them to verify we're okay with
thumbnailing, creates ImageAttachment rows for them and sends them
to the thumbnailing queue worker. This approach lets us handle both
zulip=>zulip and 3rd party=>zulip imports in the same way,
2. There is a somewhat circular dependency between the Message,
Attachment and ImageAttachment import process:
- ImageAttachments would ideally be created after importing
Attachments, but they need to already exist at the time of Message
import. Otherwise, the markdown processor doesn't know it has to add
HTML for image previews to messages that reference images. This would
mean that messages imported from 3rd party tools don't get image
previews.
- Attachments only get created after Message import however, due to the
many-to-many relationship between Message and Attachment.
This is solved by fixing up some data of Attachments pre-emptively, such
as the path_ids. This gives us the necessary information for creating
ImageAttachments before importing Messages.
While we generate ImageAttachment rows synchronously, the actual
thumbnailing job is sent to the queue worker. Theoretically, the worker
could be very backlogged and not process the thumbnails anytime soon.
This is fine - if the app is loaded and tries to display a message with
such a not-yet-generated thumbnail, the code in `serve_file` will
generate the thumbnails synchronously on the fly and the user will see
the image preview displayed normally. See:
1b47134d0d/zerver/views/upload.py (L333-L342)
Earlier, we used to store the key data related to realm exports
in RealmAuditLog. This commit adds a separate table to store
those data.
It includes the code to migrate the concerned existing data in
RealmAuditLog to RealmExport.
Fixes part of #31201.
We create an unnamed user group with just the group creator as it's
member when trying to set the default. The pattern I've followed across
most of the acting_user additions is to just put the user declared
somewhere before the check_add_user_group and see if the test passes.
If it does not, then I'll look at what kind of user it needs to be set
to `acting_user`.
For exporting full with consent:
* Earlier, a message advertising users to react with thumbs up
was sent and later used to determine the users who consented.
* Now, we no longer need to send such a message. This commit
updates the logic to use `allow_private_data_export` user-setting
to determine users who consented.
Fixes part of #31201.
Renamed event types below in the enum class to use channel instead of
stream.
Event types moved: STREAM_CREATED, STREAM_DEACTIVATED, STREAM_NAME_CHANGED
STREAM_REACTIVATED, STREAM_MESSAGE_RETENTION_DAYS_CHANGED
STREAM_PROPERTY_CHANGED, STREAM_GROUP_BASED_SETTING_CHANGED
The export tool was only exporting the already-thumbnailed emoji file,
omitting the original one. Now we make sure to export the .original file
too, like we do for avatars, and make the import tool process it
directly, to thumbnail it directly and generate a still in the case of
animated emojis.
Otherwise, the imported realm wouldn't have the <emoji>.png.original
file that we generally expect to have accessible, and stills for
animated emojis were completely missing.
Instead of the PUSH_NOTIFICATIONS_BOUNCER_URL and
SUBMIT_USAGE_STATISTICS settings, we want servers to configure
individual ZULIP_SERVICE_* settings, while maintaining backward
compatibility with the old settings. Thus, if all the new
ZULIP_SERVICE_* are at their default False value, but the legacy
settings are activated, they need to be translated in computed_settings
to the modern way.
This commit renames the "Huddle" Django model class to
"DirectMessageGroup", while maintaining the same table --
"zerver_huddle".
Fixes part of #28640.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable. Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.
This requires plumbing the current (or future) avatar version through
various parts of the upload process.
Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).
We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.
Fixes: #12852.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.
It also renames variables and methods of type -
"huddle_message" to "group_direct_message".
This is a part of #28640