UTF-8 is far and wide the dominant encoding, so we assume that
encoding if we have more than 66% confidence -- this allows it to be
guessed even for short files. ISO-8859-1 caps at 73% confidence, so
we assume that if it's the best option and it hits that max.
Otherwise, we still require 90% confidence.
This is only `text/plain`, currently. In such cases where the
client-provided content-type also does not specify a `charset`, we use
`chardet` to make a guess, and store that guess to provide later when
serving the content. The detection is done in a streaming fashion,
and thus should not require re-downloading the full content.
Previously, the `user_file.content_type` only contained the MIME type
of the uploaded file; no other parameters were included, meaning that
a file a client specified as `content-type: text/plain; charset=big5`
would be stored with an `Attachment.content_type` of `text/plain`.
Re-construct the full content-type header from `content_type_extra`,
which includes those parameters.
We do not include a test because Django does not support specifying
such parameters in the upload path.
This commit updates populate_db to mark all onboarding steps as seen
for existing users to avoid unnecessary popups during development.
Developers should create a new user in development environment
to test the onboarding steps.
The functionality of gravatar can break anonymity if
the user has had a gravatar account set up previously.
This option allows specifically cloud instances to
have gravatar disabled selectively.
This commit adds hardening such that if the invariant "no usermessage
row corresponding to a message exists if the user loses access to the
message" is violated due to some bug, user can't access the attachment
if they are not subscribed.
Separate `avatars/<email_or_id>/medium?` endpoints into distinct
endpoints for email-based and user ID-based access. This change aligns
avatar endpoints with Zulip’s existing API path conventions (e.g., the
`users/` endpoint).
This commit introduces an assertion to verify that the avatar file for
system bots exists and findable.
In development, it'll assert that the avatar file exists in the static
directory. This isn't done in production environment to avoid
unnecessary overhead. It helps verify that the protocol to fetch system
bot avatars still works when making changes during development.
In production it'll check if the avatar file exists in the STATIC_ROOT
and return a default avatar png if it doesn't.
Previously, requesting system bots URLs did not return any -medium.png
variants and SVG file was also used for notification bots' avatar, which
was problematic.
In this commit, the -medium.png variants is added for the avatars of
system bots and zulip-icon-square.svg is also converted into
notification-bot.png for the notification bot. The get_avatar_url method
has been updated to return the "medium" file variants for the system
bots.
Additionally, the system bots' avatar files is moved to a dedicated
directory to simplify the hashing logic for these files. Now, all files
in the "images/static_avatars/" directory will be hashed.
This makes a Zulip server more isolated than relying on gravatar, and
avoids complex logistics if in the future we move system bots to live
inside individual realms.
Co-authored-by: PieterCK <pieterceka123@gmail.com>
Now that we store the content-type in the database, use that value
(if we have it, since we did not backfill) when serving content back
to the client. This means the file backend has parity with the S3
backend.
We plan to remove the 'tutorial_status' field from UserProfile
table as it is no longer used to show tutorial.
The field is also used to narrow a new user in DM with
welcome bot on the first load.
This prep commit updates the logic to use a new OnboardingStep
for the narrowing behaviour on the first load. This will help
in removing the 'tutorial_status' field.
Fixes warnings like “ResourceWarning: unclosed file <_io.FileIO
name='/srv/zulip/var/044e5d44-87aa-4c43-abbb-28a144fa6654/test-backend/run_1238680/worker_0/test_uploads/files/thumbnail/2/1e/jmUuDhQC8WlaSRCuc0zQyx7D/img.tif/100x75.webp'
mode='rb' closefd=True>” with warnings enabled.
deque(…, 0) is an efficient way to consume an iterator documented at
https://docs.python.org/3/library/itertools.html#itertools-recipes
under consume.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We thumbnail and serve emoji with the same format as they were
uploaded. However, we preserved the original extension, which might
mismatch with the provided content-type.
Limit the content-type to a subset which is both (a) an image format
we can thumbnail, and (b) a media format which is widely-enough
supported that we are willing to provide it to all browsers. This
prevents uploading a `.tiff` emoji, for instance.
Based on this limited content-type, we then reverse to find the
reasonable extension to use when storing it. This is particularly
important because the local file storage uses the file extension to
choose what content-type to re-serve the emoji as.
This does nothing for existing emoji, which may have odd or missing
file extensions.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable. Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.
This requires plumbing the current (or future) avatar version through
various parts of the upload process.
Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).
We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.
Fixes: #12852.
This is done in as much of a drop-in fashion as possible. Note that
libvips does not support animated PNGs[^1], and as such this
conversion removes support for them as emoji; however, libvips
includes support for webp images, which future commits will take
advantage of.
This removes the MAX_EMOJI_GIF_SIZE limit, since that existed to work
around bugs in Pillow. MAX_EMOJI_GIF_FILE_SIZE_BYTES is fixed to
actually be 128KiB (not 128MiB, as it actually was), and is counted
_after_ resizing, since the point is to limit the amount of data
transfer to clients.
[^1]: https://github.com/libvips/libvips/discussions/2000
Fixes#29632.
The issue description explains this well:
We currently recalculate `currently_used_upload_space_bytes` every file
upload, by dint of calling `flush_used_upload_space_cache` on
save/delete, and then immediately calling
`user_profile.realm.currently_used_upload_space_bytes()` in
`notify_attachment_update`. Since this walks the Attachments table,
recalculating this can take seconds in large realms.
Switch this to using a CountStat, so we don't need to walk significant
chunks of the Attachment table when we upload an attachment. This will
also give us a historical daily graph of usage.
Most importantly, fixes a bug where a realm with a custom
.upload_quota_gb value (set by changing it in the database via e.g.
manage.py shell) would end up having it lowered while upgrading their
plan via the do_change_realm_plan_type function, which used to just set
it to the value implied by the new plan without caring about whether
that isn't lower than the original limit.
The new approach is cleaner since we don't do db queries by
upload_quota_gb so it's nicer to just generate these dynamically, making
changes to our limit-per-plan rules much easier - skipping the need for
migrations.