Commit Graph

63 Commits

Author SHA1 Message Date
Anders Kaseorg
88a8087243 ruff: Fix PYI019 Use Self instead of custom TypeVar.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-03-14 17:31:50 -07:00
Alex Vandiver
3325151f05 import_realm: Do not enqueue thumbnails on ImageAttachment creation.
This removes a race which parallels 6f20c15ae9, but in the import
path; thumbnails could have been generated while the message content
was being rendered, leaving them permanently with spinners.

The act of rendering the message markdown is what enqueues the
thumbnailing of referenced images; previously, images were enqueued at
least _twice_: once when the row was created, and again for every
message they were referenced in.
2025-01-31 14:29:57 -08:00
Alex Vandiver
98362de185 models: Add content_type to ImageAttachment.
This means that only ImageAttachment row needs to be fetched, and
removes the need to pass around an extra parameter.  This
denormalization is safe, since in general Attachment rows are
read-only, so we are not concerned with drift between the Attachment
and ImageAttachment tables.

We cannot make content_type non-null, since while the both the
`content_type` column in Attachment and populating that from requests
predates the ImageAttachment table, we have both backfilled
ImageAttachment rows to consider, and imports may also leave files
with no `content_type`.  Any backfill of currently-null `content_type`
values will thus need to update both tables.

This change fixes a race condition when importing. ImageAttachment
rows are imported before rendering Messages, which are both before
importing Attachment rows; if the thumbnailing finished after the
Message was imported but before Attachment rows were imported, then
the re-rendering step would not know the image's content-type.
2025-01-31 14:29:57 -08:00
Alex Vandiver
8bd8a33dd2 thumbnail: Show the first few frames of large animated images.
71406ac767 switched the IMAGE_BOMB_TOTAL_PIXELS cutoff for what
images we preview to include the number of frames in the calculation.
While accurate to the implementation (thumbnailing a 1k-frame animation is
prohibitive, even a small resolutions), this was a behaviour change
from without thumbnailing -- animated gifs did not display inline at
all anymore.

Switch to thumbnailing as many frames as we can fit into a pixel-based
animated thumbnailing threshold, with a minimum of three (to be able
to convey that the image is actually animated).  Smaller-resolution
images will hence get more frames in their preview.  This also allows
the standard animate-on-hover or always-animate behaviour to be true
to their configurations, without confusing edge cases.

Fixes: #32609.
2025-01-15 09:56:19 -08:00
Alex Vandiver
230bae17bb thumbnail: Generate a transcoded high-res version of HEIC/TIFF images.
If the content-type of the image is not in INLINE_MIME_TYPES, then we
do not expect browsers to be able to display it.  This behaviour is
particularly confusing because the thumbnail will render properly,
since that will be in the more widely-supported WebP format, but the
lightbox will show a broken image.

In these cases, generate a high-resolution (4032x3024) "thumbnail"
which clients can choose to use instead.  This thumbnail format is not
in the listed in the server's advertised thumbnail size list, because
it is not reliably generated for every image.

The transcoded thumbnail format is set on the `img` tag if it is
generated, and the original content-type is always passed to the
client, so it can decide how or if to render the original image.  This
content-type is as the _original uploader_ specified it, so may be
incorrect.

The transcoded image is not animated, even if the original was.  HEIC
files can nominally be animated, but in testing libvips was not able
to correctly recognize them as such.  TIFF files are parsed as being
"animated," with one page per frame; this is of dubious utility, so
we merely transcode the first page.  Always generating a static
transcoded image serves to also limit the computational time spent.

THUMBNAIL_OUTPUT_FORMATS is switched to be a tuple to ensure that it
is not accidentally mutated.
2025-01-09 09:10:28 -08:00
Alex Vandiver
9fa5ab951c mime_types: Move INLINE_MIME_TYPES to prevent future import loops. 2025-01-09 09:09:13 -08:00
Mateusz Mandera
420849ff6a slack: Call the correct resize_* function when importing realm icon.
For resizing the icon.png files, we use resize_avatar, not resize_logo.
This is pretty confusing - sure, for icons we use the same function as
for avatars, but we should have a proper name for the function called in
the icon context. So this commit also adds resize_realm_icon, and
changes the calls to resize_avatar in icon contexts to
resize_realm_icon.
2024-11-08 15:43:18 -08:00
Mateusz Mandera
da4443f392 thumbnail: Make thumbnailing work with data import.
We didn't have thumbnailing for images coming from data import and this
commit adds the functionality.

There are a few fundamental issues that the implementation needs to
solve.

1. The images come from an untrusted source and therefore we don't want
   to just pass them through to thumbnailing without checking. For that
   reason, we cannot just import ImageAttachment rows from the export
   data, even for zulip=>zulip imports.
   The right way to process images is to pass them to maybe_thumbail(),
   which runs libvips_check_image() on them to verify we're okay with
   thumbnailing, creates ImageAttachment rows for them and sends them
   to the thumbnailing queue worker. This approach lets us handle both
   zulip=>zulip and 3rd party=>zulip imports in the same way,

2. There is a somewhat circular dependency between the Message,
   Attachment and ImageAttachment import process:

- ImageAttachments would ideally be created after importing
  Attachments, but they need to already exist at the time of Message
  import. Otherwise, the markdown processor doesn't know it has to add
  HTML for image previews to messages that reference images. This would
  mean that messages imported from 3rd party tools don't get image
  previews.
- Attachments only get created after Message import however, due to the
  many-to-many relationship between Message and Attachment.

This is solved by fixing up some data of Attachments pre-emptively, such
as the path_ids. This gives us the necessary information for creating
ImageAttachments before importing Messages.

While we generate ImageAttachment rows synchronously, the actual
thumbnailing job is sent to the queue worker. Theoretically, the worker
could be very backlogged and not process the thumbnails anytime soon.
This is fine - if the app is loaded and tries to display a message with
such a not-yet-generated thumbnail, the code in `serve_file` will
generate the thumbnails synchronously on the fly and the user will see
the image preview displayed normally. See:

1b47134d0d/zerver/views/upload.py (L333-L342)
2024-10-24 10:32:51 -07:00
Alex Vandiver
9a1f78db22 thumbnail: Support checking for images from streaming sources.
We may not always have trivial access to all of the bytes of the
uploaded file -- for instance, if the file was uploaded previously, or
by some other process.  Downloading the entire image in order to check
its headers is an inefficient use of time and bandwidth.

Adjust `maybe_thumbnail` and dependencies to potentially take a
`pyvips.Source` which supports streaming data from S3 or disk.  This
allows making the ImageAttachment row, if deemed appropriate, based on
only a few KB of data, and not the entire image.
2024-09-17 12:51:30 -07:00
Alex Vandiver
ef21dd9b99 thumbnail: Set a stable ordering on ImageAttachment rows for locking.
Failure to have a stable ordering can lead to deadlocks.
2024-09-17 09:14:52 -07:00
Alex Vandiver
8bacdbc895 thumbnail: Put the original dimensions on spinner images.
This lets us reserve the right amount of space in the message feed
immediately.
2024-09-09 15:59:02 -07:00
Anders Kaseorg
e3abd09e67 thumbnail: Fix corrupted email notifications due to HTML5 entities.
BeautifulSoup with formatter="html5" unnecessarily escapes many
characters with HTML5-specific entities that cannot be correctly
parsed by lxml during generation of email notifications.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-05 16:00:45 -07:00
Anders Kaseorg
91ade25ba3 python: Simplify with str.removeprefix, str.removesuffix.
These are available in Python ≥ 3.9.
https://docs.python.org/3/library/stdtypes.html#str.removeprefix

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-03 12:30:16 -07:00
Alex Vandiver
0c07c6531c thumbnail: Enqueue thumbnails when we render a spinner.
Thumbnails are usually enqueued in the worker when the image is
uploaded.  However, for images which were uploaded before the
existence of the thumbnailing worker, and whose metadata was
backfilled (see previous commit) this leaves a permanent spinner,
since nothing triggers the thumbnail worker for them.

Enqueue a thumbnail worker for every spinner which we render into
Markdown.  This ensures that _something_ is attempting to resolve the
spinner which the user sees.  In the case of freshly-uploaded images
which are still in the queue, this results in a duplicate entry in the
thumbnailing queue -- this is harmless, since the worker determines
that all of the thumbnails we need have already been generated, and it
does no further work.  However, in the case of historical uploads, it
properly kicks off the thumbnailing process and results in a
subsequent message update to include the freshly-generated thumbnail.

While specifically useful for backfilled uploads, this is also
generally a good safety step for a good user experience, as it also
prevents dropped events in the queue from unknown causes from leaving
perpetual spinners in the message feed.

Because `get_user_upload_previews` is potentially called twice for
every message with spinners (see 6f20c15ae9), we add an additional
flag to `get_user_upload_previews` to suppress a _second_ event from
being enqueued for every spinner generated.
2024-08-29 12:11:51 -07:00
Alex Vandiver
6f20c15ae9 thumbnail: Resolve a race condition when rendering messages.
Messages are rendered outside of a transaction, for performance
reasons, and then sent inside of one.  This opens thumbnailing up to a
race where the thumbnails have not yet been written when the message
is rendered, but the message has not been sent when thumbnailing
completes, causing `rewrite_thumbnailed_images` to be a no-op and the
message being left with a spinner which never resolves.

Explicitly lock and use he ImageAttachment data inside the
message-sending transaction, to rewrite the message content with the
latest information about the existing thumbnails.

Despite the thumbnailing worker taking a lock on Message rows to
update them, this does not lead to deadlocks -- the INSERT of the
Message rows happens in a transaction, ensuring that either the
message rending blocks the thumbnailing until the Message row is
created, or that the `rewrite_thumbnailed_images` and Message INSERT
waits until thumbnailing is complete (and updated no Message rows).
2024-08-01 16:48:16 -07:00
Mateusz Mandera
a0971934d9 thumbnail: Fix typo in comment. 2024-07-30 00:17:59 +02:00
Alex Vandiver
c726d2ec01 thumbnail: Do not Camo old thumbor URLs; serve images directly.
Providing a signed Camo URL for arbitrary URLs opened the server up to
being an open redirector.  Return 403 if the URL is not a user upload,
and the backend image if it is.  Since we do not have ImageAttachment
rows for uploads at a time we wrote `/thumbnail?` URLs, return the
full-size content.
2024-07-24 16:04:34 -07:00
Alex Vandiver
e3a238fc89 thumbnail: Remove unused thumbnail sizes.
47683144ff switched the web client to prefer the 840x560 size, as the
mobile apps prefer; remove the now-unused 300x200 size.  No client was
using the generated `.jpg` formats, as all clients support `.webp`, so
remove the unused `.jpg` thumbnail as well.
2024-07-24 09:57:20 -07:00
Alex Vandiver
e4a8304f57 thumbnail: Store the post-orientation-transformation dimensions.
Modern browsers respect the EXIF orientation information of images,
applying rotation and/or mirroring as specified in those tags.  The
the `width="..."` and `height="..."` tags are to size the image
_after_ applying those orientation transformations.

The `.width` and `.height` properties of libvips' images are _before_
any transformations are applied.  Since we intend to use these to hint
to rendering clients the size that the image should be _rendered at_,
change to storing (and providing to clients) the dimensions of the
rendered image, not the stored bytes.
2024-07-24 09:56:42 -07:00
Alex Vandiver
2ea0cc0005 thumbnail: Add a data-original-dimensions attribute.
This allows clients to potentially lay out the thumbnails more
intelligently, or to provide a better "progressive-load" experience
when enlarging the thumbnail.
2024-07-22 22:41:10 -04:00
Alex Vandiver
65828b20e9 thumbnail: Factor out a dataclass for markdown image metadata. 2024-07-22 22:41:10 -04:00
Alex Vandiver
3ac14632d8 thumbnail: Disable libvips cache.
The libvips cache is 100MB, 100 operations, or 100 files, whichever is
less.  A single Django process or worker is extremely unlikely to ever
see the same image twice, much less within those timeframes.

Disable the cache, since it is mostly useless memory usage for our use
case.
2024-07-22 10:19:33 -07:00
Alex Vandiver
b42863be4b markdown: Show thumbnails for uploaded images.
Fixes: #16210.
2024-07-21 18:41:59 -07:00
Alex Vandiver
71406ac767 thumbnail: Factor frames into account for IMAGE_BOMB_TOTAL_PIXELS. 2024-07-21 18:41:59 -07:00
Alex Vandiver
4351cc5914 thumbnail: Move get_image_thumbnail_path and split_thumbnail_path. 2024-07-18 13:50:28 -07:00
Alex Vandiver
2e38f426f4 upload: Generate thumbnails when images are uploaded.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.

A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
2024-07-16 13:22:15 -07:00
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver
544d3df057 thumbnail: Stop applying MAX_EMOJI_GIF_FILE_SIZE_BYTES before resizing.
b14a33c659 attempted to make the 128k limit apply _after_ resizing,
but left this check, which examines the pre-resized image size.
2024-07-12 13:26:47 -07:00
Alex Vandiver
f6b99171ce emoji: Derive the file extension from a limited set of content-types.
We thumbnail and serve emoji with the same format as they were
uploaded.  However, we preserved the original extension, which might
mismatch with the provided content-type.

Limit the content-type to a subset which is both (a) an image format
we can thumbnail, and (b) a media format which is widely-enough
supported that we are willing to provide it to all browsers.  This
prevents uploading a `.tiff` emoji, for instance.

Based on this limited content-type, we then reverse to find the
reasonable extension to use when storing it.  This is particularly
important because the local file storage uses the file extension to
choose what content-type to re-serve the emoji as.

This does nothing for existing emoji, which may have odd or missing
file extensions.
2024-07-12 13:26:47 -07:00
Alex Vandiver
382cb5bb13 thumbnail: Lock down which formats we parse. 2024-07-11 07:31:39 -07:00
Alex Vandiver
4bc563128e thumbnail: Use a consistent set of supported image types. 2024-07-11 07:31:39 -07:00
Alex Vandiver
a091b9ef81 thumbnail: Provide a more explicit hint than "Bad". 2024-07-11 07:31:39 -07:00
Alex Vandiver
fb929ca218 thumbnailing: Remove unnecessary third return value from resize_emoji. 2024-06-26 16:43:09 -07:00
Alex Vandiver
b14a33c659 thumbnailing: Switch to libvips, from PIL/pillow.
This is done in as much of a drop-in fashion as possible.  Note that
libvips does not support animated PNGs[^1], and as such this
conversion removes support for them as emoji; however, libvips
includes support for webp images, which future commits will take
advantage of.

This removes the MAX_EMOJI_GIF_SIZE limit, since that existed to work
around bugs in Pillow.  MAX_EMOJI_GIF_FILE_SIZE_BYTES is fixed to
actually be 128KiB (not 128MiB, as it actually was), and is counted
_after_ resizing, since the point is to limit the amount of data
transfer to clients.

[^1]: https://github.com/libvips/libvips/discussions/2000
2024-06-26 16:42:57 -07:00
Alex Vandiver
0153d6dbcd thumbnailing: Move resizing functions into zerver.lib.thumbnail. 2024-06-20 23:06:08 -04:00
Alex Vandiver
2c5dff7f59 thumbnailing: Remove unnecessary os.path adjustment.
This is a library file, not a binary; os.path is already set up.
2024-06-20 23:06:08 -04:00
Mateusz Mandera
2299aa3382 docs: Remove some outdated references to thumbnailing.md doc.
The doc was removed in 405bc8dabf
2022-07-12 17:44:24 -07:00
Anders Kaseorg
0b795e492f thumbnail: Remove unused is_camo_url parameter.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-19 01:51:37 -07:00
Anders Kaseorg
405bc8dabf requirements: Remove Thumbor.
Thumbor and tc-aws have been dragging their feet on Python 3 support
for years, and even the alphas and unofficial forks we’ve been running
don’t seem to be maintained anymore.  Depending on these projects is
no longer viable for us.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-05-06 20:07:32 -07:00
Anders Kaseorg
dcdb00a5e6 python: Convert deprecated Django is_safe_url.
django.utils.http.is_safe_url is a deprecated alias of
django.utils.http.url_has_allowed_host_and_scheme as of Django 3.0,
and will be removed in Django 4.0.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-04-15 18:01:34 -07:00
Anders Kaseorg
6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg
11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg
72d6ff3c3b docs: Fix more capitalization issues.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-23 11:46:55 -07:00
Anders Kaseorg
365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg
69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Anders Kaseorg
67e7a3631d python: Convert percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-10 15:02:09 -07:00
arpit551
d60efa1478 thumbor: Fix __file__ typo.
Replaced '__file__' typo with __file__ which used to add
wrong path to sys.path.
2020-04-12 11:23:03 -07:00
Anders Kaseorg
c734bbd95d python: Modernize legacy Python 2 syntax with pyupgrade.
Generated by `pyupgrade --py3-plus --keep-percent-format` on all our
Python code except `zthumbor` and `zulip-ec2-configure-interfaces`,
followed by manual indentation fixes.

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-04-09 16:43:22 -07:00