This means that only ImageAttachment row needs to be fetched, and
removes the need to pass around an extra parameter. This
denormalization is safe, since in general Attachment rows are
read-only, so we are not concerned with drift between the Attachment
and ImageAttachment tables.
We cannot make content_type non-null, since while the both the
`content_type` column in Attachment and populating that from requests
predates the ImageAttachment table, we have both backfilled
ImageAttachment rows to consider, and imports may also leave files
with no `content_type`. Any backfill of currently-null `content_type`
values will thus need to update both tables.
This change fixes a race condition when importing. ImageAttachment
rows are imported before rendering Messages, which are both before
importing Attachment rows; if the thumbnailing finished after the
Message was imported but before Attachment rows were imported, then
the re-rendering step would not know the image's content-type.
If the content-type of the image is not in INLINE_MIME_TYPES, then we
do not expect browsers to be able to display it. This behaviour is
particularly confusing because the thumbnail will render properly,
since that will be in the more widely-supported WebP format, but the
lightbox will show a broken image.
In these cases, generate a high-resolution (4032x3024) "thumbnail"
which clients can choose to use instead. This thumbnail format is not
in the listed in the server's advertised thumbnail size list, because
it is not reliably generated for every image.
The transcoded thumbnail format is set on the `img` tag if it is
generated, and the original content-type is always passed to the
client, so it can decide how or if to render the original image. This
content-type is as the _original uploader_ specified it, so may be
incorrect.
The transcoded image is not animated, even if the original was. HEIC
files can nominally be animated, but in testing libvips was not able
to correctly recognize them as such. TIFF files are parsed as being
"animated," with one page per frame; this is of dubious utility, so
we merely transcode the first page. Always generating a static
transcoded image serves to also limit the computational time spent.
THUMBNAIL_OUTPUT_FORMATS is switched to be a tuple to ensure that it
is not accidentally mutated.
BeautifulSoup with formatter="html5" unnecessarily escapes many
characters with HTML5-specific entities that cannot be correctly
parsed by lxml during generation of email notifications.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Thumbnails are usually enqueued in the worker when the image is
uploaded. However, for images which were uploaded before the
existence of the thumbnailing worker, and whose metadata was
backfilled (see previous commit) this leaves a permanent spinner,
since nothing triggers the thumbnail worker for them.
Enqueue a thumbnail worker for every spinner which we render into
Markdown. This ensures that _something_ is attempting to resolve the
spinner which the user sees. In the case of freshly-uploaded images
which are still in the queue, this results in a duplicate entry in the
thumbnailing queue -- this is harmless, since the worker determines
that all of the thumbnails we need have already been generated, and it
does no further work. However, in the case of historical uploads, it
properly kicks off the thumbnailing process and results in a
subsequent message update to include the freshly-generated thumbnail.
While specifically useful for backfilled uploads, this is also
generally a good safety step for a good user experience, as it also
prevents dropped events in the queue from unknown causes from leaving
perpetual spinners in the message feed.
Because `get_user_upload_previews` is potentially called twice for
every message with spinners (see 6f20c15ae9), we add an additional
flag to `get_user_upload_previews` to suppress a _second_ event from
being enqueued for every spinner generated.
Messages are rendered outside of a transaction, for performance
reasons, and then sent inside of one. This opens thumbnailing up to a
race where the thumbnails have not yet been written when the message
is rendered, but the message has not been sent when thumbnailing
completes, causing `rewrite_thumbnailed_images` to be a no-op and the
message being left with a spinner which never resolves.
Explicitly lock and use he ImageAttachment data inside the
message-sending transaction, to rewrite the message content with the
latest information about the existing thumbnails.
Despite the thumbnailing worker taking a lock on Message rows to
update them, this does not lead to deadlocks -- the INSERT of the
Message rows happens in a transaction, ensuring that either the
message rending blocks the thumbnailing until the Message row is
created, or that the `rewrite_thumbnailed_images` and Message INSERT
waits until thumbnailing is complete (and updated no Message rows).
This allows clients to potentially lay out the thumbnails more
intelligently, or to provide a better "progressive-load" experience
when enlarging the thumbnail.