Commit Graph

54 Commits

Author SHA1 Message Date
Alex Vandiver
25fbb05fea upload: Attempt to guess character set text/* which are served inline.
This is only `text/plain`, currently.  In such cases where the
client-provided content-type also does not specify a `charset`, we use
`chardet` to make a guess, and store that guess to provide later when
serving the content.  The detection is done in a streaming fashion,
and thus should not require re-downloading the full content.
2025-07-24 22:40:12 -07:00
Alex Vandiver
183da665ac upload: Rename .source to .vips_source. 2025-07-24 22:40:12 -07:00
Alex Vandiver
aeed907c50 s3: Support non-AWS S3 providers which do not support request checksums. 2025-04-10 11:11:25 -07:00
Alex Vandiver
9fa5ab951c mime_types: Move INLINE_MIME_TYPES to prevent future import loops. 2025-01-09 09:09:13 -08:00
Mateusz Mandera
420849ff6a slack: Call the correct resize_* function when importing realm icon.
For resizing the icon.png files, we use resize_avatar, not resize_logo.
This is pretty confusing - sure, for icons we use the same function as
for avatars, but we should have a proper name for the function called in
the icon context. So this commit also adds resize_realm_icon, and
changes the calls to resize_avatar in icon contexts to
resize_realm_icon.
2024-11-08 15:43:18 -08:00
Alex Vandiver
a20673a267 upload: Allow filtering to just a prefix (e.g. a realm id). 2024-09-26 12:01:11 -07:00
Alex Vandiver
2dc737335e upload: Switch from BinaryIO to IO[bytes].
This is slightly more generally-compatible.
2024-09-26 12:01:11 -07:00
Alex Vandiver
638c579c56 tusd: Set metadata correctly in S3.
The Content-Type, Content-Disposition, StorageClass, and general
metadata are not set according to our patterns by tusd; copy the file
to itself to update those properties.
2024-09-26 12:00:43 -07:00
Alex Vandiver
84280ed7c2 upload: When serving s3 download URLs, send real filename.
Setting `ResponseContentDisposition=attachment` means that we override
the stored `ContentDisposition`, which includes a filename.  This
means that using the "Download" link on servers with S3 storage
produced a file named the sanitized version we stored.

Explicitly build a `ContentDisposition` to tell S3 to return, which
includes both `attachment` as well as the filename (if we have it
locally).
2024-09-26 12:00:43 -07:00
Anders Kaseorg
184c0203f3 upload: Lazily import boto3.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-24 16:38:37 -07:00
Alex Vandiver
e125ad823d exports: Add a separate bucket for realm exports.
This allows finer-grained access control and auditing.  The links
generated also expire after one week, and the suggested configuration
is that the underlying data does as well.

Co-authored-by: Prakhar Pratyush <prakhar@zulip.com>
2024-09-20 15:43:49 -07:00
Alex Vandiver
c1e8ecd08f uploads: Cache boto client in the module and be writable.
The `get_signed_upload_url` code is called for every S3 file serve
request, and is thus in the hot path.  The boto3 client caching
optimization is thus potentially useful as a performance optimization.
2024-09-20 15:43:49 -07:00
Alex Vandiver
1a7b3ef7ed upload: Use get_export_tarball_url in upload_export_tarball. 2024-09-20 15:43:49 -07:00
Alex Vandiver
4cf835d9dd upload: Remove common cache from get_export_tarball_url.
This is not called in the hot path like get_avatar_url is.
2024-09-20 15:43:49 -07:00
Alex Vandiver
a5bf452202 upload: Realm is not Optional in upload_export_tarball.
af4eb8c0d5 marked the base class and local backend as non-Optional,
but left the S3 backend as Optional for some reason.

Remove it.
2024-09-20 15:43:49 -07:00
Alex Vandiver
9a1f78db22 thumbnail: Support checking for images from streaming sources.
We may not always have trivial access to all of the bytes of the
uploaded file -- for instance, if the file was uploaded previously, or
by some other process.  Downloading the entire image in order to check
its headers is an inefficient use of time and bandwidth.

Adjust `maybe_thumbnail` and dependencies to potentially take a
`pyvips.Source` which supports streaming data from S3 or disk.  This
allows making the ImageAttachment row, if deemed appropriate, based on
only a few KB of data, and not the entire image.
2024-09-17 12:51:30 -07:00
Alex Vandiver
b4764f49df upload: Download files with their original names.
Fixes: #29491.
2024-09-09 12:40:17 -07:00
Alex Vandiver
ca72e756eb upload: Rename "upload_image_to_s3"; it is not only for images. 2024-09-09 12:40:17 -07:00
Anders Kaseorg
91ade25ba3 python: Simplify with str.removeprefix, str.removesuffix.
These are available in Python ≥ 3.9.
https://docs.python.org/3/library/stdtypes.html#str.removeprefix

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-03 12:30:16 -07:00
Alex Vandiver
2e38f426f4 upload: Generate thumbnails when images are uploaded.
A new table is created to track which path_id attachments are images,
and for those their metadata, and which thumbnails have been created.
Using path_id as the effective primary key lets us ignore if the
attachment is archived or not, saving some foreign key messes.

A new worker is added to observe events when rows are added to this
table, and to generate and store thumbnails for those images in
differing sizes and formats.
2024-07-16 13:22:15 -07:00
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver
0385e5bab9 emoji: Store in S3 with a long public cache-control. 2024-07-12 13:26:47 -07:00
Alex Vandiver
ff90e5355f upload: Pass down content-type of realm icon/logo to backend.
This saves having to try to re-derive it from the file extension,
which may be ".original" in some cases.
2024-07-11 07:31:39 -07:00
Alex Vandiver
58a9fe9af1 upload: Drop unused parameters to upload_message_attachment. 2024-07-07 14:40:07 -07:00
Alex Vandiver
e29a455b2d avatars: Encode version into the filename.
Hash the salt, user-id, and now avatar version into the filename.
This allows the URL contents to be immutable, and thus to be marked as
immutable and cacheable.  Since avatars are served unauthenticated,
hashing with a server-side salt makes the current and past avatars not
enumerable.

This requires plumbing the current (or future) avatar version through
various parts of the upload process.

Since this already requires a full migration of current avatars, also
take the opportunity to fix the missing `.png` on S3 uploads (#12852).

We switch from SHA-1 to SHA-256, but truncate it such that avatar URL
data does not substantially increase in size.

Fixes: #12852.
2024-07-07 14:40:07 -07:00
Alex Vandiver
feca9939bb s3: Support setting a cache-control on uploads. 2024-07-07 14:40:07 -07:00
Alex Vandiver
6258817bfd s3: Stop setting empty Content-Disposition header. 2024-07-07 14:40:07 -07:00
Alex Vandiver
2eaf098c5d upload: Content-type is always defined. 2024-06-26 16:43:11 -07:00
Alex Vandiver
c826d80061 upload: Factor out common code into zerver.lib.upload. 2024-06-26 16:43:11 -07:00
Alex Vandiver
5cd10ce51d s3: Allow setting a CloudFront URL prefix for avatar and emoji images. 2024-06-26 16:43:11 -07:00
Alex Vandiver
9fb03cb2c7 upload: Factor out common avatar logic. 2024-06-26 16:38:01 -07:00
Alex Vandiver
d92993c972 upload: Factor out common emoji logic. 2024-06-26 16:38:01 -07:00
Alex Vandiver
0153d6dbcd thumbnailing: Move resizing functions into zerver.lib.thumbnail. 2024-06-20 23:06:08 -04:00
Anders Kaseorg
fb4ad1422e mime_types: Add audio and image types missing from Python library.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-06-20 15:29:20 -07:00
Anders Kaseorg
93198a19ed requirements: Upgrade Python requirements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-01-29 10:41:54 -08:00
Alex Vandiver
75d6f35069 s3: Add a setting for S3 addressing style.
This controls if boto3 attempts to use
`https://bucketname.endpointname/` or `https://endpointname/bucket/`
as its prefix.  See
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

Fixes: #28424.
2024-01-05 11:12:18 -08:00
Alex Vandiver
3aea67a8ed s3: Only use get_bucket to get to boto3 clients and resources.
boto3 has two different modalities of making API calls -- through
resources, and through clients.  Resources are a higher-level
abstraction, and thus more generally useful, but some APIs are only
accessible through clients.  It is possible to get to a client object
from a resource, but not vice versa.

Use `get_bucket(...).meta.client` when we need direct access to the
client object for more complex API calls; this lets all of the
configuration for how to access S3 to sit within `get_bucket`.  Client
objects are not bound to only one bucket, but we get to them based on
the bucket we will be interacting with, for clarity.

We removed the cached session object, as it serves no real purpose.
2024-01-05 11:12:18 -08:00
Alex Vandiver
214bd4ed88 s3: Stop caching get_boto_client, which is only ever called once.
e883ab057f started caching the boto client, which we had identified
as slow call.  e883ab057f went further, calling
`get_boto_client().generate_presigned_url()` once and caching that
result.

This makes the inner cache on the client useless.  Remove it.
2024-01-05 11:12:18 -08:00
Anders Kaseorg
3853fa875a python: Consistently use from…import for urllib.parse.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-05 13:03:07 -08:00
Anders Kaseorg
a50eb2e809 mypy: Enable new error explicit-override.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-12 12:28:41 -07:00
Anders Kaseorg
55aa29bef4 ruff: Fix FLY002 Consider f"…" instead of string join.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-07 17:12:41 -07:00
Alex Vandiver
d957559371 uploads: Allow uploads to set storage class.
Uploads are well-positioned to use S3's "intelligent tiering" storage
class.  Add a setting to let uploaded files to declare their desired
storage class at upload time, and document how to move existing files
to the same storage class.
2023-07-19 16:19:34 -07:00
Alex Vandiver
e408f069fe uploads: Add a method to copy attachment contents out. 2023-04-07 09:13:48 -07:00
Alex Vandiver
3bf3f47b49 delete_old_unclaimed_attachments: Add flag to clean up storage.
Actions like deleting realms may leave unreferenced uploads in the
attachment storage backend.

Fix these by walking the complete contents of the attachment storage
backend, and removing files which are no longer present in the
database.  This may take quite some time, as it is necessarily O(n) in
the number of files uploaded to the system.
2023-03-02 16:36:19 -08:00
Alex Vandiver
c9d1755a12 delete_realm: Optimize attachment cleanup by batching. 2023-03-02 16:36:19 -08:00
Alex Vandiver
b31a6dc56c upload: Reorder functions into logical groupings. 2023-03-02 16:36:19 -08:00
Alex Vandiver
bd80c048be upload: Rename delete_message_image to use word "attachment".
The table is named Attachment, and not all of them are images.
2023-03-02 16:36:19 -08:00
Alex Vandiver
567d1d54e7 upload: Rename upload_message_file to use word "attachment".
For consistency with the table, which is named Attachment.
2023-03-02 16:36:19 -08:00