zulip

mirror of https://github.com/zulip/zulip.git synced 2025-10-31 20:13:46 +00:00

Author	SHA1	Message	Date
Alex Vandiver	25fbb05fea	upload: Attempt to guess the character set of text/* files which are served inline. This is only `text/plain`, currently. In cases where the client-provided content-type also does not specify a `charset`, we use `chardet` to make a guess, and store that guess to provide later when serving the content. The detection is done in a streaming fashion, and thus should not require re-downloading the full content.	2025-07-24 22:40:12 -07:00
Alex Vandiver	183da665ac	upload: Rename .source to .vips_source.	2025-07-24 22:40:12 -07:00
Alex Vandiver	aeed907c50	s3: Support non-AWS S3 providers which do not support request checksums.	2025-04-10 11:11:25 -07:00
Alex Vandiver	9fa5ab951c	mime_types: Move INLINE_MIME_TYPES to prevent future import loops.	2025-01-09 09:09:13 -08:00
Mateusz Mandera	420849ff6a	slack: Call the correct resize_* function when importing realm icon. For resizing the icon.png files, we use resize_avatar, not resize_logo. This is pretty confusing - sure, for icons we use the same function as for avatars, but we should have a proper name for the function called in the icon context. So this commit also adds resize_realm_icon, and changes the calls to resize_avatar in icon contexts to resize_realm_icon.	2024-11-08 15:43:18 -08:00
Alex Vandiver	a20673a267	upload: Allow filtering to just a prefix (e.g. a realm id).	2024-09-26 12:01:11 -07:00
Alex Vandiver	2dc737335e	upload: Switch from BinaryIO to IO[bytes]. This is slightly more generally-compatible.	2024-09-26 12:01:11 -07:00
Alex Vandiver	638c579c56	tusd: Set metadata correctly in S3. The Content-Type, Content-Disposition, StorageClass, and general metadata are not set according to our patterns by tusd; copy the file to itself to update those properties.	2024-09-26 12:00:43 -07:00
Alex Vandiver	84280ed7c2	upload: When serving s3 download URLs, send real filename. Setting `ResponseContentDisposition=attachment` means that we override the stored `ContentDisposition`, which includes a filename. This means that using the "Download" link on servers with S3 storage produced a file named with the sanitized name we stored. Explicitly build a `ContentDisposition` to tell S3 to return, which includes both `attachment` as well as the filename (if we have it locally).	2024-09-26 12:00:43 -07:00
Anders Kaseorg	184c0203f3	upload: Lazily import boto3. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-09-24 16:38:37 -07:00
Alex Vandiver	e125ad823d	exports: Add a separate bucket for realm exports. This allows finer-grained access control and auditing. The links generated also expire after one week, and the suggested configuration is that the underlying data does as well. Co-authored-by: Prakhar Pratyush <prakhar@zulip.com>	2024-09-20 15:43:49 -07:00
Alex Vandiver	c1e8ecd08f	uploads: Cache boto client in the module and be writable. The `get_signed_upload_url` code is called for every S3 file serve request, and is thus in the hot path. The boto3 client caching optimization is thus potentially useful as a performance optimization.	2024-09-20 15:43:49 -07:00
Alex Vandiver	1a7b3ef7ed	upload: Use get_export_tarball_url in upload_export_tarball.	2024-09-20 15:43:49 -07:00
Alex Vandiver	4cf835d9dd	upload: Remove common cache from get_export_tarball_url. This is not called in the hot path like get_avatar_url is.	2024-09-20 15:43:49 -07:00
Alex Vandiver	a5bf452202	upload: Realm is not Optional in upload_export_tarball. `af4eb8c0d5` marked the base class and local backend as non-Optional, but left the S3 backend as Optional for some reason. Remove it.	2024-09-20 15:43:49 -07:00
Alex Vandiver	9a1f78db22	thumbnail: Support checking for images from streaming sources. We may not always have trivial access to all of the bytes of the uploaded file -- for instance, if the file was uploaded previously, or by some other process. Downloading the entire image in order to check its headers is an inefficient use of time and bandwidth. Adjust `maybe_thumbnail` and dependencies to potentially take a `pyvips.Source` which supports streaming data from S3 or disk. This allows making the ImageAttachment row, if deemed appropriate, based on only a few KB of data, and not the entire image.	2024-09-17 12:51:30 -07:00
Alex Vandiver	b4764f49df	upload: Download files with their original names. Fixes: #29491.	2024-09-09 12:40:17 -07:00
Alex Vandiver	ca72e756eb	upload: Rename "upload_image_to_s3"; it is not only for images.	2024-09-09 12:40:17 -07:00
Anders Kaseorg	91ade25ba3	python: Simplify with str.removeprefix, str.removesuffix. These are available in Python ≥ 3.9. https://docs.python.org/3/library/stdtypes.html#str.removeprefix Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-09-03 12:30:16 -07:00
Alex Vandiver	2e38f426f4	upload: Generate thumbnails when images are uploaded. A new table is created to track which path_id attachments are images, and for those their metadata, and which thumbnails have been created. Using path_id as the effective primary key lets us ignore if the attachment is archived or not, saving some foreign key messes. A new worker is added to observe events when rows are added to this table, and to generate and store thumbnails for those images in differing sizes and formats.	2024-07-16 13:22:15 -07:00
Anders Kaseorg	0fa5e7f629	ruff: Fix UP035 Import from `collections.abc`, `typing` instead. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Anders Kaseorg	531b34cb4c	ruff: Fix UP007 Use `X | Y` for type annotations. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Anders Kaseorg	e08a24e47f	ruff: Fix UP006 Use `list` instead of `List` for type annotation. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-07-13 22:28:22 -07:00
Alex Vandiver	0385e5bab9	emoji: Store in S3 with a long public cache-control.	2024-07-12 13:26:47 -07:00
Alex Vandiver	ff90e5355f	upload: Pass down content-type of realm icon/logo to backend. This saves having to try to re-derive it from the file extension, which may be ".original" in some cases.	2024-07-11 07:31:39 -07:00
Alex Vandiver	58a9fe9af1	upload: Drop unused parameters to upload_message_attachment.	2024-07-07 14:40:07 -07:00
Alex Vandiver	e29a455b2d	avatars: Encode version into the filename. Hash the salt, user-id, and now avatar version into the filename. This allows the URL contents to be immutable, and thus to be marked as immutable and cacheable. Since avatars are served unauthenticated, hashing with a server-side salt makes the current and past avatars not enumerable. This requires plumbing the current (or future) avatar version through various parts of the upload process. Since this already requires a full migration of current avatars, also take the opportunity to fix the missing `.png` on S3 uploads (#12852). We switch from SHA-1 to SHA-256, but truncate it such that avatar URL data does not substantially increase in size. Fixes: #12852.	2024-07-07 14:40:07 -07:00
Alex Vandiver	feca9939bb	s3: Support setting a cache-control on uploads.	2024-07-07 14:40:07 -07:00
Alex Vandiver	6258817bfd	s3: Stop setting empty Content-Disposition header.	2024-07-07 14:40:07 -07:00
Alex Vandiver	2eaf098c5d	upload: Content-type is always defined.	2024-06-26 16:43:11 -07:00
Alex Vandiver	c826d80061	upload: Factor out common code into zerver.lib.upload.	2024-06-26 16:43:11 -07:00
Alex Vandiver	5cd10ce51d	s3: Allow setting a CloudFront URL prefix for avatar and emoji images.	2024-06-26 16:43:11 -07:00
Alex Vandiver	9fb03cb2c7	upload: Factor out common avatar logic.	2024-06-26 16:38:01 -07:00
Alex Vandiver	d92993c972	upload: Factor out common emoji logic.	2024-06-26 16:38:01 -07:00
Alex Vandiver	0153d6dbcd	thumbnailing: Move resizing functions into zerver.lib.thumbnail.	2024-06-20 23:06:08 -04:00
Anders Kaseorg	fb4ad1422e	mime_types: Add audio and image types missing from Python library. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-06-20 15:29:20 -07:00
Anders Kaseorg	93198a19ed	requirements: Upgrade Python requirements. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2024-01-29 10:41:54 -08:00
Alex Vandiver	75d6f35069	s3: Add a setting for S3 addressing style. This controls if boto3 attempts to use `https://bucketname.endpointname/` or `https://endpointname/bucket/` as its prefix. See https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html Fixes: #28424.	2024-01-05 11:12:18 -08:00
Alex Vandiver	3aea67a8ed	s3: Only use get_bucket to get to boto3 clients and resources. boto3 has two different modalities of making API calls -- through resources, and through clients. Resources are a higher-level abstraction, and thus more generally useful, but some APIs are only accessible through clients. It is possible to get to a client object from a resource, but not vice versa. Use `get_bucket(...).meta.client` when we need direct access to the client object for more complex API calls; this lets all of the configuration for how to access S3 to sit within `get_bucket`. Client objects are not bound to only one bucket, but we get to them based on the bucket we will be interacting with, for clarity. We removed the cached session object, as it serves no real purpose.	2024-01-05 11:12:18 -08:00
Alex Vandiver	214bd4ed88	s3: Stop caching get_boto_client, which is only ever called once. `e883ab057f` started caching the boto client, which we had identified as a slow call. `e883ab057f` went further, calling `get_boto_client().generate_presigned_url()` once and caching that result. This makes the inner cache on the client useless. Remove it.	2024-01-05 11:12:18 -08:00
Anders Kaseorg	3853fa875a	python: Consistently use from…import for urllib.parse. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-12-05 13:03:07 -08:00
Anders Kaseorg	a50eb2e809	mypy: Enable new error explicit-override. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-10-12 12:28:41 -07:00
Anders Kaseorg	55aa29bef4	ruff: Fix FLY002 Consider f"…" instead of string join. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2023-08-07 17:12:41 -07:00
Alex Vandiver	d957559371	uploads: Allow uploads to set storage class. Uploads are well-positioned to use S3's "intelligent tiering" storage class. Add a setting to let uploaded files declare their desired storage class at upload time, and document how to move existing files to the same storage class.	2023-07-19 16:19:34 -07:00
Alex Vandiver	e408f069fe	uploads: Add a method to copy attachment contents out.	2023-04-07 09:13:48 -07:00
Alex Vandiver	3bf3f47b49	delete_old_unclaimed_attachments: Add flag to clean up storage. Actions like deleting realms may leave unreferenced uploads in the attachment storage backend. Fix these by walking the complete contents of the attachment storage backend, and removing files which are no longer present in the database. This may take quite some time, as it is necessarily O(n) in the number of files uploaded to the system.	2023-03-02 16:36:19 -08:00
Alex Vandiver	c9d1755a12	delete_realm: Optimize attachment cleanup by batching.	2023-03-02 16:36:19 -08:00
Alex Vandiver	b31a6dc56c	upload: Reorder functions into logical groupings.	2023-03-02 16:36:19 -08:00
Alex Vandiver	bd80c048be	upload: Rename delete_message_image to use word "attachment". The table is named Attachment, and not all of them are images.	2023-03-02 16:36:19 -08:00
Alex Vandiver	567d1d54e7	upload: Rename upload_message_file to use word "attachment". For consistency with the table, which is named Attachment.	2023-03-02 16:36:19 -08:00
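
Commit e29a455b2d above describes hashing a server-side salt, the user id, and the avatar version into the avatar filename with SHA-256, truncated so that URLs do not grow much. A minimal sketch of that idea follows. The function name `avatar_filename_hash`, the `:` separator, and the 40-character truncation length are assumptions made for illustration; the commit message does not state the actual input format or length used in Zulip.

```python
import hashlib


def avatar_filename_hash(salt: str, user_id: int, version: int, length: int = 40) -> str:
    """Hypothetical helper: derive a non-enumerable, versioned filename stem.

    The separator and default length are assumptions, not Zulip's real format.
    """
    # Mixing the version into the hashed payload means each new avatar gets a
    # new URL, so the content behind any one URL never changes.
    payload = f"{salt}:{user_id}:{version}".encode()
    digest = hashlib.sha256(payload).hexdigest()
    # Truncate the 64-character SHA-256 hex digest to keep URLs short.
    return digest[:length]


old_name = avatar_filename_hash("example-salt", 17, 1)
new_name = avatar_filename_hash("example-salt", 17, 2)
```

Because the salt stays on the server, a client that knows a user id and version still cannot compute the filename, which matches the commit's point that current and past avatars are not enumerable.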

54 Commits
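
Commit 84280ed7c2 in the list above explains that the download response needs an explicitly built `Content-Disposition` value carrying both `attachment` and the original filename. One way to build such a value with only the Python standard library is sketched below. The helper name `build_content_disposition` is hypothetical, and this is not necessarily how Zulip constructs the header.

```python
from email.message import Message
from typing import Optional


def build_content_disposition(filename: Optional[str]) -> str:
    """Hypothetical helper: return an 'attachment' disposition value.

    Includes the filename when one is known; otherwise returns the bare
    'attachment' disposition.
    """
    msg = Message()
    if filename is None:
        msg.add_header("Content-Disposition", "attachment")
    else:
        # add_header quotes ASCII filenames and switches to the RFC 2231
        # `filename*=` form for names containing non-ASCII characters.
        msg.add_header("Content-Disposition", "attachment", filename=filename)
    return msg["Content-Disposition"]


plain = build_content_disposition("report.pdf")
bare = build_content_disposition(None)
encoded = build_content_disposition("résumé.pdf")
```

The resulting string could then be supplied wherever the storage backend accepts a response disposition override, such as the `ResponseContentDisposition` parameter the commit message mentions.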