nginx: Use cache slicing to prevent thundering herds for video thumbs.

This prevents a thundering herd for videos: if a very large video is
posted to a channel with many active clients, all of them request it
simultaneously in order to render the in-feed preview image.
While these requests come with a `Range` header which is intended to
limit the request to just the first couple MB, nginx ignores this
header when making its request to the upstream -- so it can obtain and
cache the whole file locally.  This results in multiple competing
requests for the whole content from S3, all racing to store the
content in the cache.

Use cache slicing to split the content cache into chunks of 5MB; the
cache is filled one slice at a time, as needed based on the byte
ranges that clients request.  Clients making requests without a
`Range` header are provided with the content transparently stitched
together from the individual slices.
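
As a rough illustration of the slice arithmetic (a sketch, not nginx's
own code), the hypothetical helper `slices_for_range` below lists the
slice-aligned `Range` values that would have to be fetched from the
upstream to serve a given client byte range.  It assumes slices start
at multiples of the slice size, and that `5m` means 5 * 1024 * 1024
bytes.

```python
SLICE_SIZE = 5 * 1024 * 1024  # assumption: nginx's "5m" is 5 MiB


def slices_for_range(first_byte, last_byte, slice_size=SLICE_SIZE):
    """Return slice-aligned Range values covering [first_byte, last_byte].

    Hypothetical helper for illustration only.  The final slice may
    extend past the end of the file; the upstream would simply return
    fewer bytes for it.
    """
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    # Round the start down to the beginning of its slice.
    start = (first_byte // slice_size) * slice_size
    ranges = []
    while start <= last_byte:
        ranges.append(f"bytes={start}-{start + slice_size - 1}")
        start += slice_size
    return ranges


# A client asking for the first 2 MiB only touches the first slice.
print(slices_for_range(0, 2 * 1024 * 1024 - 1))  # ['bytes=0-5242879']
```

Under these assumptions, a request that straddles a slice boundary
needs both neighbouring slices, and a request with no `Range` header
would walk every slice of the file in order.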

The slice size of 5MB is chosen to encompass more than 95% of file
uploads (saving an extra trip to the origin) while also being large
enough to be able to provide video thumbnails in a single slice, as
well as not take too much time to obtain from the upstream.
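
To make the trade-off concrete, here is a small sketch of how many
cache entries a single file would occupy at a given slice size.  The
helper name `slice_count` and the example file sizes are made up for
illustration; they are not taken from real upload data.

```python
SLICE_SIZE = 5 * 1024 * 1024  # assumption: nginx's "5m" is 5 MiB


def slice_count(file_size, slice_size=SLICE_SIZE):
    """Number of slices needed to cover a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Integer ceiling division, avoiding floating point.
    return -(-file_size // slice_size)


MIB = 1024 * 1024
print(slice_count(3 * MIB))     # 1: fits in a single slice
print(slice_count(1024 * MIB))  # 205: a 1 GiB video spans many slices
```

A smaller slice size would multiply the number of entries for large
files, while a larger one would make each upstream fetch slower.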
This commit is contained in:
Alex Vandiver
2025-04-01 18:05:29 +00:00
committed by Tim Abbott
parent ca1e56d91b
commit 23e8eb5c7c

@@ -43,6 +43,16 @@ location ~ ^/internal/s3/(?<s3_hostname>[^/]+)/(?<s3_path>.*) {
 # the first response. Django explicitly unsets the first, and
 # does not set the latter two.
+# We slice the content into 5M chunks; this means that the client
+# doesn't need to wait for nginx to download and cache the full
+# content if the client just requested a small range (e.g. for
+# showing a thumbnail of a large video). 5M is chosen to be
+# enough for videos to be able to thumbnail in one slice, but not
+# take overly long to retrieve from S3, or cause overwhelming
+# numbers of cache entries for large files.
+slice 5m;
+proxy_set_header Range $slice_range;
 proxy_pass $download_url$is_args$args;
 proxy_cache uploads;
 # If the S3 response doesn't contain Cache-Control headers (which
@@ -50,12 +60,12 @@ location ~ ^/internal/s3/(?<s3_hostname>[^/]+)/(?<s3_path>.*) {
 # long time. The size of the cache is controlled by
 # `s3_disk_cache_size` and read frequency, set via
 # `s3_cache_inactive_time`.
-proxy_cache_valid 200 1y;
+proxy_cache_valid 200 206 1y;
-# We only include the requested content-disposition in the cache
-# key, so that we cache "Content-Disposition: attachment"
-# separately from the inline version.
-proxy_cache_key $download_url$s3_disposition_cache_key;
+# We only include the requested content-disposition (and range
+# slice) in the cache key, so that we cache "Content-Disposition:
+# attachment" separately from the inline version.
+proxy_cache_key $download_url$s3_disposition_cache_key$slice_range;
 }
 # Internal file-serving