This is done synchronously, despite taking ~60s. We can move it to a
background thread later if that becomes an issue, but Prometheus is
generally tolerant of exporters that take a while to return results.
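For reference, a rough sketch of this shape of exporter using
`prometheus_client`; the metric name and the expensive helper are
hypothetical stand-ins, not the actual collector:

```python
import time

from prometheus_client import start_http_server
from prometheus_client.core import REGISTRY, GaugeMetricFamily


def compute_expensive_statistic() -> float:
    time.sleep(60)  # stand-in for the real ~60s computation
    return 1.0


class SlowCollector:
    """Does its (slow) work synchronously, on each scrape."""

    def collect(self):
        value = compute_expensive_statistic()
        yield GaugeMetricFamily(
            "zulip_example_statistic",
            "A slowly-computed example statistic",
            value=value,
        )


if __name__ == "__main__":
    REGISTRY.register(SlowCollector())
    start_http_server(9999)
    while True:
        time.sleep(3600)
```

This does assume the job's `scrape_timeout` on the Prometheus side is
configured generously enough to cover the collection time.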
This is most important for `send_zulip_update_announcements`, which
can race with the copy of it run as a post-deploy hook. However, all
of these crons can tolerate being slightly delayed, and there is
little benefit to having them consume CPU, or possibly hit odd
borderline race conditions, while a deploy is in progress.
For safety, we only trust the deploy lockfile if it was created
within the last hour -- deploys should not take more than an hour,
and never running the hourly crons at all is much worse than
occasionally running them during a genuinely very long deploy.
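A minimal sketch of that freshness check, with a hypothetical
lockfile path (the real cron wrapper's path and structure may
differ):

```python
import os
import time

DEPLOY_LOCKFILE = "/home/zulip/deployments/lock"  # hypothetical path
MAX_LOCK_AGE_SECONDS = 60 * 60  # deploys should not take over an hour


def deploy_in_progress() -> bool:
    """Trust the deploy lockfile only if it was created recently."""
    try:
        lock_age = time.time() - os.path.getmtime(DEPLOY_LOCKFILE)
    except FileNotFoundError:
        return False
    return lock_age < MAX_LOCK_AGE_SECONDS


if deploy_in_progress():
    # Skip this run; the cron will fire again on its next schedule.
    raise SystemExit(0)
```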
Although this code was not actually vulnerable as written, we never
want to disable this Ruff rule, in order to discourage the later
introduction of vulnerabilities.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
Using Postfix to handle the incoming email gateway complicates things
a great deal:
- It cannot verify that incoming email addresses exist in Zulip before
  accepting them; it thus accepts mail at the `RCPT TO` stage that it
  cannot handle, and must reject it after the `DATA`.
- It is built to handle both incoming and outgoing email, which
results in subtle errors (1c17583ad5, 79931051bd, a53092687e,
#18600).
- Rate-limiting happens much too late to avoid denial of
service (#12501).
- Misconfigurations of the HTTP endpoint can break incoming
mail (#18105).
Provide a replacement SMTP server which accepts incoming email on
port 25, verifies that Zulip can accept the address and that no rate
limits are being exceeded, and then adds the message directly to the
relevant queue.
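As a sketch of the overall shape (using the `aiosmtpd` library; the
validation, rate-limit, and queueing helpers here are hypothetical
stand-ins for Zulip's internal logic, not the literal
implementation):

```python
from aiosmtpd.controller import Controller


def is_valid_zulip_address(address: str) -> bool:
    return True  # stand-in for Zulip's address validation


def is_rate_limited(address: str) -> bool:
    return False  # stand-in for Zulip's rate-limit check


def queue_email_for_processing(address: str, content: bytes) -> None:
    pass  # stand-in for adding the message to the relevant queue


class EmailGatewayHandler:
    async def handle_RCPT(self, server, session, envelope, address, rcpt_options):
        # Reject unknown or rate-limited addresses at RCPT TO time,
        # before the client transmits the message body.
        if not is_valid_zulip_address(address):
            return "550 5.1.1 Mailbox does not exist"
        if is_rate_limited(address):
            return "450 4.7.0 Rate limit exceeded; try again later"
        envelope.rcpt_tos.append(address)
        return "250 OK"

    async def handle_DATA(self, server, session, envelope):
        # Hand the raw message directly to the queue.
        for address in envelope.rcpt_tos:
            queue_email_for_processing(address, envelope.content)
        return "250 Message accepted for delivery"


controller = Controller(EmailGatewayHandler(), hostname="0.0.0.0", port=25)
controller.start()
input("SMTP gateway running; press Enter to stop\n")
controller.stop()
```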
Removes an incorrect comment which implied that missed-message
addresses were only usable once. We apply rate limiting only to
channel email addresses, since missed-message addresses, unlike
channel email addresses, are unlikely to be placed into automated
systems.
Also simplifies #7814 somewhat.
This provides at least a 10x speedup.
We could use a CTE to fetch from, and lock, `fts_update_log` at the
same time as performing the `zerver_message` update -- however, the
number of returned rows would then be less than `BATCH_SIZE` if some
of the referenced rows in `zerver_message` no longer existed, causing
premature termination of the outer loop.
We lock the `zerver_message` rows in ascending ID order, to prevent
deadlocks. We use `ctid` values[^1] to transfer the rows from the
inner query to the outer update; this is safe, as the rows are locked
against concurrent updates.
[^1]: https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-CTID
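To make the locking scheme concrete, one batch looks roughly like the
following, given an open psycopg2 cursor (a simplified sketch; the
exact column names, text-search configuration, and log-table schema
are approximations, not the literal query):

```python
BATCH_SIZE = 1000


def process_one_batch(cursor) -> int:
    # Pull one batch out of the log; the *log* row count, not the count
    # of updated messages, is what decides when the outer loop stops.
    cursor.execute(
        "SELECT id, message_id FROM fts_update_log ORDER BY id LIMIT %(batch)s",
        {"batch": BATCH_SIZE},
    )
    rows = cursor.fetchall()
    if rows:
        message_ids = [message_id for (_, message_id) in rows]
        # Lock the matching message rows in ascending ID order, to avoid
        # deadlocks with concurrent writers, and carry their ctids into
        # the outer UPDATE; this is safe because the rows stay locked.
        cursor.execute(
            """
            UPDATE zerver_message
            SET search_tsvector = to_tsvector('zulip.english_us_search',
                                              subject || ' ' || rendered_content)
            FROM (
                SELECT ctid, id FROM zerver_message
                WHERE id = ANY(%(ids)s)
                ORDER BY id
                FOR UPDATE
            ) AS locked
            WHERE zerver_message.ctid = locked.ctid
            """,
            {"ids": message_ids},
        )
        # Clear the processed log rows only after the update succeeds.
        cursor.execute(
            "DELETE FROM fts_update_log WHERE id = ANY(%(ids)s)",
            {"ids": [log_id for (log_id, _) in rows]},
        )
    return len(rows)
```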
Every insert into `fts_update_log` triggers a `NOTIFY`; processing a
batch of 1000 for every `NOTIFY` we receive thus results in a lot of
extra queries which process no rows at all.
Instead, we clear all pending notifications, and repeat the pattern
of processing rows until a batch comes up short. This may still
result in extra wake-ups, in the event that a new `NOTIFY` arrives
after we clear them and the new row has already been processed by us
in `update_all_rows`. However, these wake-ups are necessary for
correctness, and this change results in drastically fewer extra
checks.
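Building on the `process_one_batch` sketch above, the listen loop
then looks roughly like this with psycopg2 (the channel name and
timeout are illustrative):

```python
import select

import psycopg2.extensions


def listen_loop(conn: psycopg2.extensions.connection) -> None:
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    cursor.execute("LISTEN fts_update_log")
    while True:
        # Discard every queued notification; the pass below covers all
        # of them, so one batch per NOTIFY would be wasted work.
        conn.notifies.clear()
        # Process batches until one comes up short of BATCH_SIZE, which
        # means the backlog has been drained.
        while process_one_batch(cursor) == BATCH_SIZE:
            pass
        # Sleep until the next NOTIFY arrives (or the timeout expires).
        ready, _, _ = select.select([conn], [], [], 30)
        if ready:
            conn.poll()
```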
This code dates back to 57b52310639a; however, since
`postgresql-common` version 153 (February 2014), this has been
handled by a post-install trigger in `postgresql-common` which calls
`pg_updatedicts` for each new PostgreSQL version.
This uses the same technique as 840884ec89, applying only select
parts of the Puppet configuration. This is more correct, and simpler,
than attempting to chop out some base Puppet roles and hack around
the `purge => true` supervisord configuration.
The value, if unset, defaults to[^1]:
> a size equal to 1/32nd (about 3%) of shared_buffers, but not less
> than 64kB nor more than the size of one WAL segment, typically 16MB
Letting PostgreSQL choose this default, while still allowing an
override via `zulip.conf`, seems preferable to hard-coding an
arbitrary value. (For example, with 2GB of `shared_buffers`, 1/32nd
would be 64MB, which is then capped at the 16MB WAL segment size.)
[^1]: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-BUFFERS
The previous thresholds were set based on the assumption that 50% of
the memory on the host was taken up by Django and friends, leaving
only 50% for the database.
On separate database hosts, this leads to a too-small default value
for `shared_buffers`, which is generally suggested to be 25% of the
memory dedicated to the database.
Check whether `zulip::app_frontend_base` is part of the deploy, and
use that to decide whether or not to halve
`$zulip::common::total_memory_mb` when calculating PostgreSQL memory
settings. We also provide a `postgresql.memory` override, should that
be needed.
Unconditionally adding the port can cause CSRF failures when the port
is a default port, and thus optional. Switch to providing the exact
`Host` header that the original request contained.
Some deployments choose to wrap Zulip's nginx in an outer proxy --
for example, to do custom TLS termination. In such deployments, the
outer proxy routes to `127.0.0.1:80`; b4fb22ba1b breaks these
configurations, since it switched the `127.0.0.1:80` listener to
serve only `/api/internal/` paths.
Switch to serving the whole application over `127.0.0.1:80`.
This prevents a thundering herd for videos -- if a very large video
is posted to a channel with many active clients, all of them request
it simultaneously in order to render the in-feed preview image.
While these requests come with a `Range` header which is intended to
limit the request to just the first couple of megabytes, nginx
ignores this header when making its own request to the upstream --
so that it can fetch and cache the whole file locally. This results
in multiple competing requests for the whole content from S3, all
racing to store the content in the cache.
Use cache slicing to split the content cache into chunks of 5MB; the
cache is filled one slice at a time, as needed based on the byte
ranges that clients request. Clients making requests without a
`Range` header are provided with the content transparently stitched
together from the individual slices.
The slice size of 5MB is chosen to encompass more than 95% of file
uploads (saving an extra trip to the origin), while also being large
enough to provide video thumbnails in a single slice, and not taking
too long to obtain from the upstream.
The `$host` nginx variable is _not_ the unadulterated `Host`
header (which would be `$http_host`) -- it is that header, *without
the port*, with a fallback to the `server_name` which processed the
request.
This means that backend services are not aware of the port that the
request came in on, unless they derive that from reading
`nginx_listen_port` in `/etc/zulip/zulip.conf`, or similar.
Specifically, this caused `tusd`, on deploys with a non-standard
`nginx_listen_port`, to generate a `Location` header which omitted
the port; retrieving metadata about the just-uploaded file thus
triggered a CORS check, which failed.
Add the port to the `Host` header we pass to `tusd` and other backend
services.