1856 Commits

Author SHA1 Message Date
Alex Vandiver
a4ddc0a2d5 postgres: Remove lower-than-suggested checkpoint_completion_target.
This defaults to 0.9, and lowering it is not generally suggested[^1].

[^1]: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET
2025-05-12 11:12:45 -07:00
Alex Vandiver
f47d150fb5 puppet: Allow overrides of computed PostgreSQL config settings.
The four precomputed memory values gain overrides in `zulip.conf` with
the same names, for more precise manual tuning.
2025-05-12 11:12:45 -07:00
Alex Vandiver
5bc75a94be puppet: Base PostgreSQL tuning on whether the frontend is on the same host.
The previous thresholds were set based on the assumption that 50% of
the memory on the host was taken up by Django and friends, leaving
only 50% for the database.

On separate database hosts, this leads to a too-small default value
for shared_buffers, which is generally suggested to be 25% of your
database-dedicated memory.

Check whether a `zulip::app_frontend_base` is part of the deploy,
and use that to decide whether we halve `$zulip::common::total_memory_mb`
for the purposes of calculating PostgreSQL memory settings.  We
also provide a `postgresql.memory` override, if necessary.
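A minimal Python sketch of the sizing decision described above. The function names are hypothetical, the override is assumed to be expressed in MiB, and this is not the actual Puppet code:

```python
from typing import Optional


def postgresql_memory_mb(
    total_memory_mb: int,
    frontend_on_same_host: bool,
    memory_override_mb: Optional[int] = None,
) -> int:
    """Memory (MiB) that PostgreSQL tuning should assume it may use."""
    if memory_override_mb is not None:
        # An explicit `postgresql.memory` override wins outright.
        return memory_override_mb
    if frontend_on_same_host:
        # Django and friends are assumed to use about half the host.
        return total_memory_mb // 2
    # Dedicated database host: all of its memory is for PostgreSQL.
    return total_memory_mb


def shared_buffers_mb(db_memory_mb: int) -> int:
    # Commonly suggested starting point: 25% of database-dedicated memory.
    return db_memory_mb // 4


print(shared_buffers_mb(postgresql_memory_mb(16384, frontend_on_same_host=False)))  # 4096
print(shared_buffers_mb(postgresql_memory_mb(16384, frontend_on_same_host=True)))   # 2048
```

On a dedicated 16 GiB database host, the old halving would have produced half the suggested `shared_buffers`.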
2025-05-12 11:12:45 -07:00
Alex Vandiver
1b9a8dbc52 kandra: Allow PostgreSQL primary ssh access for PostgreSQL upgrade.
This allows scripting of the whole upgrade process; these grants are
temporary.
2025-05-09 10:05:19 -07:00
Alex Vandiver
a959f71208 kandra: Support installing multiple ssh keys from multiple secrets. 2025-05-09 10:05:19 -07:00
Alex Vandiver
614fdee5ca kandra: Add a pattern grouping for /api/v1/tus/.
We already had one for `/api/v1/tus/...`, but not the bare
`/api/v1/tus/` which is used to initiate the upload.
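A sketch of why the bare path needs its own pattern, using Python regexes as a stand-in for the real grouping configuration (the labels `tus_create` and `tus_upload` are hypothetical):

```python
import re
from typing import Optional

# Hypothetical label -> pattern table; first full match wins.
PATH_GROUPS = [
    ("tus_create", re.compile(r"/api/v1/tus/")),    # bare path: starts an upload
    ("tus_upload", re.compile(r"/api/v1/tus/.+")),  # per-upload URLs
]


def group_for(path: str) -> Optional[str]:
    for label, pattern in PATH_GROUPS:
        if pattern.fullmatch(path):
            return label
    return None  # falls through to whatever catch-all grouping exists


print(group_for("/api/v1/tus/"))        # tus_create
print(group_for("/api/v1/tus/abc123"))  # tus_upload
```

Because `.+` requires at least one character, a pattern written only for `/api/v1/tus/...` never matches the bare `/api/v1/tus/`.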
2025-05-05 12:13:16 -07:00
Alex Vandiver
21eff33875 puppet: Upgrade tusd to 2.8.0. 2025-04-11 12:13:07 -07:00
Alex Vandiver
309876796e kandra: Add a couple of useful command-line tools. 2025-04-11 11:25:58 -07:00
Alex Vandiver
b11cbbab01 smokescreen: Move metrics port from the default 9810 to 4760.
This prevents errors if Smokescreen is running on a host with more
than 10 Tornado shards.
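To illustrate the collision, here is a sketch that assumes Tornado shards listen on consecutive ports starting at 9800; that port layout is an assumption, not something stated in this commit:

```python
def tornado_ports(shards: int, base_port: int = 9800) -> range:
    # Assumption: shard i listens on base_port + i.
    return range(base_port, base_port + shards)


def metrics_port_collides(metrics_port: int, shards: int) -> bool:
    return metrics_port in tornado_ports(shards)


print(metrics_port_collides(9810, 10))  # False: shards use 9800-9809
print(metrics_port_collides(9810, 11))  # True: the 11th shard wants 9810
print(metrics_port_collides(4760, 11))  # False
```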
2025-04-11 10:29:26 -07:00
Alex Vandiver
5f783ed5ad nginx: Relay the same Host: header that nginx saw.
Unilaterally adding the port can cause CSRF failures when the port is
a default port, and thus optional.  Switch to providing the exact
`Host` header that the original request contained.
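A simplified illustration, loosely modeled on how Django's CSRF middleware compares the `Origin` header against the scheme plus the request's host. This is a sketch, not Django's code. Browsers omit default ports from `Origin`, so a relayed `Host` that gains `:443` no longer matches:

```python
def origin_matches_host(origin: str, host_header: str, is_secure: bool) -> bool:
    # Simplified exact-string comparison of Origin against scheme://Host.
    scheme = "https" if is_secure else "http"
    return origin == f"{scheme}://{host_header}"


browser_origin = "https://chat.example.com"  # default port omitted by the browser
print(origin_matches_host(browser_origin, "chat.example.com:443", True))  # False
print(origin_matches_host(browser_origin, "chat.example.com", True))      # True
```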
2025-04-09 10:44:12 -07:00
Anders Kaseorg
80b607c8cb install: Remove PostgreSQL 13 support.
PostgreSQL 13 reaches end of life on November 13, 2025, and Django 5.2
does not support it.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-04-07 17:41:55 -07:00
Anders Kaseorg
818742c62b install: Support PostgreSQL 17.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-04-07 16:42:19 -07:00
Alex Vandiver
e2e0c72a80 nginx: Serve full app from localhost.
Some deployments choose to wrap Zulip's nginx in an outer proxy -- for
example, to do custom TLS termination.  In such deployments, the outer
proxy is routing to `127.0.0.1:80`; b4fb22ba1b breaks these
configurations, as it switches the `127.0.0.1:80` listener to only
serving `/api/internal/` paths.

Switch to serving the whole application over `127.0.0.1:80`.
2025-04-04 10:58:09 -07:00
Alex Vandiver
d2772a0a38 postgresql: Remove inaccurate comment.
PostgreSQL 10 and later can adjust their TLS configuration without a
restart, with `SELECT pg_reload_conf()` or `pg_ctlcluster 16 main
reload`.
2025-04-02 13:07:31 -07:00
Alex Vandiver
9073ad3cc5 kandra: Add missing variable used in the template. 2025-04-02 09:55:21 -07:00
Alex Vandiver
258003a1c8 kandra: Extract the Teleport host CA to disk. 2025-04-02 09:55:21 -07:00
Alex Vandiver
23e8eb5c7c nginx: Use cache slicing to prevent thundering herds for video thumbs.
This prevents a thundering herd for videos -- if a very large video is
posted to a channel with many active clients, all of them
simultaneously request it, to provide the in-feed preview image.
While these requests come with a `Range` header which is intended to
limit the request to just the first couple MB, nginx ignores this
header when making its request to the upstream -- so it can obtain and
cache the whole file locally.  This results in multiple competing
requests for the whole content from S3, all racing to store the
content in the cache.

Use cache slicing to split the content cache into chunks of 5MB; the
cache is filled one slice at a time, as needed based on the byte
ranges that clients request.  Clients making requests without a
`Range` header are provided with the content transparently stitched
together from the individual slices.

The slice size of 5MB is chosen to encompass more than 95% of file
uploads (saving an extra trip to the origin), while also being large
enough to provide video thumbnails from a single slice, yet small
enough not to take too long to obtain from the upstream.
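As a sketch, assuming the slice size is configured as nginx's `5m` (5 MiB), this hypothetical Python function computes which aligned upstream ranges a client `Range` request touches:

```python
SLICE_SIZE = 5 * 1024 * 1024  # nginx `5m` is 5 MiB (assumed configuration)


def upstream_slices(first: int, last: int, slice_size: int = SLICE_SIZE) -> list[str]:
    """Aligned upstream Range headers needed for client bytes first..last (inclusive)."""
    if first < 0 or last < first:
        raise ValueError("invalid byte range")
    return [
        f"bytes={i * slice_size}-{(i + 1) * slice_size - 1}"
        for i in range(first // slice_size, last // slice_size + 1)
    ]


print(upstream_slices(0, 2_000_000))      # ['bytes=0-5242879']
print(upstream_slices(1_000, 6_000_000))  # ['bytes=0-5242879', 'bytes=5242880-10485759']
```

A request for the first couple of MB of a video thus needs only the first slice, which every concurrent client can share once it is cached.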
2025-04-01 13:35:28 -07:00
Alex Vandiver
4e26705fbc nginx: Tell the backend service what port we listen on.
The `$host` nginx variable is _not_ the unadulterated `Host`
header (which would be `$http_host`) -- it is that header, *without
the port*, with a fallback to the `server_name` which processed the
request.

This means that backend services are not aware of the port that the
request came in on, unless they derive that from reading
`nginx_listen_port` in `/etc/zulip/zulip.conf`, or similar.
Specifically, this caused `tusd`, on deploys with non-standard
`nginx_listen_port`, to generate a `Location` header which left off
the port, and as such attempted a CORS check when retrieving metadata
about the just-uploaded file, which failed.

Add the port to the `Host` header we pass to `tusd` and other backend
services.
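A rough Python emulation of the difference, ignoring the request-line precedence and IPv6 edge cases that real nginx handles. Both `nginx_host` and `backend_host_header` are hypothetical helpers:

```python
from typing import Optional


def nginx_host(http_host: Optional[str], server_name: str) -> str:
    """Rough emulation of nginx's `$host`: Host header minus port, lowercased."""
    if http_host:
        return http_host.split(":", 1)[0].lower()
    # No Host header: fall back to the matching server_name.
    return server_name


def backend_host_header(http_host: Optional[str], server_name: str, listen_port: int) -> str:
    # The fix described above: re-append the port nginx is listening on.
    return f"{nginx_host(http_host, server_name)}:{listen_port}"


print(nginx_host("Zulip.Example.com:8443", "_"))                 # zulip.example.com
print(backend_host_header("zulip.example.com:8443", "_", 8443))  # zulip.example.com:8443
```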
2025-04-01 09:28:18 -07:00
Alex Vandiver
b4fb22ba1b nginx: Move localhost to its own block, bound to the loopback address.
This makes the `localhost.d` directory less of a lie, and decreases
the chances that local reconfigurations will break the 127.0.0.1:80
server which is used for IPC.

In cases where `nginx_http_only` is enabled, we respect
`nginx_listen_port` so as not to bind to port 80 if the
administrator was explicitly trying to avoid that.
2025-04-01 09:28:18 -07:00
Alex Vandiver
5bda1ac521 kandra: Include the new grok monitoring exporter for all frontend hosts. 2025-03-28 10:50:03 -04:00
Alex Vandiver
840fa74854 kandra: Add a grok exporter to parse nginx logfiles.
This provides access logging metrics to Prometheus.  For cardinality
reasons, we cannot (nor would we want to) put every request path into
its own label value -- but we do separate out the most-frequent access
paths (as well as some low-frequency but high-interest ones) into
their own label values.

In order to differentiate accesses to https://zulip.com/ from
https://example.zulipchat.com/ (both of which appear at path `/`), we
use a `grok_exporter.realm_names_regex` value in `zulip.conf`, which
is expected to be set to match the hostname of all possible realms.
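A sketch of how such a regex disambiguates `/`, with a hypothetical regex value and hypothetical label names:

```python
import re

# Hypothetical value for `grok_exporter.realm_names_regex`.
REALM_NAMES_REGEX = re.compile(r"[a-z0-9-]+\.zulipchat\.com")


def path_label(hostname: str, path: str) -> str:
    if path == "/":
        # `/` is ambiguous on its own; the hostname decides which page it is.
        return "realm_home" if REALM_NAMES_REGEX.fullmatch(hostname) else "website_home"
    return "other"  # stand-in for the per-path groupings


print(path_label("example.zulipchat.com", "/"))  # realm_home
print(path_label("zulip.com", "/"))              # website_home
```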
2025-03-27 21:56:54 -07:00
Alex Vandiver
bd54f0363e kandra: Update prometheus configuration.
This pulls in the more complete production Prometheus configuration.
2025-03-27 21:56:54 -07:00
Tim Abbott
0fecb497a2 puppet: Remove legacy camo-purging logic.
This is no longer required, since servers must upgrade via 5.x or
higher in order to get to the current release.
2025-03-26 10:54:38 -07:00
Alex Vandiver
b0292ffe37 puppet: Exclude pgroonga from unattended-upgrades. 2025-03-24 11:42:50 -07:00
Alex Vandiver
023e634e98 nginx: Allow adding extra monitoring paths in a localhost.d. 2025-03-21 18:07:22 -07:00
Alex Vandiver
29a0d287fc puppet: Allow for arbitrary queues to have more than one worker.
This generalizes `thumbnail_workers` to include any other queue.
However, the only additional setting we document is
`email_senders_workers`, since other queues are not guaranteed to work
correctly with multiple consumers.
2025-03-14 14:07:09 -07:00
Alex Vandiver
eca77631b4 kandra: Rename onboarding-video to navigation-tour-video. 2025-03-13 09:54:44 -07:00
Tim Abbott
b16bd27e9c puppet: Add gettext to dependencies for app instances.
This is already installed on a lot of systems, and is used indirectly
when upgrading Zulip from Git.

We previously removed this in
263212decf, I believe due to a
mistaken understanding that only makemessages needed it.
2025-03-06 13:08:08 -08:00
Alex Vandiver
fa3983ac46 kandra: Add /static/onboarding-video/ static content. 2025-03-05 22:37:19 -08:00
Alex Vandiver
412826e08b kandra: Add zulip-notify hooks for CZO. 2025-03-05 13:57:10 -08:00
Alex Vandiver
c5200e8b05 deliver_scheduled_emails: Use a queue, instead of infinite retries.
`deliver_scheduled_emails` tries to deliver the email synchronously,
and if it fails, it retries after 10 seconds.  Since it does not track
retries, and always tries the earliest-scheduled-but-due message
first, the worker will not make forward progress if there is a
persistent failure with that message, and will retry indefinitely.
This can result in excessive network or email delivery charges from
the remote SMTP server.

Switch to delivering emails via a new queue worker.  The
`deliver_scheduled_emails` job now serves only to pull deferred jobs
out of the table once they are due, insert them into RabbitMQ, and
then delete them.  This limits the potential for head-of-queue
failures to failures inserting into RabbitMQ, which is more reasonable
than failures speaking to a complex external system we do not control.
Retries and any connections to the SMTP server are left to the
RabbitMQ consumer.

We build a new RabbitMQ queue, rather than use the existing
`email_senders` queue, because that queue is expected to be reasonably
low-latency, for things like missed message notifications.  The
`send_future_email` codepath which inserts into ScheduledEmails is
also (ab)used for digest emails, which are extremely bursty in their
frequency -- and a large burst could significantly delay emails behind
it in the queue.

The new queue is explicitly only for messages which were not initiated
by user actions (e.g., invitation reminders, digests, new account
follow-ups) which are thus not latency-sensitive.
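A minimal in-memory sketch of the split described above, with a plain list standing in for the ScheduledEmail table and `queue.Queue` standing in for RabbitMQ. The class, the function names, and the bounded-retry policy are all assumptions, not the actual worker code:

```python
import queue
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class ScheduledEmail:  # hypothetical stand-in for a ScheduledEmail row
    id: int
    scheduled_timestamp: datetime
    payload: str


def enqueue_due_emails(table: list[ScheduledEmail], q: "queue.Queue[str]", now: datetime) -> int:
    """Move due rows into the queue; the only head-of-queue risk is the put()."""
    due = sorted(
        (e for e in table if e.scheduled_timestamp <= now),
        key=lambda e: e.scheduled_timestamp,
    )
    for email in due:
        q.put(email.payload)
        table.remove(email)  # delete only after it is safely enqueued
    return len(due)


def consume(q: "queue.Queue[str]", send: Callable[[str], None], max_attempts: int = 3) -> list[str]:
    """Deliver queued emails, retrying each one a bounded number of times."""
    failed: list[str] = []
    while not q.empty():
        payload = q.get()
        for _ in range(max_attempts):
            try:
                send(payload)
                break
            except OSError:
                continue
        else:
            # Give up on this message; later messages still make progress.
            failed.append(payload)
    return failed
```

The key property is that a persistently failing message no longer blocks the ones scheduled after it.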

Fixes: #32463.
2025-03-04 16:09:25 -08:00
Anders Kaseorg
d7556b4060 requirements: Migrate to uv.
https://docs.astral.sh/uv/

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-24 22:29:24 -08:00
Alex Vandiver
c84c76f3a1 puppet: Upgrade external dependencies. 2025-02-21 14:38:31 -08:00
Alex Vandiver
3e421d71ba kandra: Update Teleport CA. 2025-02-21 10:16:33 -08:00
Alex Vandiver
ce34ebdfed kandra: Add a hook to notify of local schema drift from merge-base. 2025-02-19 12:43:00 -08:00
Alex Vandiver
af4fa75b66 puppet: Upgrade version of aws tool. 2025-01-28 17:43:08 -08:00
Alex Vandiver
3ec896ebda nginx: Add an option which defaults loadbalancer requests to https.
In some cases, it is not possible to configure the load-balancer to
add an X-Forwarded-Proto header.  If Zulip is serving its traffic over
HTTP, it will rightly error out, since it cannot guarantee that its
response will be served over an encrypted connection.

Add a new `loadbalancer.rejects_http_requests` setting which serves
as a way for the operator to swear that the load-balancer will *never*
serve responses from Zulip over an unencrypted connection.  In most
cases, this is because the load-balancer is configured to have port 80
always serve an HTTP 301 redirect to the same URL over HTTPS.

Properly configuring the proxy to send `X-Forwarded-Proto` is always a
better solution than using this configuration parameter, so use of
this should be viewed as a last resort.
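A sketch of the resulting decision. `effective_scheme` is a hypothetical name, and the exact-case header lookup is a simplification (real HTTP header handling is case-insensitive):

```python
from typing import Mapping


def effective_scheme(headers: Mapping[str, str], rejects_http_requests: bool) -> str:
    """Scheme to assume for a request arriving from the load-balancer."""
    proto = headers.get("X-Forwarded-Proto")
    if proto:
        # A header from a correctly configured proxy always wins.
        return proto.lower()
    # No header: only assume HTTPS if the operator has sworn that the
    # load-balancer never serves Zulip's responses over plain HTTP.
    return "https" if rejects_http_requests else "http"
```

Note that the header still takes precedence when present, which is why configuring `X-Forwarded-Proto` remains the better fix.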
2025-01-22 12:25:42 -08:00
Prakhar Pratyush
86cd8349f7 cron: Update 'update-channel-recently-active-status' to run weekly.
This commit updates the 'update-channel-recently-active-status'
cron job to run weekly instead of daily.
2025-01-02 16:37:34 -08:00
Anders Kaseorg
58822372d5 typos: Fix typos caught by typos and mwic.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-24 19:15:36 -08:00
Anders Kaseorg
19b8cde27f ruff: Fix PLC0206 Extracting value from dictionary without calling .items().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-21 21:06:53 -08:00
Aman Agrawal
50256f4831 stream: Add field to track active status of stream. 2024-11-27 17:31:06 -08:00
Tim Abbott
11d6273990 Update puppet/zulip/manifests/app_frontend_base.pp
Co-authored-by: Anders Kaseorg <andersk@mit.edu>
2024-11-15 15:08:33 -08:00
Tim Abbott
9d68d89d01 puppet: Require libldap-common be installed.
Zulip instances without a database included, like the Docker image,
would fail to use TLS properly, since `TLS_REQCERT` was not set in
`/etc/ldap/ldap.conf`. While there are a few other ways we could fix
this, just installing libldap-common on app frontend instances seems
like a good solution: it has no impact on other Zulip systems, which
were already installing it indirectly through a "Recommends"-tier apt
dependency of the PostgreSQL server package.

Fixes zulip/docker-zulip#454.
2024-11-15 15:08:33 -08:00
Anders Kaseorg
2bb87aebec install: Remove PostgreSQL 12 support.
PostgreSQL 12 reaches end of life on November 14, 2024.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-11-01 15:33:31 -07:00
Alex Vandiver
f325e15439 nagios: Switch staging hosts to not page, but send a zulip. 2024-10-08 16:55:07 -07:00
Alex Vandiver
1bd0ab506c nginx: Make uwsgi timeout shorter than nginx-to-uwsgi timeout.
The nginx-to-uwsgi timeout defaults to 60s, which is exactly the same
as the current "harakiri" timeout configured in uwsgi (which limits
how long a request can run before the worker is terminated).  This
causes a race: if nginx hits its 60s limit before uwsgi does, we
return a 504; otherwise, we return a 502.

Make the nginx-to-uwsgi timeout explicit, and shorten the "harakiri"
timeout to be explicitly less than that.  Document the 60s timeout,
which all outer reverse proxies must be set to _longer than_ in order
to have proper "onion" timeouts.
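The "onion" rule can be stated as a check that each timeout is strictly shorter than the one wrapping it. In this sketch, the 55s harakiri value and the outer proxy values are hypothetical examples, not the configured values:

```python
def is_onion(timeouts_inner_to_outer: list[float]) -> bool:
    """True if every layer times out strictly before the layer wrapping it."""
    pairs = zip(timeouts_inner_to_outer, timeouts_inner_to_outer[1:])
    return all(inner < outer for inner, outer in pairs)


# uwsgi harakiri, nginx-to-uwsgi, outer load-balancer (seconds)
print(is_onion([60, 60, 120]))  # False: the old harakiri/nginx race
print(is_onion([55, 60, 120]))  # True
print(is_onion([55, 60, 60]))   # False: the outer proxy must be longer than 60s
```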
2024-10-07 15:41:08 -07:00
Alex Vandiver
7ddcf3774b puppet: Add tusd daily logrotation. 2024-10-04 14:22:37 -07:00
Alex Vandiver
2571196899 puppet: Remove unnecessary "create" directive.
It is irrelevant, since copytruncate is used.
2024-10-04 14:22:37 -07:00
Alex Vandiver
e6c64e78e6 puppet: Switch logrotate to be in charge of tornado logs. 2024-10-04 14:22:37 -07:00
Alex Vandiver
34308efb94 puppet: Upgrade sentry-cli.
This version makes `sentry-cli monitors run` not fail if Sentry is
down (getsentry/sentry-cli#2169).
2024-09-30 11:24:36 -07:00