1787 Commits

Author SHA1 Message Date
Alex Vandiver
5bda1ac521 kandra: Include the new grok monitoring exporter for all frontend hosts. 2025-03-28 10:50:03 -04:00
Alex Vandiver
840fa74854 kandra: Add a grok exporter to parse nginx logfiles.
This provides access logging metrics to Prometheus.  For cardinality
reasons, we cannot (nor would we want to) put every request path into
its own label value -- but we do separate out the most-frequent access
paths (as well as some low-frequency but high-interest ones) into
their own label values.

In order to differentiate accesses to https://zulip.com/ from
https://example.zulipchat.com/ (both of which appear at path `/`), we
use a `grok_exporter.realm_names_regex` value in `zulip.conf`, which
is expected to be set to match the hostname of all possible realms.
2025-03-27 21:56:54 -07:00
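The label bucketing described in 840fa74854 can be sketched in Python. This is a hypothetical illustration, not the actual grok_exporter configuration. The `KNOWN_PATHS` set, the `"other"` fallback label, the `bucket_path` function, and the example realm regex are all assumptions. The point is that only a fixed set of paths gets its own label value, and the realm regex splits `/` into two labels.

```python
import re

# Hypothetical set of frequent or high-interest paths that get their own
# label value; every other path collapses into "other" to bound cardinality.
KNOWN_PATHS = {"/", "/json/events", "/api/v1/events", "/json/messages", "/api/v1/messages"}

# Assumed stand-in for grok_exporter.realm_names_regex: it should match the
# hostname of every realm (e.g. example.zulipchat.com) but not zulip.com.
REALM_NAMES_REGEX = re.compile(r"^[a-z0-9-]+\.zulipchat\.com$")


def bucket_path(host: str, path: str) -> str:
    """Map a (host, path) pair from an access-log line to a bounded label value."""
    path = path.split("?", 1)[0]  # Ignore query strings.
    if path not in KNOWN_PATHS:
        return "other"
    if path == "/":
        # "/" on zulip.com and "/" on a realm are different pages.
        return "/ (realm)" if REALM_NAMES_REGEX.match(host) else "/ (root)"
    return path
```

For example, `bucket_path("zulip.com", "/")` yields `"/ (root)"`, while the same path on `example.zulipchat.com` yields `"/ (realm)"`.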
Alex Vandiver
bd54f0363e kandra: Update prometheus configuration.
This pulls in the more complete production Prometheus configuration.
2025-03-27 21:56:54 -07:00
Tim Abbott
0fecb497a2 puppet: Remove legacy camo-purging logic.
This is no longer required, since servers must upgrade via 5.x or
higher in order to get to the current release.
2025-03-26 10:54:38 -07:00
Alex Vandiver
b0292ffe37 puppet: Exclude pgroonga from unattended-upgrades. 2025-03-24 11:42:50 -07:00
Alex Vandiver
023e634e98 nginx: Allow adding extra monitoring paths in a localhost.d. 2025-03-21 18:07:22 -07:00
Alex Vandiver
29a0d287fc puppet: Allow for arbitrary queues to have more than one worker.
This generalizes from `thumbnail_workers` to any other queue.
However, the only additional setting we document is
`email_senders_workers`, since other queues are not guaranteed to
work correctly with multiple consumers.
2025-03-14 14:07:09 -07:00
Alex Vandiver
eca77631b4 kandra: Rename onboarding-video to navigation-tour-video. 2025-03-13 09:54:44 -07:00
Tim Abbott
b16bd27e9c puppet: Add gettext to dependencies for app instances.
This is already installed on a lot of systems, and is used indirectly
when upgrading Zulip from Git.

We previously removed this in
263212decf, I believe due to an
incorrect understanding of only makemessages needing it.
2025-03-06 13:08:08 -08:00
Alex Vandiver
fa3983ac46 kandra: Add /static/onboarding-video/ static content. 2025-03-05 22:37:19 -08:00
Alex Vandiver
412826e08b kandra: Add zulip-notify hooks for CZO. 2025-03-05 13:57:10 -08:00
Alex Vandiver
c5200e8b05 deliver_scheduled_emails: Use a queue, instead of infinite retries.
`deliver_scheduled_emails` tries to deliver the email synchronously,
and if it fails, it retries after 10 seconds.  Since it does not track
retries, and always tries the earliest-scheduled-but-due message
first, the worker will not make forward progress if there is a
persistent failure with that message, and will retry indefinitely.
This can result in excessive network or email delivery charges from
the remote SMTP server.

Switch to delivering emails via a new queue worker.  The
`deliver_scheduled_emails` job now serves only to pull deferred jobs
out of the table once they are due, insert them into RabbitMQ, and
then delete them.  This limits the potential for head-of-queue
failures to failures inserting into RabbitMQ, which is more reasonable
than failures speaking to a complex external system we do not control.
Retries and any connections to the SMTP server are left to the
RabbitMQ consumer.

We build a new RabbitMQ queue, rather than use the existing
`email_senders` queue, because that queue is expected to be reasonably
low-latency, for things like missed message notifications.  The
`send_future_email` codepath which inserts into ScheduledEmails is
also (ab)used for digest emails, which are extremely bursty in their
frequency -- and a large burst could significantly delay emails behind
it in the queue.

The new queue is explicitly only for messages which were not initiated
by user actions (e.g., invitation reminders, digests, new account
follow-ups), which are thus not latency-sensitive.

Fixes: #32463.
2025-03-04 16:09:25 -08:00
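A minimal sketch of the split described in c5200e8b05, using an in-memory list and Python's `queue.Queue` as stand-ins for the scheduled-email table and RabbitMQ. The names `ScheduledEmail`, `enqueue_due_emails`, `consume`, and `MAX_ATTEMPTS` are hypothetical. The scheduler only moves due rows into the queue and deletes them, while retries live in the consumer, so one persistently failing email no longer blocks the ones behind it.

```python
import queue
from dataclasses import dataclass


@dataclass
class ScheduledEmail:
    # Hypothetical stand-in for a row in the scheduled-email table.
    id: int
    scheduled_timestamp: float
    payload: str


def enqueue_due_emails(table: list[ScheduledEmail], q: queue.Queue, now: float) -> int:
    """Move every due row into the queue, then delete it from the table.

    No SMTP traffic happens here, so the only head-of-queue failure left
    is a failure to enqueue.
    """
    due = sorted((e for e in table if e.scheduled_timestamp <= now),
                 key=lambda e: e.scheduled_timestamp)
    for email in due:
        q.put({"id": email.id, "payload": email.payload, "attempts": 0})
        table.remove(email)
    return len(due)


MAX_ATTEMPTS = 3  # Hypothetical retry bound, enforced by the consumer.


def consume(q: queue.Queue, send) -> list[int]:
    """Drain the queue, retrying each failure a bounded number of times."""
    delivered = []
    while not q.empty():
        event = q.get()
        try:
            send(event["payload"])
            delivered.append(event["id"])
        except OSError:
            event["attempts"] += 1
            if event["attempts"] < MAX_ATTEMPTS:
                q.put(event)  # Requeue behind other work instead of blocking it.
    return delivered
```

With a `send` that always raises `OSError` for one payload, the other due emails are still delivered, and the failing one is dropped after `MAX_ATTEMPTS` tries rather than retried forever.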
Anders Kaseorg
d7556b4060 requirements: Migrate to uv.
https://docs.astral.sh/uv/

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-24 22:29:24 -08:00
Alex Vandiver
c84c76f3a1 puppet: Upgrade external dependencies. 2025-02-21 14:38:31 -08:00
Alex Vandiver
3e421d71ba kandra: Update Teleport CA. 2025-02-21 10:16:33 -08:00
Alex Vandiver
ce34ebdfed kandra: Add a hook to notify of local schema drift from merge-base. 2025-02-19 12:43:00 -08:00
Alex Vandiver
af4fa75b66 puppet: Upgrade version of aws tool. 2025-01-28 17:43:08 -08:00
Alex Vandiver
3ec896ebda nginx: Add an option which defaults loadbalancer requests to https.
In some cases, it is not possible to configure the load-balancer to
add an X-Forwarded-Proto header.  If Zulip is serving its traffic over
HTTP, it will rightly error out, since it cannot guarantee that its
response will be served over an encrypted connection.

Add a new `loadbalancer.rejects_http_requests` setting, which serves
as a way for the operator to swear that the load-balancer will *never*
serve responses from Zulip over an unencrypted connection.  In most
cases, this is because the load-balancer is configured to have port 80
always serve an HTTP 301 redirect to the same URL over HTTPS.

Properly configuring the proxy to send `X-Forwarded-Proto` is always a
better solution than using this configuration parameter, so use of
this should be viewed as a last resort.
2025-01-22 12:25:42 -08:00
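The decision added in 3ec896ebda can be modeled as a small function. This is a hypothetical sketch of the described behavior, not Zulip's nginx configuration; `effective_scheme`, `require_https`, and their parameters are illustrative names. An explicit `X-Forwarded-Proto` still wins, and the new setting only changes the default when the header is absent.

```python
from typing import Optional


def effective_scheme(
    forwarded_proto: Optional[str], hop_scheme: str, rejects_http_requests: bool
) -> str:
    """Scheme to assume for a request that arrived via the load-balancer.

    Hypothetical model of the behavior described above.
    """
    if forwarded_proto is not None:
        # A correctly configured load-balancer tells us directly.
        return forwarded_proto
    if rejects_http_requests:
        # The operator has sworn the load-balancer never serves responses
        # over plain HTTP (e.g. port 80 only ever issues a 301 to HTTPS).
        return "https"
    # Otherwise, fall back to the scheme of the load-balancer-to-Zulip hop.
    return hop_scheme


def require_https(scheme: str) -> None:
    """Refuse to proceed unless the response will be served encrypted."""
    if scheme != "https":
        raise PermissionError("refusing to serve a response over an unencrypted connection")
```

Without the header and without the setting, an HTTP hop still errors out, which matches the commit's point that this is a last resort rather than a replacement for sending `X-Forwarded-Proto`.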
Prakhar Pratyush
86cd8349f7 cron: Update 'update-channel-recently-active-status' to run weekly.
This commit updates the 'update-channel-recently-active-status'
cron job to run weekly instead of daily.
2025-01-02 16:37:34 -08:00
Anders Kaseorg
58822372d5 typos: Fix typos caught by typos and mwic.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-24 19:15:36 -08:00
Anders Kaseorg
19b8cde27f ruff: Fix PLC0206 Extracting value from dictionary without calling .items().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-21 21:06:53 -08:00
Aman Agrawal
50256f4831 stream: Add field to track active status of stream. 2024-11-27 17:31:06 -08:00
Tim Abbott
11d6273990 Update puppet/zulip/manifests/app_frontend_base.pp
Co-authored-by: Anders Kaseorg <andersk@mit.edu>
2024-11-15 15:08:33 -08:00
Tim Abbott
9d68d89d01 puppet: Require libldap-common be installed.
Zulip instances without a database included, like the Docker image,
would fail to use TLS properly, since `TLS_REQCERT` was not set in
`/etc/ldap/ldap.conf`. While there are a few other ways we could fix
this, just installing libldap-common on app frontend instances seems
like a good solution. It has no impact on other Zulip systems, since
it was already being installed there indirectly, via a
"Recommends"-tier apt dependency of the PostgreSQL server package.

Fixes zulip/docker-zulip#454.
2024-11-15 15:08:33 -08:00
Anders Kaseorg
2bb87aebec install: Remove PostgreSQL 12 support.
PostgreSQL 12 reaches end of life on November 14, 2024.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-11-01 15:33:31 -07:00
Alex Vandiver
f325e15439 nagios: Switch staging hosts to not page, but send a zulip. 2024-10-08 16:55:07 -07:00
Alex Vandiver
1bd0ab506c nginx: Make uwsgi timeout shorter than nginx-to-uwsgi timeout.
The nginx-to-uwsgi timeout defaults to 60s, which is exactly the same
as the current "harakiri" timeout configured in uwsgi (which limits
how long a request can run before the worker is terminated).  This
causes a race: if nginx hits its 60s before uwsgi does, we return a
504; otherwise, we return a 502.

Make the nginx-to-uwsgi timeout explicit, and shorten the "harakiri"
timeout to be explicitly less than that.  Document the 60s timeout,
which all outer reverse proxies must be set to _longer than_ in order
to have proper "onion" timeouts.
2024-10-07 15:41:08 -07:00
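The "onion" rule from 1bd0ab506c can be checked mechanically: each layer's timeout, from the innermost outward, should be strictly shorter than the timeout of the layer wrapping it. In this sketch, only the 60s nginx-to-uwsgi value comes from the commit. The 55s harakiri and 65s outer-proxy values, and the `check_onion` helper, are hypothetical.

```python
def check_onion(timeouts: list[tuple[str, float]]) -> list[str]:
    """Return violations, given (layer, seconds) pairs ordered innermost first."""
    problems = []
    for (inner, t_in), (outer, t_out) in zip(timeouts, timeouts[1:]):
        if t_in >= t_out:
            problems.append(f"{inner} ({t_in}s) must be shorter than {outer} ({t_out}s)")
    return problems


# Before the fix: harakiri and nginx tie at 60s, so they race (502 vs 504).
before = check_onion([("uwsgi harakiri", 60), ("nginx-to-uwsgi", 60)])

# After the fix, with an outer reverse proxy added (hypothetical values).
after = check_onion([("uwsgi harakiri", 55), ("nginx-to-uwsgi", 60), ("outer proxy", 65)])
```

Here `before` contains one violation and `after` is empty: each layer gives up before the layer outside it does, so the outer layer always sees a definite response rather than racing.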
Alex Vandiver
7ddcf3774b puppet: Add tusd daily logrotation. 2024-10-04 14:22:37 -07:00
Alex Vandiver
2571196899 puppet: Remove unnecessary "create" directive.
It is irrelevant, since copytruncate is used.
2024-10-04 14:22:37 -07:00
Alex Vandiver
e6c64e78e6 puppet: Switch logrotate to be in charge of tornado logs. 2024-10-04 14:22:37 -07:00
Alex Vandiver
34308efb94 puppet: Upgrade sentry-cli.
This version causes `sentry-cli monitors run` to not fail if Sentry is
down (getsentry/sentry-cli#2169).
2024-09-30 11:24:36 -07:00
Alex Vandiver
87ee167726 puppet: Allow tusd to be exposed on non-localhost.
This allows its /metrics endpoint to be monitored.
2024-09-27 15:06:47 -07:00
Alex Vandiver
0c7d83f7da kandra: Use vector to plumb SES logs into S3. 2024-09-26 11:19:45 -07:00
Alex Vandiver
60759ab5fb kandra: Use generic "vector" process, not dedicated "akamai" process.
This makes the Vector configuration extensible, to allow it to be used
not just for ingesting Akamai logs.
2024-09-26 11:19:45 -07:00
Alex Vandiver
b0ca32c955 nginx: Fix missing word in comment. 2024-09-25 11:15:03 -07:00
Alex Vandiver
77a121082b kandra: Add localhost access to internal APIs on port 80.
This parallels 02d3fb7666.
2024-09-25 10:08:27 -07:00
Alex Vandiver
24d110f063 settings: Increase default max file upload size to 100MB.
This also _lowers_ the default nginx client_max_body_size, since that
no longer caps the upload file size.
2024-09-19 11:37:29 -07:00
Alex Vandiver
818c30372f upload: Use tusd for resumable, larger uploads.
Currently, it handles two hook types: 'pre-create' (to verify that the
user is authenticated and the file size is within the limit) and
'pre-finish' (which creates an attachment row).

No secret is shared between Django and tusd for authentication of the
hooks endpoints, because none is necessary -- tusd forwards the
end-user's credentials, and the hook checks them like it would any
end-user request.  An end-user gaining access to the endpoint would be
able to do no more harm than via tusd or the normal file upload API.

Regardless, the previous commit has restricted access to the endpoint
at the nginx layer.

Co-authored-by: Brijmohan Siyag <brijsiyag@gmail.com>
2024-09-19 11:37:29 -07:00
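A hedged sketch of the 'pre-create' check described in 818c30372f. The JSON payload and response shapes below are a simplified assumption loosely modeled on tusd's HTTP hooks, and `authenticate`, `handle_pre_create`, and `MAX_FILE_UPLOAD_SIZE_MB` are hypothetical stand-ins for Zulip's real credential check and size setting. It illustrates why no shared secret is needed: the hook simply re-checks the end-user's forwarded credentials.

```python
import json

# Assumed to mirror the 100MB default upload limit; hypothetical name.
MAX_FILE_UPLOAD_SIZE_MB = 100


def authenticate(headers: dict[str, list[str]]) -> bool:
    # Hypothetical placeholder: real code would validate the forwarded
    # session cookie or API credentials exactly as for any other request.
    return bool(headers.get("Authorization") or headers.get("Cookie"))


def handle_pre_create(body: str) -> dict:
    """Accept or reject an upload before it is created (assumed payload shape)."""
    event = json.loads(body)["Event"]
    headers = event["HTTPRequest"]["Header"]
    size = event["Upload"]["Size"]
    if not authenticate(headers):
        return {"RejectUpload": True, "HTTPResponse": {"StatusCode": 401}}
    if size > MAX_FILE_UPLOAD_SIZE_MB * 1024 * 1024:
        return {"RejectUpload": True, "HTTPResponse": {"StatusCode": 413}}
    return {}  # In this sketch, an empty response lets the upload proceed.
```

The 'pre-finish' hook, which creates the attachment row, would follow the same pattern but is omitted here.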
Alex Vandiver
02d3fb7666 nginx: Allow HTTP access to internal endpoints from localhost. 2024-09-19 11:37:29 -07:00
Alex Vandiver
c34913b4d7 nginx: Limit access to internal endpoints, to localhost. 2024-09-17 12:51:30 -07:00
Alex Vandiver
64a16dd9b4 kandra: We do not serve staging from staging.zulip.com:80.
It is not configured in the ALB's rules, nor does the ALB ever forward
to port 80.
2024-09-09 15:17:19 -07:00
Alex Vandiver
9497f23307 puppet: Make restart-server cron use standard day-of-week.
Using 7 (a non-standard alias for Sunday, which is normally 0) makes
this fail when run under the Sentry cron wrapper.
2024-08-30 13:13:05 -07:00
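Standard cron numbers the days of the week 0 through 6, with 0 meaning Sunday. Many cron implementations also accept 7 for Sunday, but stricter parsers may not. The hypothetical helper below illustrates the kind of rewrite applied in 9497f23307. It handles only plain numbers and comma-separated lists, and leaves ranges, steps, and day names untouched.

```python
def normalize_day_of_week(field: str) -> str:
    """Rewrite a cron day-of-week field so Sunday is written as 0, never 7."""
    return ",".join("0" if part == "7" else part for part in field.split(","))
```

For example, `normalize_day_of_week("1,7")` returns `"1,0"`, which any standard-conforming parser accepts.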
Alex Vandiver
6cc480e2b5 kandra: Ensure that we do not error out creating standby.signal.
Force us to have restored the `/var/lib/postgresql/14/main` directory
before we try to create `standby.signal` in it.
2024-08-20 14:03:51 -07:00
Alex Vandiver
49e5722a9f kandra: Match the logic in zulip::profile::postgresql. 2024-08-20 14:03:51 -07:00
Alex Vandiver
9e58780d46 puppet: Fix indentation. 2024-08-20 14:03:51 -07:00
Alex Vandiver
47ebef057e puppet: Switch to an included limits.d file. 2024-08-20 10:04:40 -07:00
Alex Vandiver
c0237e3d58 kandra: Correct group ownership and permissions to match package's.
This reverts b000328ba5 and re-applies
a759d26a32.  It is likely that the
upstream packages are simply inconsistent.
2024-08-20 10:04:40 -07:00
Alex Vandiver
fe0a7b3b77 kandra: Drop support for 20.04 hosts. 2024-08-20 10:03:59 -07:00
Alex Vandiver
dd5b03ea12 wal-g: Support setting the compression method. 2024-08-16 13:57:58 -07:00
Alex Vandiver
2840e68548 static: Pre-compress with zopfli, for better compression.
Zopfli[^1] performs very good, but time-intensive, zlib compression.
It is hence only suitable for pre-compressing objects, not on-the-fly
compression.

Use a webpack plugin to write pre-compressed versions of JS and CSS
assets using Zopfli, and configure nginx to serve those assets when
`Accept-Encoding: gzip` is provided.

This reduces the size of the JS and CSS assets on initial pageload
from 1422872 bytes to 1108267 bytes, or about a 22% savings.

[^1]: https://github.com/google/zopfli
2024-08-13 09:01:06 -07:00
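The serving side of 2840e68548 resembles nginx's `gzip_static` behavior: if the client accepts gzip and a pre-compressed `.gz` sibling exists, that file is served instead. The Python sketch below models that selection and checks the savings figure from the commit. `accepts_gzip`, `choose_asset`, and `savings` are hypothetical names, and the header parsing is deliberately simplified (it ignores the `*` wildcard).

```python
def accepts_gzip(accept_encoding: str) -> bool:
    """Simplified check for gzip in an Accept-Encoding header, honoring q=0."""
    for item in accept_encoding.split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if coding.lower() != "gzip":
            continue
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True
    return False


def choose_asset(path: str, accept_encoding: str, precompressed: set[str]) -> str:
    """Serve the pre-compressed sibling when the client can decode it."""
    gz = path + ".gz"
    if accepts_gzip(accept_encoding) and gz in precompressed:
        return gz
    return path


def savings(before: int, after: int) -> float:
    """Fractional size reduction."""
    return (before - after) / before
```

With the figures from the commit, `savings(1422872, 1108267)` is about 0.221, which matches the quoted 22%.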