1607 Commits

Author SHA1 Message Date
Alex Vandiver
9ea1b99dbb nginx: Strip off request headers which might affect S3's behaviour.
Clients making requests to Zulip with a `Authorization: Basic ...` for
an upload in S3 pass along all of their request headers to the S3
backend -- causing errors of the form:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>InvalidArgument</Code>
  <Message>Only one auth mechanism allowed; only the X-Amz-Algorithm
  query parameter, Signature query string parameter or the
  Authorization header should be specified</Message>
  <ArgumentName>Authorization</ArgumentName>
  <ArgumentValue>Basic ...</ArgumentValue>
  <RequestId>...</RequestId>
  <HostId>...</HostId>
</Error>
```

Strip off all request headers which AWS reports that S3 may read[^1].

Fixes: #30180.

[^1]: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html

(cherry picked from commit 549f4fe00b)
2024-07-26 14:27:45 -07:00
Alex Vandiver
b9ff0c4bf0 django: Skip running config and database checks on process start.
The process of running Django's built-in database and config checks
can be very heavy-weight, potentially taking multiple seconds:

```
$ hyperfine './manage.py print_initial_password iago@zulip.com'  './manage.py print_initial_password iago@zulip.com --skip-checks'
Benchmark 1: ./manage.py print_initial_password iago@zulip.com
  Time (mean ± σ):      4.943 s ±  0.722 s    [User: 4.434 s, System: 0.311 s]
  Range (min … max):    4.415 s …  6.835 s    10 runs

Benchmark 2: ./manage.py print_initial_password iago@zulip.com --skip-checks
  Time (mean ± σ):      1.786 s ±  0.113 s    [User: 1.598 s, System: 0.162 s]
  Range (min … max):    1.576 s …  1.999 s    10 runs

Summary
  './manage.py print_initial_password iago@zulip.com --skip-checks' ran
    2.77 ± 0.44 times faster than './manage.py print_initial_password iago@zulip.com'
```

This extends the window during which nginx is forced to serve 502's to
clients.  f5f6a3789b added an explicit `manage.py check` during
server restarts, and fa77be6e6c added one during upgrades; as such,
we expect that any check failures will already have been caught when
performing a restart or upgrade, and there is no point in running them
on process startup.

(cherry picked from commit 48b925931e)
2024-05-07 12:52:51 -07:00
Alex Vandiver
14d4900648 puppet: Split out a zulip::profile::standalone_nodb class. 2024-05-07 11:18:22 -07:00
Alex Vandiver
133c8249b1 puppet: Remove gettext, zulip::static_asset_compiler.
These came in via d0dcc8bf26, which looks like it copied the comment
from the provisioning code.  Production installs (even from git) do
not call `./manage.py makemessages`, so there is no reason to require
this for production deployments.
2024-05-07 11:18:22 -07:00
Alex Vandiver
eb202446f8 postgresql: Decouple wal_g from streaming replication.
Streaming replication may be used even if `wal-g` is not -- as long as
the user can move a copy of the base backup to the replica (e.g. using
`pg_basebackup`).  Remove the warning about this combination, and move
the `primary_conninfo` setting outside of the `s3_backups_bucket`
check.
2024-05-07 11:18:22 -07:00
Alex Vandiver
c5fe866297 postgresql: Drop lowered max_wal_senders value.
This was originally added in a PostgreSQL 9.5 configuration, where the
default is 0, meaning replication is disabled[^1].  In PostgreSQL 10
and above, the default is 10[^2], and there is no reason to lower it
from that value.

[^1]: https://www.postgresql.org/docs/9.5/runtime-config-replication.html
[^2]: https://www.postgresql.org/docs/10/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER
2024-05-07 11:18:22 -07:00
Alex Vandiver
2c7deef0ea puppet: Fix indentation. 2024-05-07 11:18:22 -07:00
Alex Vandiver
84d5e58e3a puppet: Switch from top-level fact variables to facts dict.
This is a cherry-pick of 57f8b48ff9 and 235e2eefc8.
2024-05-07 11:18:22 -07:00
Alex Vandiver
32dbcc3489 puppet: Only use the cron resource on Puppet 5.x.
Puppet 6.x and above moved the `cron` resource type out of core
Puppet, into a "cron_core" module[^1]; thus Debian 12 and Ubuntu
24.04, with the newer Puppet, fail to compile if backups are
configured:

```
Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'cron' (file: /home/zulip/deployments/current/puppet/zulip/manifests/postgresql_backups.pp, line: 32, column: 3) on node zulip.example
```

However, we only use the `cron` package to assert that we no longer
use it, for installs which are coming up from Zulip 4.x where it _was_
used.

Since Zulip 4.x does not support Ubuntu 24.04 or Debian 12, having
Puppet >= 6.x means that the cron must have been previously removed in
a previous upgrade, and we can skip the block.

In `main`, this block is removed entirely.

[^1]: https://forge.puppet.com/modules/puppetlabs/cron_core/readme
2024-04-02 11:54:29 -07:00
Alex Vandiver
791969de3e puppet: Do not include postgresql_common for all PostgreSQL clients.
It pulls in the server.

(cherry picked from commit 80ef38757a)
2024-03-28 12:31:46 -07:00
Alex Vandiver
96dfec1f1d puppet: Install the same version of postgres-client as the server.
We require a `pg_dump` whose version matches the version of the server
we are configured against (see 3a8b4b0205).  Installing the latest
`postgresql-client` does not guarantee that we have such a binary
present.

(cherry picked from commit 23504308fb)
2024-03-28 12:31:46 -07:00
Alex Vandiver
1aabbf1d8c wal-g: Add support for incremental backups.
This only defaults to on for local-disk backups, since they are more
disk-size-sensitive, and local accesses are quite cheap compared to
loading multiple incremental backups from S3.

(cherry picked from commit 323e1e92b7)
2024-03-19 10:41:20 -07:00
Alex Vandiver
ba061d448f puppet: Remove backups cron job if they are no longer enabled.
(cherry picked from commit e200f4d233)
2024-03-19 10:41:20 -07:00
Alex Vandiver
1b9be7c30c wal-g: Provide a to-local-disk backup option.
(cherry picked from commit b86d92e053)
2024-03-19 10:41:20 -07:00
Anders Kaseorg
9eaec853d1 ruff: Fix RUF027 Possible f-string without an f prefix.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit 553f268b04)
2024-03-07 15:07:12 -08:00
Anders Kaseorg
9ea290d5ae python: Reformat with Ruff formatter.
https://docs.astral.sh/ruff/formatter/

Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit 570f3dd447)
2024-03-07 15:07:12 -08:00
Tim Abbott
c4dfeb9c37 puppet: Increase minimum memory for multiprocess queue workers.
This should give some more room for systems that are still below 4GB
of RAM to use the lower-memory multithreaded mode, which is less
likely to have OOM kills (a very bad experience).

There should be little cost, as few systems are likely allocated with
memory in this range.

(cherry picked from commit a22f418827)
2024-02-16 12:28:16 -08:00
Tim Abbott
8ea5e2156a puppet: Update rules for number of uwsgi processes.
The defaults for how many uwsgi processes to run no longer depend on
the queue processor mode, but instead the total memory on the system.

(cherry picked from commit 62dbe2298e)
2024-02-16 12:28:16 -08:00
Alex Vandiver
495312b86a logrotate: smokescreen has its own config file.
149bea8309 added a separate config file
for smokescreen (which is necessary because it can be installed
separately) but failed ot notice that `zulip.template.erb` already had
a config line for it.  This leads to failures starting the logrotate
service:

```
logrotate[4158688]: error: zulip:1 duplicate log entry for /var/log/zulip/smokescreen.log
logrotate[4158688]: error: found error in file zulip, skipping
```

Remove the duplicate line.

(cherry picked from commit 725affcb5a)
2024-01-15 12:02:53 -08:00
Alex Vandiver
e311b372cb install: Support PostgreSQL 16.
(cherry picked from commit 1ba2f39854)
2024-01-05 10:32:54 -05:00
Anders Kaseorg
26811c5049 models: Extract zerver.models.clients.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-01-05 10:32:54 -05:00
Anders Kaseorg
086df4a81e models: Extract zerver.models.realms.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-01-05 10:32:54 -05:00
Anders Kaseorg
ee85ac5433 models: Extract zerver.models.users.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-01-05 10:32:54 -05:00
Alex Vandiver
4989221b9e nginx: Limit the methods that we proxy to Tornado.
While the Tornado server supports POST requests, those are only used
by internal endpoints.  We only support OPTIONS, GET, and DELETE
methods from clients, so filter everything else out at the nginx
level.

We set `Accepts` header on both `OPTIONS` requests and 405 responses,
and the CORS headers on `OPTIONS` requests.
2023-12-08 09:23:30 -08:00
Alex Vandiver
ca57d360e6 puppet: Update dependencies. 2023-12-07 18:45:10 -08:00
Anders Kaseorg
3853fa875a python: Consistently use from…import for urllib.parse.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-05 13:03:07 -08:00
Anders Kaseorg
8a7916f21a python: Consistently use from…import for datetime.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-05 12:01:18 -08:00
Alex Vandiver
9b1bdfefcd nagios: Use a better index on UserActivity for zephyr alerting.
Limiting only by client_name and query leads to a very poorly-indexed
lookup on `query` which throws out nearly all of its rows:

```
Nested Loop  (cost=50885.64..60522.96 rows=821 width=8)
  ->  Index Scan using zerver_client_name_key on zerver_client  (cost=0.28..2.49 rows=1 width=4)
        Index Cond: ((name)::text = 'zephyr_mirror'::text)
  ->  Bitmap Heap Scan on zerver_useractivity  (cost=50885.37..60429.95 rows=9052 width=12)
        Recheck Cond: ((client_id = zerver_client.id) AND ((query)::text = ANY ('{get_events,/api/v1/events}'::text[])))
        ->  BitmapAnd  (cost=50885.37..50885.37 rows=9052 width=0)
              ->  Bitmap Index Scan on zerver_useractivity_2bfe9d72  (cost=0.00..16631.82 rows=..large.. width=0)
                    Index Cond: (client_id = zerver_client.id)
              ->  Bitmap Index Scan on zerver_useractivity_1b1cc7f0  (cost=0.00..34103.95 rows=..large.. width=0)
                    Index Cond: ((query)::text = ANY ('{get_events,/api/v1/events}'::text[]))
```

A partial index on the client and query list is extremely effective
here in reducing PostgreSQL's workload; however, we cannot easily
write it as a migration, since it depends on the value of the ID of
the `zephyr_mirror` client.

Since this is only relevant for Zulip Cloud, we manually create the
index:

```sql
CREATE INDEX CONCURRENTLY zerver_useractivity_zehpyr_liveness
    ON zerver_useractivity(last_visit)
 WHERE client_id = 1005
   AND query IN ('get_events', '/api/v1/events');
```

We rewrite the query to do the time limit, distinct, and count in SQL,
instead of Python, and make use of this index.  This turns a 20-second
query into two 10ms queries.
2023-11-30 16:01:55 -08:00
Alex Vandiver
c4b619af15 puppet: Change /etc/rabbitmq to be owned by rabbitmq.
The Ubuntu and Debian package installation scripts for
`rabbitmq-server` install `/etc/rabbitmq` (and its contents) owned by
the `rabbitmq` user -- not `root` as Puppet does.  This means that
Puppet and `rabbitmq-server` unnecessarily fight over the ownership.

Create the `rabbitmq` user and group, to the same specifications that
the Debian package install scripts do, so that we can properly declare
the ownership of `/etc/rabbitmq`.
2023-11-29 21:45:35 -08:00
Alex Vandiver
c47ee4a296 zulip_ops: Configure stats to be pushed to status.zulip.com. 2023-11-16 16:21:12 -05:00
Alex Vandiver
5e49804004 puppet_ops: Include Akamai log parser on prometheus server. 2023-11-13 14:35:39 -05:00
Alex Vandiver
5591d6f65c zulip_ops: Add configuration for Vector Akamai stats.
Akamai writes access logs to S3; we use an SQS events queue, combined
with Vector, to transform those into Prometheus statistics.
2023-11-13 09:53:20 -08:00
Tim Abbott
b59e90d100 puppet: Fix buggy media-src Content-Security-Policy.
The colon is invalid syntax. Verified the updated policy using an
online CSP checker.
2023-11-06 14:45:05 -05:00
Alex Vandiver
803b7b4b93 puppet: Fix SHA256sum of sentry-cli binary. 2023-10-31 10:24:49 -07:00
Alex Vandiver
37b261ef0f puppet: Update dependencies. 2023-10-30 16:10:25 -07:00
Aman Agrawal
f3ab45a152 uploads-internal: Mark self as a valid source of loading media.
Without this, browser refused to play the video. To reproduce press `open`
on an uploaded video on CZO. Chrome gives us the following error
in console:

Refused to load media from '<source>' because it violates the
following Content Security Policy directive: "default-src 'none'".
Note that 'media-src' was not explicitly set, so 'default-src' is
used as a fallback.
2023-10-12 09:57:21 -07:00
Anders Kaseorg
835ee69c80 docs: Fix grammar errors found by mwic.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-09 13:24:09 -07:00
Anders Kaseorg
4cb2eded68 typos: Fix typos caught by typos.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-09 11:55:16 -07:00
Alex Vandiver
528d0ebcf0 puppet: Serve /etc/zulip/well-known/ in nginx as /.well-known/. 2023-10-04 15:56:42 -07:00
Aman Agrawal
8ef52d55d3 markdown: Add support for inline video thumbnails. 2023-10-02 22:39:02 -07:00
Alex Vandiver
5308fbdeac puppet: Add postgresql-client depenencies to monitoring.
The `unless` step errors out if /usr/bin/psql does not exist at
first evaluation time -- protect that with a `test -f` check, and
protect the actual `createuser` with a dependency on `postgresql-client`.
To work around `Zulip::Safepackage` not actually being safe to
instantiate more than once, we move the instantiation of
`Package[postgresql-client]` into a class which can be safely
included one or more times.
2023-09-22 11:45:00 -07:00
Alex Vandiver
5ee4b642ad views: Add a /health healthcheck endpoint.
This endpoint verifies that the services that Zulip needs to function
are running, and Django can talk to them.  It is designed to be used
as a readiness probe[^1] for Zulip, either by Kubernetes, or some other
reverse-proxy load-balancer in front of Zulip.  Because of this, it
limits access to only localhost and the IP addresses of configured
reverse proxies.

Tests are limited because we cannot stop running services (which would
impact other concurrent tests) and there would be extremely limited
utility to mocking the very specific methods we're calling to raising
the exceptions that we're looking for.

[^1]: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
2023-09-20 09:53:59 -07:00
Alex Vandiver
f778316b5a uwsgi: Ensure that the master process cannot load the application.
The rolling restart configuration of uwsgi attempted to re-chdir the
CWD to the new `/home/zulip/deployments/current` before `lazy-apps`
loaded the application in the forked child.  It successfully did so --
however, the "main" process was still running in the original
`/home/zulip/deployments/current`, which somehow (?) tainted the
search path of the children processes.

Set the parent uwsgi process to start in `/`, so that the old deploy
directory cannot taint the load order of later children processes.
2023-09-18 13:13:34 -07:00
Alex Vandiver
a6d5d7740e uwsgi: Always enable lazy-apps.
Enabling `lazy-apps` defers loading of the uwsgi application until
after the fork, instead of happening prior to forking workers[^1].  The
nominal reason to not enable this is that it increases the memory
footprint of the server (since no memory is shared across processes),
and may slow down worker initialization, since each worker has to load
the files from disk.

However, Django defers loading the majority of the code until the
first request[^2].  As such, our current non-`lazy-apps` gains nothing
over `lazy-apps`.  For consistency, switch to using `lazy-apps` for
all deployments, rolling restart or no.

[^1]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#preforking-vs-lazy-apps-vs-lazy
[^2]: https://uwsgi-docs.readthedocs.io/en/latest/articles/TheArtOfGracefulReloading.html#preforking-vs-lazy-apps-vs-lazy
2023-09-18 13:13:34 -07:00
Alex Vandiver
f95c8b894a nagios: Remove load monitoring.
Load monitoring alerts are extremely noisy, and do not reliably
indicate an issue which is affecting users.
2023-09-14 09:29:29 -07:00
Anders Kaseorg
2665a3ce2b python: Elide unnecessary list wrappers.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-09-13 12:41:23 -07:00
Alex Vandiver
135acfea93 nginx: Suppress proxy warnings when the proxy itself sent the request.
This is common in cases where the reverse proxy itself is making
health-check requests to the Zulip server; these requests have no
X-Forwarded-* headers, so would normally hit the error case of
"request through the proxy, but no X-Forwarded-Proto header".

Add an additional special-case for when the request's originating IP
address is resolved to be the reverse proxy itself; in these cases,
HTTP requests with no X-Forwarded-Proto are acceptable.
2023-09-12 10:10:58 -07:00
Alex Vandiver
ccbd834a86 postgres_exporter: Rebase the per-index stats branch.
The branch from the PR is somewhat stale, and is missing important bugfixes.
2023-09-11 17:59:54 -07:00
Alex Vandiver
0c88cfca63 postgres_exporter: Build from source for per-index stats.
This builds prometheus-community/postgres_exporter#843 to track
per-index statistics.
2023-09-11 11:59:39 -07:00
Alex Vandiver
fdd811bec1 postgres_exporter: Explicitly specify the zulip database.
Some of the collectors (e.g. `pg_stat_user_tables`) don't appear to
work with `--auto-discover-databases`, which is deprecated since
version 0.13.0[^1].

Explicitly set the database name.

[^1]: https://github.com/prometheus-community/postgres_exporter/releases/tag/v0.13.0
2023-09-06 09:20:57 -07:00