1531 Commits

Author SHA1 Message Date
Anders Kaseorg
1a82ce38af install: Move ourself to deployments path before creating venv.
This prevents the venv from ending up with references to /root.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit c517e95e6b)
2025-03-26 10:55:07 -07:00
Anders Kaseorg
ce81d8498d provision: Ignore Python warnings while building requirements.
Build warnings are unfortunately very common in third-party packages.
They’re difficult to reliably detect since packages don’t always build
from source, and they can’t be whitelisted on a per-package basis
since they’re all attributed to setuptools or an anonymous code
string.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-03-19 17:15:09 -07:00
Anders Kaseorg
838ae38b43 install-uv: Upgrade uv from 0.6.3 to 0.6.6.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-03-14 17:31:50 -07:00
Alex Vandiver
29a0d287fc puppet: Allow for arbitrary queues to have more than one worker.
This generalizes from thumbnail_workers, to include any other queue.
We only additionally choose to document `email_senders_workers`,
however, since other queues are not guaranteed to work correctly with
multiple consumers.
2025-03-14 14:07:09 -07:00
Alex Vandiver
232de4b98f check_rabbitmq_queue: Increase deferred_email_senders paging thresholds.
c5200e8b05 switched `digest_emails` from sending emails by inserting
into the ScheduledEmail table, and being processed later by
`deliver_scheduled_emails`, to inserting into the
`deferred_email_senders` RabbitMQ queue.  This moved it from being in
an unmonitored table, to a monitored queue.

This slightly improved throughput -- but began paging, since the
backlog was now in a monitored form.  Increase the paging thresholds
to not page for expected behaviour.
2025-03-11 12:34:11 -07:00
Alex Vandiver
a9337e7641 nagios: Change the cron jobs to exit 0 for all ok/warning/critical.
The cron jobs are potentially wrapped by Sentry, which logs "cron
failures" and sends emails.  We would like those failures to only be
when the cron job itself failed to run successfully -- not when the
underlying metric is outside of its normal range.  We would like to
differentiate a failure of the monitoring infrastructure from a
failure of what it is monitoring.

Swap to return 0 on everything except "unknown" results.
2025-03-05 09:49:36 -08:00
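The exit-code policy described in a9337e7641 can be sketched as follows. This is a minimal illustration, not the actual cron wrapper: `cron_exit_code` is a hypothetical helper, and returning the UNKNOWN state itself as the non-zero code is an assumption, since the commit message only says that "unknown" results no longer exit 0.

```python
# Standard Nagios plugin states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def cron_exit_code(nagios_state: int) -> int:
    """Map a Nagios check state to the exit code of the cron job.

    Hypothetical helper: ok/warning/critical mean the check itself ran
    successfully, so the cron job exits 0 even when the metric is out of
    range.  Anything else (assumed here to be "unknown") is treated as a
    failure of the monitoring infrastructure and stays non-zero.
    """
    if nagios_state in (OK, WARNING, CRITICAL):
        return 0
    return UNKNOWN
```

With this mapping, a cron wrapper such as Sentry would only see a failure when the check could not produce a result at all.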
Alex Vandiver
c5200e8b05 deliver_scheduled_emails: Use a queue, instead of infinite retries.
`deliver_scheduled_emails` tries to deliver the email synchronously,
and if it fails, it retries after 10 seconds.  Since it does not track
retries, and always tries the earliest-scheduled-but-due message
first, the worker will not make forward progress if there is a
persistent failure with that message, and will retry indefinitely.
This can result in excessive network or email delivery charges from
the remote SMTP server.

Switch to delivering emails via a new queue worker.  The
`deliver_scheduled_emails` job now serves only to pull deferred jobs
out of the table once they are due, insert them into RabbitMQ, and
then delete them.  This limits the potential for head-of-queue
failures to failures inserting into RabbitMQ, which is more reasonable
than failures speaking to a complex external system we do not control.
Retries and any connections to the SMTP server are left to the
RabbitMQ consumer.

We build a new RabbitMQ queue, rather than use the existing
`email_senders` queue, because that queue is expected to be reasonably
low-latency, for things like missed message notifications.  The
`send_future_email` codepath which inserts into ScheduledEmails is
also (ab)used for digest emails, which are extremely bursty in their
frequency -- and a large burst could significantly delay emails behind
it in the queue.

The new queue is explicitly only for messages which were not initiated
by user actions (e.g., invitation reminders, digests, new account
follow-ups) which are thus not latency-sensitive.

Fixes: #32463.
2025-03-04 16:09:25 -08:00
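The division of labour described in c5200e8b05 can be sketched roughly as below. This is a simplified model under stated assumptions: an in-memory list stands in for the scheduled-email table, `queue.Queue` stands in for RabbitMQ, and `ScheduledJob` and `move_due_jobs` are hypothetical names rather than Zulip's actual code.

```python
from dataclasses import dataclass
from datetime import datetime
from queue import Queue


@dataclass
class ScheduledJob:
    # Hypothetical stand-in for a row in the scheduled-email table.
    id: int
    scheduled_timestamp: datetime
    payload: dict[str, str]


def move_due_jobs(table: list[ScheduledJob], queue: "Queue[dict[str, str]]", now: datetime) -> int:
    """Move every due job from the table into the queue, then delete it.

    No email is sent here.  The only step that can fail for the
    head-of-queue job is the enqueue itself; retries and SMTP
    connections are left to the queue consumer.
    """
    due = sorted(
        (job for job in table if job.scheduled_timestamp <= now),
        key=lambda job: job.scheduled_timestamp,
    )
    for job in due:
        queue.put(job.payload)
        table.remove(job)
    return len(due)
```

A persistent delivery failure for one message then delays only that message in the consumer, rather than blocking every later job in the table.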
Alex Vandiver
47e622f5a5 run_hooks: Pass down, and respect, --from-git argument.
The refactoring in 4e28e1d3ff incorrectly switched a check for
`if args.from_git` into `if NEW_ZULIP_MERGE_BASE`, which is
incorrect -- the merge-base is always defined, it may just match the
version.  This led to errors when installing from tarball, without a
git repo.

Since the run_hooks command was already set up to take a `--from-git`
argument, but was ignoring it, pass down that flag from
upgrade-zulip-stage-3 when necessary, and swap the run_hooks logic
back to basing the version-resolution logic on that flag.
2025-03-04 13:18:50 -08:00
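The bug fixed in 47e622f5a5 comes down to testing a value that is always set instead of the flag that was passed in. A minimal sketch, assuming hypothetical helper names (`parse_args`, `version_to_compare`) and a simplified version-resolution rule that is not taken from the real `run_hooks` code:

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # The flag that upgrade-zulip-stage-3 passes down when upgrading from git.
    parser.add_argument("--from-git", action="store_true")
    return parser.parse_args(argv)


def version_to_compare(from_git: bool, merge_base: str, version: str) -> str:
    """Illustrative only: choose which identifier to resolve versions from.

    The buggy form was equivalent to `if merge_base:`, which is always
    true because the merge-base is always defined, even for tarball
    installs with no git repository.
    """
    if from_git:
        return merge_base
    return version
```

For a tarball install, `parse_args([])` leaves `from_git` as `False`, so the git-specific branch is never taken.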
Anders Kaseorg
3af4900891 install-node: Upgrade Node.js from 22.12.0 to 22.14.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-26 16:20:47 -08:00
Anders Kaseorg
d7556b4060 requirements: Migrate to uv.
https://docs.astral.sh/uv/

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-24 22:29:24 -08:00
Anders Kaseorg
72f5df2e09 install: Remove --cacert and CUSTOM_CA_CERTIFICATES.
This has been broken for many years and nobody’s complained.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-24 22:29:24 -08:00
Alex Vandiver
6ac9e3328e cache: Flush caches from all known key prefixes.
When flushing caches, we want to ensure that even processes which may
have a wrong cache-key-prefix know to fetch the latest data from the
database.  This is complicated by the cache-key-prefixes being stored
on disk, and thus checking them on every cache delete is not sufficiently
performant.

We store the list of cache-key-prefixes in the cache, itself, with no
prefix.  This cache is updated when a new cache-key is written, and is
also allowed to lapse after 24 hours.  Updating this global cache
entry on new prefix creation ensures that even a
not-yet-restarted-into deployment will have its caches appropriately
purged if changes are made to the underlying data.

However, this both adds a cache-get, as well as multiplies the size of
all cache clears; for large bulk clears (e.g. for stream renames,
which clear the cache for all message-ids in them) this may prove
untenable.
2025-02-21 14:11:08 -08:00
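The scheme in 6ac9e3328e can be modelled with a plain dictionary standing in for the cache. This is a sketch under assumptions: `PREFIX_LIST_KEY`, `register_prefix`, and `delete_from_all_prefixes` are hypothetical names, and the 24-hour expiry of the prefix list is omitted.

```python
# Hypothetical un-prefixed key under which the list of prefixes is stored.
PREFIX_LIST_KEY = "all_cache_key_prefixes"


def register_prefix(cache: dict, prefix: str) -> None:
    """Record a newly created cache-key prefix in the shared, un-prefixed entry."""
    prefixes = cache.setdefault(PREFIX_LIST_KEY, [])
    if prefix not in prefixes:
        prefixes.append(prefix)


def delete_from_all_prefixes(cache: dict, key: str) -> None:
    """Delete `key` under every known prefix, not just the current one.

    This costs one extra cache-get (the prefix list) and multiplies the
    number of deletes by the number of known prefixes.
    """
    for prefix in cache.get(PREFIX_LIST_KEY, []):
        cache.pop(prefix + key, None)
```

The multiplication in `delete_from_all_prefixes` is the cost the commit message warns about for large bulk clears.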
Alex Vandiver
e2df4f52ef kandra: Update Teleport version. 2025-02-21 10:16:33 -08:00
Anders Kaseorg
3823697e6c clean_node_cache: Remove.
The old /srv/zulip-npm-cache system has been unused for two
years (Zulip Server ≥ 7.0).  We can just delete this directory.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-19 16:44:02 -08:00
Alex Vandiver
72f667fb31 upgrade-zulip: Prevent restarting only Django into inconsistent caching. 2025-02-14 12:03:13 -08:00
Mateusz Mandera
367d193639 register_server: Rename flag to --agree-to-terms-of-service.
That's a better style than the underscores.
2025-02-13 11:03:44 -08:00
Mateusz Mandera
0a45a289f2 install: Add --agree-to-terms-of-service flag. 2025-02-13 09:49:28 -08:00
Mateusz Mandera
ac0af8d5ff install: Replace RTD links with our new permalinks. 2025-02-12 09:50:52 -08:00
Mateusz Mandera
0de58860dc install: Configure services during installation. 2025-02-05 17:47:28 -08:00
Alex Vandiver
59dde93a47 run_hooks: Diff to "last" instead of "current" during post-deploy. 2025-02-04 10:44:17 -08:00
Anders Kaseorg
7c2212caa3 node_cache: Work around corepack breakage due to npm key rotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-02-03 11:29:39 -08:00
PieterCK
68b3ce482a check-database-compatibility: Refactor a STALE_MIGRATIONS.
This consolidates the list of stale migrations into
`lib/migration_status.py` as `STALE_MIGRATIONS`.

This is prep work so that the migration status tool at
`migration_status.py` can also clean these migrations out of its
output.
2025-01-24 17:08:37 -08:00
Anders Kaseorg
653b0b0436 ruff: Partially reformat Python with Ruff 0.9 (2025 style).
These are the changes that are backwards compatible with the 2024
style.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-01-14 09:42:16 -08:00
Alex Vandiver
dfb4ef892c upgrade-zulip: Pass --only-django through to restart-server. 2025-01-03 15:15:28 -08:00
Anders Kaseorg
30b68abf55 install-node: Upgrade Node.js from 22.11.0 to 22.12.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-01-02 17:30:40 -08:00
Alex Vandiver
3eb501fa18 requirements: Require some swap space if installed on < 5GB RAM. 2024-12-22 06:52:06 -08:00
Anders Kaseorg
19b8cde27f ruff: Fix PLC0206 Extracting value from dictionary without calling .items().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-21 21:06:53 -08:00
Alex Vandiver
d8ed381789 puppet: Increase email_senders page threshold.
Email senders having a few minutes of backlog does not justify paging.

In the future, this worker will be async or sharded and not backlog.
2024-11-22 14:31:30 -08:00
Anders Kaseorg
532aee926c requirements: Upgrade Python requirements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-11-19 11:09:34 -08:00
Anders Kaseorg
0bd9f4f674 install-node: Upgrade Node.js from 20.18.0 to 22.11.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-11-05 17:01:09 -08:00
Anders Kaseorg
2bb87aebec install: Remove PostgreSQL 12 support.
PostgreSQL 12 reaches end of life on November 14, 2024.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-11-01 15:33:31 -07:00
Anders Kaseorg
8dd0d7f48d reindex-textual-data: Remove PostgreSQL ≥ 11 check.
We removed PostgreSQL 10 support long ago in 6.0-beta1~88.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-11-01 15:33:31 -07:00
Anders Kaseorg
3ec58fd3d5 requirements: Upgrade Python requirements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-10-20 18:16:27 -07:00
Anders Kaseorg
71ca928ec9 ruff: Fix SIM115 Use a context manager for opening files.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-10-20 18:16:27 -07:00
Anders Kaseorg
f812fee7ac install-node: Upgrade Node.js from 20.17.0 to 20.18.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-10-15 10:36:01 -07:00
Alex Vandiver
44fde64c42 restart-server: Add a --only-django for rolling Django restarts. 2024-10-08 12:53:21 -07:00
Alex Vandiver
818c30372f upload: Use tusd for resumable, larger uploads.
Currently, it handles two hook types: 'pre-create' (to verify that the
user is authenticated and the file size is within the limit) and
'pre-finish' (which creates an attachment row).

No secret is shared between Django and tusd for authentication of the
hooks endpoints, because none is necessary -- tusd forwards the
end-user's credentials, and the hook checks them like it would any
end-user request.  An end-user gaining access to the endpoint would be
able to do no more harm than via tusd or the normal file upload API.

Regardless, the previous commit has restricted access to the endpoint
at the nginx layer.

Co-authored-by: Brijmohan Siyag <brijsiyag@gmail.com>
2024-09-19 11:37:29 -07:00
Alex Vandiver
758aa36cbe stop-server: Stop katex, if running. 2024-09-17 12:51:30 -07:00
Alex Vandiver
2eb5bcbcc9 install: Update postgresql.version comment to be more accurate. 2024-09-11 09:54:43 -07:00
Alex Vandiver
22300373d3 install: We need postgresql.version with zulip::postgresql_client too.
5308fbdeac split out `zulip::postgresql_client`, and 80ef38757a
made it no longer depend on `zulip::postgresql_common`, but directly
on `zulipconf('postgresql', 'version', undef)`.  However, the
installer depended on recognizing `zulip::postgresql_common` in the
list of pulled-in classes to know that we needed to keep the
`postgresql.version` setting in `/etc/zulip.conf`.

Update the installer to also recognize `zulip::postgresql_client` as a
class which tells us to keep `postgresql.version` in our settings.
2024-09-11 09:54:43 -07:00
Alex Vandiver
38053e9c7c emoji: Convert spritesheets to webp.
This provides significant size savings:

| Emoji set   | png size | webp size | webp/png percent |
| ----------- | -------- | --------- | ---------------- |
| google-blob |  1968954 |   1373350 |           69.75% |
| twitter     |  2972820 |   2149672 |           72.31% |
| google      |  3455270 |   2327834 |           67.37% |

Since these are the largest assets that we ship to clients, it is
worth shaving off every byte we can.
2024-09-11 09:03:47 -07:00
Alex Vandiver
5eb38f24ec setup_venv: Fix name of libvips in yum.
RHEL provisioning likely doesn't currently work, but there is no reason not to do slightly better.
2024-09-11 09:03:47 -07:00
Anders Kaseorg
91ade25ba3 python: Simplify with str.removeprefix, str.removesuffix.
These are available in Python ≥ 3.9.
https://docs.python.org/3/library/stdtypes.html#str.removeprefix

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-03 12:30:16 -07:00
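As an illustration of the kind of simplification 91ade25ba3 refers to, the manual helpers below are hypothetical examples of the older pattern, not code taken from the repository. `str.removeprefix` and `str.removesuffix` give the same results in a single call.

```python
def strip_prefix_manual(value: str, prefix: str) -> str:
    # Older pattern: check, then slice by the prefix length.
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


def strip_suffix_manual(value: str, suffix: str) -> str:
    # The empty-suffix guard matters: value[:-0] would return "".
    if suffix and value.endswith(suffix):
        return value[: -len(suffix)]
    return value


# Python >= 3.9 equivalents.
branch = "refs/heads/main".removeprefix("refs/heads/")
stem = "archive.tar.gz".removesuffix(".tar.gz")
```

Both built-in methods return the string unchanged when the prefix or suffix is absent, which matches the manual versions.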
Anders Kaseorg
8b147d92a8 apt-repos: Use PGroonga binaries on Ubuntu 24.04.
These did not exist when we first added Ubuntu 24.04 support; now they
do.  Fixes #31261.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-08-29 16:04:44 -07:00
Alex Vandiver
6c1a8185aa setup_path: Ensure that the right venv is activated.
`setup_path()` previously only checked that some `zulip-py3-venv` was
the `sys.prefix`, not that it was the one associated with this
deployment.  When `uwsgi` is started, it is started from `bin/uwsgi`
within a `zulip-py3-venv` virtualenv, and as such sets
`sys.executable` to that, resulting in uwsgi workers picking up the
library path of that virtualenv.  On first start, `sys.path` thus
already matches the expected virtualenv, and the `setup_path` in
`zproject.wsgi` does nothing.

If a rolling restart was later done into a deployment with a different
virtualenv, the `zproject.wsgi` call to `setup_path()` did not change
`sys.path` to the new virtualenv, since it was already running within
_a_ virtualenv.  This led to dependency version mismatches, and
potentially even more disastrous consequences if the old (but still
erroneously in use) virtualenv was later garbage-collected.

PR #26771 was a previous attempt to resolve this, but failed due to
not thinking of the uwsgi binary itself as possibly providing a
virtualenv path.  We leave the `chdir` hooks from that in-place, since
it cannot hurt for the "master" uwsgi process to be chdir'd to `/`,
and the `hook-post-fork` `chdir` is reasonable as well.

Resolve the virtualenv in `setup_path()`, and activate it if it
differs from the one that is currently active.  To be sure that no
other old virtualenvs are used, we also filter out any paths which
appear to be from other Zulip virtualenvs.
2024-08-29 12:42:13 -07:00
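The path filtering described at the end of 6c1a8185aa might look roughly like this. It is a sketch, not the real `setup_path()`: `filter_other_venvs` is a hypothetical helper, and treating any path containing `zulip-py3-venv` as "appearing to be from a Zulip virtualenv" is an assumption about the heuristic.

```python
def filter_other_venvs(paths: list[str], current_venv: str) -> list[str]:
    """Drop entries that look like they belong to a different Zulip virtualenv.

    Entries inside `current_venv` and entries unrelated to any Zulip
    virtualenv are kept in their original order.
    """
    kept = []
    for path in paths:
        in_current = path == current_venv or path.startswith(current_venv + "/")
        looks_like_zulip_venv = "zulip-py3-venv" in path
        if looks_like_zulip_venv and not in_current:
            continue
        kept.append(path)
    return kept
```

In a real process the result would be assigned back to `sys.path` after activating the deployment's own virtualenv.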
Alex Vandiver
674ca1a95d restart-server: Wait until chain reload has completed.
We should not proceed and send client reload events until we know that
all of the server processes have updated to the latest version, or
they may reload into the old server version if they hit a Django
worker which has not yet restarted.

Because the logic controlling the number of workers is mildly complex,
and lives in Puppet, use the `uwsgi` Python bindings to know when the
process being reloaded is the last one, and use that to write out a
file signifying the success of the chain reload.  `restart-server`
awaits the creation of this file before proceeding.
2024-08-29 12:12:34 -07:00
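The waiting side of 674ca1a95d can be sketched as a simple poll for the marker file. This is an assumption-laden illustration: `wait_for_file`, its timeout, and its polling interval are hypothetical, and the commit message does not say how `restart-server` actually waits.

```python
import os
import time


def wait_for_file(path: str, timeout: float, poll_interval: float = 0.1) -> bool:
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the file appeared, False on timeout.  In the scheme
    described above, the last worker to reload would create this file.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_interval)
    # One final check in case the file appeared during the last sleep.
    return os.path.exists(path)
```

Only after this returns True would it be safe to send client reload events, since every worker is then known to be running the new version.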
Alex Vandiver
7424d4e721 install: Do not pull in "recommended" packages.
This likely has no effect in this instance, but is good practice.
2024-08-26 20:52:59 -07:00
Alex Vandiver
e8f7e281a7 create-production-venv: Do not pull in "recommended" packages.
This prevents, among other things, `libvips` from pulling in the
`firefox` package by way of the `nip2` package.

Fixes: #31411.
2024-08-26 20:52:59 -07:00
Anders Kaseorg
2f8734aee0 install-node: Upgrade Node.js from 20.15.1 to 20.17.0.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-08-26 14:17:05 -07:00
Alex Vandiver
a5a898ba21 install-aws-server: Switch to 24.04. 2024-08-20 10:04:37 -07:00