zulip

mirror of https://github.com/zulip/zulip.git synced 2025-10-23 04:52:12 +00:00

Author	SHA1	Message	Date
Alex Vandiver	662c946635	puppet: Stop using an unnecessary concat. This was added in `6975417acf`, to support `zmirror` deployments, which are no longer necessary.	2025-08-29 09:45:28 -07:00
Alex Vandiver	72ea340f80	puppet: Skip client reloads during weekly restarts. Weekly restarts do not deploy any new client code, so there is no reason to push clients to reload.	2025-08-19 23:39:38 -07:00
Alex Vandiver	82b40ac6e2	puppet: Add an exporter to fetch Weblate stats, for graphing. This is done synchronously, despite taking ~60s. We can move it to a background thread later if that's an issue, but generally Prometheus is tolerant to exporters taking a while to come back with results.	2025-08-14 13:18:56 -07:00
Alex Vandiver	2909d4917c	send_zulip_update_announcements: Log updates in the post-deploy hook.	2025-08-06 09:52:45 -07:00
Alex Vandiver	a5a5791794	management: Skip hourly crons during deploys. This is most important for `send_zulip_update_announcements`, which can race with the version run as a post-deploy hook. However, all of these crons can tolerate being slightly delayed, and there's little benefit to them taking CPU or possibly hitting odd borderline race conditions when the deploy is in progress. For safety, we only trust the deploy lockfile if it was created within the last hour -- deploys should not take more than an hour, and failing to ever run hourly crons is much worse than perhaps running them during a real very-long deploy.	2025-07-15 09:07:36 -07:00
Alex Vandiver	43a7035745	docker: Start local mailserver on port 25.	2025-07-02 10:21:08 -07:00
Anders Kaseorg	d64ab7abf7	process_fts_update: Fix S608 Possible SQL injection vector. Although this code was not actually vulnerable as written, we never want to be disabling this Ruff rule, in order to discourage later introduction of vulnerabilities. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2025-06-29 23:14:11 -07:00
Alex Vandiver	4f67c1dbf1	wal-g-exporter: Handle empty backup list, for new buckets.	2025-06-18 16:47:22 -07:00
Alex Vandiver	2b1e8c28b4	nagios: Make queue warnings message CZO.	2025-06-18 12:29:57 -07:00
Alex Vandiver	483ced0b15	puppet: Upgrade dependencies.	2025-06-18 11:44:32 -07:00
Alex Vandiver	dfadc9fa39	puppet: Remove zephyr configuration and hosts.	2025-06-13 16:54:26 -07:00
Tim Abbott	0ec07fe4c8	queue: Allow sharding user_activity worker. This follows the existing patterns for the sharded mobile notifications worker.	2025-06-06 10:33:20 -07:00
Alex Vandiver	1d72e38e62	process_exporter: Monitor the standalone email_server.	2025-05-20 07:03:23 -07:00
Alex Vandiver	c6e0f0b436	email-mirror: Remove HTTP interface.	2025-05-19 16:39:44 -07:00
Alex Vandiver	a6a5fc246a	kandra: Swap in zulip::local_mailserver for postfix.	2025-05-19 16:39:44 -07:00
Alex Vandiver	1f0cfd4662	email-mirror: Add a standalone server that processes incoming email. Using postfix to handle the incoming email gateway complicates things a great deal: - It cannot verify that incoming email addresses exist in Zulip before accepting them; it thus accepts mail at the `RCPT TO` stage which it cannot handle, and thus must reject after the `DATA`. - It is built to handle both incoming and outgoing email, which results in subtle errors (`1c17583ad5`, `79931051bd`, `a53092687e`, #18600). - Rate-limiting happens much too late to avoid denial of service (#12501). - Mis-configurations of the HTTP endpoint can break incoming mail (#18105). Provide a replacement SMTP server which accepts incoming email on port 25, verifies that Zulip can accept the address, and that no rate-limits are being broken, and then adds it directly to the relevant queue. Removes an incorrect comment which implied that missed-message addresses were only usable once. We leave rate-limiting to only channel email addresses, since missed-message addresses are unlikely to be placed into automated systems, as channel email addresses are. Also simplifies #7814 somewhat.	2025-05-19 16:39:44 -07:00
Alex Vandiver	f06253a62b	puppet: Fix comment to point to the updated docs. This was moved in `44c4b93f6e`.	2025-05-19 12:58:51 -07:00
Alex Vandiver	cc665cb9b7	process_fts_updates: Update comments for TODO being actually done.	2025-05-19 12:58:51 -07:00
Alex Vandiver	09f1195fb5	process_fts_updates: Use 3 queries, instead of BATCH_SIZE + 2. This provides at least a 10x speedup. We could use a CTE to fetch from, and lock, `fts_update_log` at the same time as performing the `zerver_message` update -- however, this would mean that the number of returned rows would be less than BATCH_SIZE if some rows in `zerver_message` no longer existed, causing premature termination of the outer loop. We lock the `zerver_message` rows in ascending ID order, to prevent deadlocks. We use `ctid` values[^1] to transfer the rows from the inner query to the outer update; this is safe, as the rows are locked from updates. [^1]: https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-CTID	2025-05-19 12:58:51 -07:00
Alex Vandiver	f2f94d30fe	process_fts_updates: Only DELETE if rows were processed.	2025-05-19 12:58:51 -07:00
Alex Vandiver	eeb485f894	process_fts_updates: After getting notifies, process them all. Every insert into `fts_update_log` triggers a `NOTIFY`; processing a batch of 1000 for every NOTIFY that we get thus results in a lot of extra processing of no rows. Instead, we clear all notifies, and repeat the pattern of processing rows until the batch comes up short. This may still result in extra wake-ups, in the event that a new NOTIFY occurs after we clear them, and that new row is processed by us in `update_all_rows`. However, these wake-ups are necessary for correctness, and this change will result in drastically fewer extra checks.	2025-05-19 12:58:51 -07:00
Alex Vandiver	fd898dafae	process_fts_updates: Use row locking and transactions.	2025-05-19 12:58:51 -07:00
Alex Vandiver	7a8a8f5f23	puppet: Use Service for PostgreSQL restarts. Using pg_ctlcluster leaves systemctl thinking the process aborted; and not all instances (e.g. Docker) have systemctl.	2025-05-16 11:33:20 -07:00
Alex Vandiver	9def655564	puppet: Do not bother manually symlinking hunspell dictionaries. This code dates back to 57b52310639a; however, this has been handled by `postgresql-common` adding a post-install trigger to call `pg_updatedicts` for each new PostgreSQL version, since `postgresql-common` version 153 (February 2014).	2025-05-16 11:33:20 -07:00
Alex Vandiver	e13f82f048	upgrade-postgresql: Use tags to partially-apply configuration. This uses the same technique used in `840884ec89`, to only apply select parts of the Puppet configuration. This is more correct, and simpler, than attempting to chop out some base puppet roles, and hack around the `purge => true` supervisor.d configuration.	2025-05-16 11:33:20 -07:00
Alex Vandiver	47238ed74c	kandra: Remove PostgreSQL upgrade key.	2025-05-14 09:16:39 -07:00
Alex Vandiver	1c8cfff0bd	postgres: Add knobs for min_wal_size / max_wal_size.	2025-05-12 11:12:45 -07:00
Alex Vandiver	b4c8d2e6f5	postgres: Add knobs for parallelism control.	2025-05-12 11:12:45 -07:00
Alex Vandiver	933871e825	postgres: Double maintenance_work_mem to 1/8th, but cap at 2GiB. This is based on annotated.conf[^1]. [^1]: `c4ebbf1420/postgresql.10.simple.conf (L122-L133)`	2025-05-12 11:12:45 -07:00
Alex Vandiver	8850c3f85c	postgres: Tune effective_cache_size upwards, to 75%. The previous 10/16ths is an odd value; various suggestions are around 75% of RAM, and in practice this is closer to what is observed.	2025-05-12 11:12:45 -07:00
Alex Vandiver	49c307e136	postgres: Remove explicit wal_buffers setting. The value, if unset, defaults to[^1]: > a size equal to 1/32nd (about 3%) of shared_buffers, but not less > than 64kB nor more than the size of one WAL segment, typically 16MB Letting it choose this, by default, but allowing an override via `zulip.conf`, seems preferable to hard-coding an arbitrary value. [^1]: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-BUFFERS	2025-05-12 11:12:45 -07:00
Alex Vandiver	45b6c5931a	puppet: Remove unnecessary use of `scope[...]`.	2025-05-12 11:12:45 -07:00
Alex Vandiver	a4ddc0a2d5	postgres: Remove lower-than-suggested checkpoint_completion_target. This defaults to 0.9, and lowering it is not generally suggested[^1]. [^1]: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET	2025-05-12 11:12:45 -07:00
Alex Vandiver	f47d150fb5	puppet: Allow overrides of computed PostgreSQL config settings. The four precomputed memory values gain overrides in `zulip.conf` with the same names, for more precise manual tuning.	2025-05-12 11:12:45 -07:00
Alex Vandiver	5bc75a94be	puppet: Base PostgreSQL tuning on if the frontend is on the same host. The previous thresholds were set based on the assumption that 50% of the memory on the host was taken up by Django and friends, leaving only 50% for the database host. On separate database hosts, this leads to a too-small default value for shared_buffers, which is generally suggested to be 25% of your database-dedicated memory. Check if there is a `zulip::app_frontend_base` as part of the deploy, and use that to adjust if we halve `$zulip::common::total_memory_mb` or not for purposes of calculating PostgreSQL memory settings. We also provide a `postgresql.memory` override, if necessary.	2025-05-12 11:12:45 -07:00
Alex Vandiver	1b9a8dbc52	kandra: Allow PostgreSQL primary ssh access for PostgreSQL upgrade. This allows scripting of the whole upgrade process; these grants are temporary.	2025-05-09 10:05:19 -07:00
Alex Vandiver	a959f71208	kandra: Support installing multiple ssh keys from multiple secrets.	2025-05-09 10:05:19 -07:00
Alex Vandiver	614fdee5ca	kandra: Add a pattern grouping for /api/v1/tus/. We already had one for `/api/v1/tus/...`, but not the bare `/api/v1/tus/` which is used to initiate the upload.	2025-05-05 12:13:16 -07:00
Alex Vandiver	21eff33875	puppet: Upgrade tusd to 2.8.0.	2025-04-11 12:13:07 -07:00
Alex Vandiver	309876796e	kandra: Add a couple useful command-line tools.	2025-04-11 11:25:58 -07:00
Alex Vandiver	b11cbbab01	smokescreen: Move metrics port from the default 9810, to 4760. This prevents errors if Smokescreen is running on a host with more than 10 Tornado shards.	2025-04-11 10:29:26 -07:00
Alex Vandiver	5f783ed5ad	nginx: Relay the same Host: header that nginx saw. Unilaterally adding the port can cause CSRF failures when the port is a default port, and thus optional. Switch to providing the exact `Host` header that the original request contained.	2025-04-09 10:44:12 -07:00
Anders Kaseorg	80b607c8cb	install: Remove PostgreSQL 13 support. PostgreSQL 13 reaches end of life on November 13, 2025, and Django 5.2 does not support it. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2025-04-07 17:41:55 -07:00
Anders Kaseorg	818742c62b	install: Support PostgreSQL 17. Signed-off-by: Anders Kaseorg <anders@zulip.com>	2025-04-07 16:42:19 -07:00
Alex Vandiver	e2e0c72a80	nginx: Serve full app from localhost. Some deployments choose to wrap Zulip's nginx in an outer proxy -- for example, to do custom TLS termination. In such deployments, the outer proxy is routing to `127.0.0.1:80`; `b4fb22ba1b` breaks these configurations, as it switches the `127.0.0.1:80` listener to only serving `/api/internal/` paths. Switch to serving the whole application over `127.0.0.1:80`.	2025-04-04 10:58:09 -07:00
Alex Vandiver	d2772a0a38	postgresql: Remove inaccurate comment. PostgreSQL 10 and later can adjust their TLS configuration without a restart, with `SELECT pg_reload_conf()` or `pg_ctlcluster 16 main reload`.	2025-04-02 13:07:31 -07:00
Alex Vandiver	9073ad3cc5	kandra: Add missing variable used in the template.	2025-04-02 09:55:21 -07:00
Alex Vandiver	258003a1c8	kandra: Extract the Teleport host CA to disk.	2025-04-02 09:55:21 -07:00
Alex Vandiver	23e8eb5c7c	nginx: Use cache slicing to prevent thundering herds for video thumbs. This prevents a thundering herd for videos -- if a very large video is posted to a channel with many active clients, all of them simultaneously request it, to provide the in-feed preview image. While these requests come with a `Range` header which is intended to limit the request to just the first couple MB, nginx ignores this header when making its request to the upstream -- so it can obtain and cache the whole file locally. This results in multiple competing requests for the whole content from S3, all racing to store the content in the cache. Use cache slicing to split the content cache into chunks of 5MB; the cache is filled one slice at a time, as needed based on the byte ranges that clients request. Clients making requests without a `Range` header are provided with the content transparently stitched together from the individual slices. The slice size of 5MB is chosen to encompass more than 95% of file uploads (saving an extra trip to the origin) while also being large enough to be able to provide video thumbnails in a single slice, as well as not take too much time to obtain from the upstream.	2025-04-01 13:35:28 -07:00
Alex Vandiver	4e26705fbc	nginx: Tell the backend service what port we listen on. The `$host` nginx variable is _not_ the unadulterated `Host` header (which would be `$http_host`) -- it is that header, without the port, with a fallback to the `server_name` which processed the request. This means that backend services are not aware of the port that the request came in on, unless they derive that from reading `nginx_listen_port` in `/etc/zulip/zulip.conf`, or similar. Specifically, this caused `tusd`, on deploys with non-standard `nginx_listen_port`, to generate a `Location` header which left off the port, and as such attempted a CORS check when retrieving metadata about the just-uploaded file, which failed. Add the port to the `Host` header we pass to `tusd` and other backend services.	2025-04-01 09:28:18 -07:00
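
As a rough illustration of the PostgreSQL tuning commits listed above (`5bc75a94be`, `8850c3f85c`, `933871e825`), the arithmetic they describe might look like the Python sketch below. It is not the Puppet code from the repository. The function name, parameter names, and dictionary keys are hypothetical; the 25% `shared_buffers` figure is the general suggestion quoted in `5bc75a94be` rather than a confirmed setting; and applying every fraction to the database-dedicated memory is an assumption.

```python
def postgresql_memory_settings(total_memory_mb: int, frontend_on_same_host: bool) -> dict[str, int]:
    """Hypothetical sketch of the memory arithmetic described in the commit messages."""
    # Per 5bc75a94be: when the Django frontend shares the host, only half of
    # the RAM is treated as available to the database.
    db_memory_mb = total_memory_mb // 2 if frontend_on_same_host else total_memory_mb

    return {
        # Assumption: 25% of database-dedicated memory, the "generally
        # suggested" value mentioned in 5bc75a94be.
        "shared_buffers_mb": db_memory_mb // 4,
        # Per 8850c3f85c: 75%, up from the previous 10/16ths.
        "effective_cache_size_mb": db_memory_mb * 3 // 4,
        # Per 933871e825: 1/8th, capped at 2 GiB (2048 MiB).
        "maintenance_work_mem_mb": min(db_memory_mb // 8, 2048),
    }


if __name__ == "__main__":
    # A 16 GiB all-in-one host versus a 64 GiB dedicated database host.
    print(postgresql_memory_settings(16384, frontend_on_same_host=True))
    print(postgresql_memory_settings(65536, frontend_on_same_host=False))
```

According to `f47d150fb5`, the precomputed values can also be overridden in `zulip.conf`, so a calculation like this would only supply defaults.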


1838 Commits