Using pg_ctlcluster leaves systemctl thinking the process aborted, and
not all environments (e.g., Docker) have systemctl.
(cherry picked from commit 7a8a8f5f23)
This code dates back to 57b52310639a; however, since
`postgresql-common` version 153 (February 2014), this has been handled
by `postgresql-common` itself, via a post-install trigger which calls
`pg_updatedicts` for each new PostgreSQL version.
(cherry picked from commit 9def655564)
This uses the same technique as in 840884ec89, to apply only select
parts of the Puppet configuration. This is more correct, and simpler,
than attempting to chop out some base Puppet roles and hack around
the `purge => true` supervisor.d configuration.
(cherry picked from commit e13f82f048)
This provides access logging metrics to Prometheus. For cardinality
reasons, we cannot (nor would we want to) put every request path into
its own label value -- but we do separate out the most-frequent access
paths (as well as some low-frequency but high-interest ones) into
their own label values.
In order to differentiate accesses to https://zulip.com/ from
https://example.zulipchat.com/ (both of which appear at path `/`), we
use a `grok_exporter.realm_names_regex` value in `zulip.conf`, which
is expected to be set to match the hostnames of all possible realms.
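For example, a deployment hosting the two hostnames above might set
something like the following (an illustrative value, assuming
zulip.conf's usual `[section] key` layout; the regex itself is not a
recommendation):

    [grok_exporter]
    realm_names_regex = (zulip\.com|[a-z0-9-]+\.zulipchat\.com)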
(cherry picked from commit 840fa74854)
Unilaterally adding the port can cause CSRF failures when the port is
a default port, and thus optional. Switch to providing the exact
`Host` header that the original request contained.
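In nginx terms, the change is along these lines (a sketch, not the
literal diff):

    # Before: unconditionally re-appends the port, even when it is
    # the default and thus absent from the client's own Host header:
    proxy_set_header Host $host:$server_port;
    # After: pass the client's Host header through unmodified:
    proxy_set_header Host $http_host;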
(cherry picked from commit 5f783ed5ad)
This prevents a thundering herd for videos -- if a very large video
is posted to a channel with many active clients, all of them
simultaneously request it, in order to render the in-feed preview
image. While these requests come with a `Range` header, which is
intended to limit the request to just the first couple of megabytes,
nginx ignores this header when making its own request to the upstream
-- so that it can obtain and cache the whole file locally. This
results in multiple competing requests for the whole content from S3,
all racing to store the content in the cache.
Use cache slicing to split the content cache into chunks of 5MB; the
cache is filled one slice at a time, as needed based on the byte
ranges that clients request. Clients making requests without a
`Range` header are provided with the content transparently stitched
together from the individual slices.
The slice size of 5MB was chosen to encompass more than 95% of file
uploads (saving an extra trip to the origin), while remaining large
enough to provide video thumbnails from a single slice, and small
enough to not take too much time to obtain from the upstream.
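A minimal sketch of the nginx `slice`-module configuration this
describes (the directives are nginx's; the cache validity value is
illustrative):

    slice 5m;
    # Each slice is requested from the upstream with its own Range
    # header, and cached under a key that includes that range:
    proxy_set_header Range $slice_range;
    proxy_cache_key $uri$is_args$args$slice_range;
    # S3 answers each slice request with 206 Partial Content, so 206
    # responses must be cacheable as well:
    proxy_cache_valid 200 206 1h;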
(cherry picked from commit 23e8eb5c7c)
Some deployments choose to wrap Zulip's nginx in an outer proxy -- for
example, to do custom TLS termination. In such deployments, the outer
proxy is routing to `127.0.0.1:80`; b4fb22ba1b breaks these
configurations, as it switches the `127.0.0.1:80` listener to only
serving `/api/internal/` paths.
Switch to serving the whole application over `127.0.0.1:80`.
(cherry picked from commit e2e0c72a80)
The `$host` nginx variable is _not_ the unadulterated `Host`
header (which would be `$http_host`) -- it is that header, *without
the port*, with a fallback to the `server_name` which processed the
request.
This means that backend services are not aware of the port that the
request came in on, unless they derive that from reading
`nginx_listen_port` in `/etc/zulip/zulip.conf`, or similar.
Specifically, this caused `tusd`, on deploys with a non-standard
`nginx_listen_port`, to generate a `Location` header which left off
the port; the subsequent request to retrieve metadata about the
just-uploaded file thus appeared cross-origin, and failed its CORS
check.
Add the port to the `Host` header we pass to `tusd` and other backend
services.
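Concretely, this is along the lines of (a sketch; the real
configuration derives the port from `nginx_listen_port`):

    proxy_set_header Host $host:$server_port;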
(cherry picked from commit 4e26705fbc)
This makes the `localhost.d` directory less of a lie, and decreases
the chances that local reconfigurations will break the 127.0.0.1:80
server which is used for IPC.
In cases where `nginx_http_only` is enabled, we respect
`nginx_listen_port`, so as to not attempt to bind to port 80 if the
administrator was explicitly trying to avoid that.
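For illustration (the `[application_server] http_only` spelling below
is an assumption about how `nginx_http_only` appears in zulip.conf,
and the port value is arbitrary):

    [application_server]
    http_only = true
    nginx_listen_port = 8080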
(cherry picked from commit b4fb22ba1b)
This generalizes from `thumbnail_workers` to include any other queue.
However, we choose to additionally document only
`email_senders_workers`, since other queues are not guaranteed to
work correctly with multiple consumers.
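For example (assuming, as with `thumbnail_workers`, that this lives
in zulip.conf's `[application_server]` section; the count is
illustrative):

    [application_server]
    email_senders_workers = 4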
This is already installed on a lot of systems, and is used indirectly
when upgrading Zulip from Git.
We previously removed this in
263212decf, I believe due to an incorrect understanding that only
`makemessages` needed it.
`deliver_scheduled_emails` tries to deliver the email synchronously,
and if it fails, it retries after 10 seconds. Since it does not track
retries, and always tries the earliest-scheduled-but-due message
first, the worker will not make forward progress if there is a
persistent failure with that message, and will retry indefinitely.
This can result in excessive network or email delivery charges from
the remote SMTP server.
Switch to delivering emails via a new queue worker. The
`deliver_scheduled_emails` job now serves only to pull deferred jobs
out of the table once they are due, insert them into RabbitMQ, and
then delete them. This limits the potential for head-of-queue
failures to failures inserting into RabbitMQ, which is more reasonable
than failures speaking to a complex external system we do not control.
Retries and any connections to the SMTP server are left to the
RabbitMQ consumer.
We build a new RabbitMQ queue, rather than use the existing
`email_senders` queue, because that queue is expected to be reasonably
low-latency, for things like missed-message notifications. The
`send_future_email` codepath which inserts into ScheduledEmails is
also (ab)used for digest emails, which are extremely bursty in their
frequency -- and a large burst could significantly delay emails behind
it in the queue.
The new queue is explicitly only for messages which were not
initiated by user actions (e.g., invitation reminders, digests, new
account follow-ups), and which are thus not latency-sensitive.
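Schematically, the new split looks like this (a Python sketch of the
design, not Zulip's implementation; the in-memory stand-ins for
ScheduledEmails and RabbitMQ, and `send_via_smtp`, are all
hypothetical):

    import time
    from queue import Queue

    scheduled: list[tuple[float, dict]] = []  # stands in for ScheduledEmails rows
    deferred_email_queue: Queue = Queue()     # stands in for the RabbitMQ queue

    def send_via_smtp(payload: dict) -> None:
        ...  # hypothetical: speak to the remote SMTP server

    def scan_and_enqueue() -> None:
        # The scheduler's only failure mode is now enqueueing; a bad
        # message cannot block the head of the table indefinitely.
        now = time.time()
        for row in [r for r in scheduled if r[0] <= now]:
            deferred_email_queue.put(row[1])  # insert into the queue...
            scheduled.remove(row)             # ...then delete the row

    def consume() -> None:
        # Delivery, SMTP connections, and any retry policy live here,
        # entirely on the consumer side.
        while True:
            send_via_smtp(deferred_email_queue.get())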
Fixes: #32463.
In some cases, it is not possible to configure the load-balancer to
add an `X-Forwarded-Proto` header. If Zulip is serving its traffic over
HTTP, it will rightly error out, since it cannot guarantee that its
response will be served over an encrypted connection.
Add a new `loadbalancer.rejects_http_requests` setting, which serves
as a way for the operator to swear that the load-balancer will *never*
serve responses from Zulip over an unencrypted connection. In most
cases, this is because the load-balancer is configured to have port 80
always serve an HTTP 301 redirect to the same URL over HTTPS.
Properly configuring the proxy to send `X-Forwarded-Proto` is always a
better solution than using this configuration parameter, so use of
this should be viewed as a last resort.
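Concretely, in zulip.conf (following the `section.key` spelling
above):

    [loadbalancer]
    rejects_http_requests = true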
Zulip instances without a database included, like the Docker image,
would fail to use TLS properly, since `TLS_REQCERT` was not set in
`/etc/ldap/ldap.conf`. While there are a few other ways we could fix
this, just installing libldap-common on app frontend instances seems
like a good solution: it has no impact on other Zulip systems, and it
was already being installed indirectly, via a "Recommends"-level apt
dependency of the PostgreSQL server package.
Fixes zulip/docker-zulip#454.
The nginx-to-uwsgi timeout defaults to 60s, which is exactly the same
as the current "harakiri" timeout configured in uwsgi (which limits
how long a request can run before the worker is terminated). This
causes a race: if nginx hits its 60s before uwsgi does, we return a
504; otherwise, we get a 502.
Make the nginx-to-uwsgi timeout explicit, and shorten the "harakiri"
timeout to be explicitly less than that. Document the 60s timeout,
which all outer reverse proxies must be set _longer than_ in order
to have proper "onion" timeouts.
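A sketch of the resulting nesting (the 60s nginx-side value is the
documented one; the harakiri value is illustrative, chosen only to be
strictly less):

    # nginx, in the uwsgi proxy configuration:
    uwsgi_read_timeout 60s;

    # uwsgi.ini:
    harakiri = 55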
Currently, it handles two hook types: 'pre-create' (which verifies
that the user is authenticated and that the file size is within the
limit) and 'pre-finish' (which creates an attachment row).
No secret is shared between Django and tusd for authentication of the
hooks endpoints, because none is necessary -- tusd forwards the
end-user's credentials, and the hook checks them like it would any
end-user request. An end-user gaining access to the endpoint would be
able to do no more harm than via tusd or the normal file upload API.
Regardless, the previous commit has restricted access to the endpoint
at the nginx layer.
Co-authored-by: Brijmohan Siyag <brijsiyag@gmail.com>