Commit Graph

42568 Commits

Author SHA1 Message Date
Alex Vandiver
90b6fe2c6e upgrade: Show output from (re)starting zulip.
5c450afd2d, in ancient history, switched from `check_call` to
`check_output` and throwing away its result.

Use check_call, so that we show the steps to (re)starting the server.
2022-01-25 01:34:20 +00:00
Alex Vandiver
36cebad4c0 CVE-2021-43799: During upgrades, restart rabbitmq if necessary.
Check if it is listening on a public interface on port 25672, and if
so shut it down so it can pick up the new configuration.
2022-01-25 01:34:20 +00:00
Alex Vandiver
f33fbb527c upgrade: Make calling shutdown_server twice, only try once. 2022-01-25 01:34:20 +00:00
Alex Vandiver
134a8d4301 CVE-2021-43799: Write rabbitmq configuration before starting.
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.

The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.

The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.

Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost.  This does not mitigate existing installs, since
it does not force a rabbitmq restart.

[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
2022-01-25 01:34:17 +00:00
Alex Vandiver
a07f64a463 puppet: Always set the RabbitMQ nodename to zulip@localhost.
This is required in order to lock down the RabbitMQ port to only
listen on localhost.  If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.

Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.

The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed.  However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
2022-01-25 01:33:27 +00:00
Alex Vandiver
e9af26df6e puppet: Run configure-rabbitmq on nodename change.
`/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the
nodename changes, the backing database changes, and this requires
re-creating the rabbitmq users and permissions.

Trigger this in puppet by running configure-rabbitmq after the file
changes.
2022-01-24 23:09:02 +00:00
Alex Vandiver
7f6b423532 setup: Remove unused RABBITMQ_NODE.
This reverts commit 889547ff5e.  It is
unused in the Docker container, as the configurtaion of the `zulip`
user in the rabbitmq node is done via environment variables.  The
Zulip host in that context does not have `rabbitmqctl` installed, and
would have needed to know the Erlang cookie to be able to run these
commands.
2022-01-24 23:09:02 +00:00
Alex Vandiver
d95fb34ba7 puppet: Admit we leave epmd port 4369 open on all interfaces.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports.  This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.

`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on.  While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.

Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation.  Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:

 - On Focal, an `epmd` service exists and is activated, which uses
   systemd's configuration to choose which interfaces to bind on, and
   thus `ERL_EPMD_ADDRESS` is irrelevant.

 - On Bionic (and Focal, due to a broken dependency from
   `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
   the explicit `epmd` service losing a race), `epmd` is started by
   `rabbitmq-server` when it does not detect a running instance.
   Unfortunately, only `/etc/init.d/rabbitmq-server` would respects
   `/etc/default/rabbitmq-server` -- and it defers the actual startup
   to using systemd, which does not pass the environment variable
   down.  Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.

We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:

 - Manually starting `epmd` with `-address 127.0.0.1` silently fails
   to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
   [2]).

 - The dependencies of the systemd `rabbitmq-server` service can be
   fixed to include the `epmd` service, and systemd can be made to
   bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
   the above bug.  However, the startup of this service is not
   guaranteed, because it races with other sources of `epmd` (see
   below).

 - Any process that runs `rabbitmqctl` results in `epmd` being started
   if one is not currently running; these instances do not respect any
   environment variables as to which addresses to bind on.  This is
   also triggered by `service rabbitmq-server status`, as well as
   various Zulip cron jobs which inspect the rabbitmq queues.  As
   such, it is difficult-to-impossible to ensure that some other
   `epmd` process will not win the race and open the port on all
   interfaces.

Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.

[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
2022-01-24 23:09:02 +00:00
Alex Vandiver
5ff759d35c puppet: Remove rabbitmq_mochiweb configuration.
mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin
is not enabled.  Nor does this control the management interface, which
would listen on port 15672.
2022-01-24 23:09:02 +00:00
Alex Vandiver
a0d1074212 ci: Cache with the OS name, not the job name.
The job name is just the constant `production_build`.  Renaming it to
have the OS in the key ensures that it is not shared across OS'es (for
instance between `4.x` and `main`, which are now bionic and buster,
respectively), and also allows it to share caches with the install
step, which uses the OS name in that place.
2022-01-24 15:07:50 -08:00
Alex Vandiver
2e1e2b08f1 puppet: Fix standalone certbot configurations.
This addresses the problems mentioned in the previous commit, but for
existing installations which have `authenticator = standalone` in
their configurations.

This reconfigures all hostnames in certbot to use the webroot
authenticator, and attempts to force-renew their certificates.
Force-renewal is necessary because certbot contains no way to merely
update the configuration.  Let's Encrypt allows for multiple extra
renewals per week, so this is a reasonable cost.

Because the certbot configuration is `configobj`, and not
`configparser`, we have no way to easily parse to determine if webroot
is in use; additionally, `certbot certificates` does not provide this
information.  We use `grep`, on the assumption that this will catch
nearly all cases.

It is possible that this will find `authenticator = standalone`
certificates which are managed by Certbot, but not Zulip certificates.
These certificates would also fail to renew while Zulip is running, so
switching them to use the Zulip webroot would still be an improvement.

Fixes #20593.

(cherry picked from commit a3adaf4aa3)
2022-01-24 20:14:23 +00:00
Alex Vandiver
b44a1b68f6 setup: Install a temporary certificate, before certbot runs.
Installing certbot with --method=standalone means that the
configuration file will be written to assume that the standalone
method will be used going forward.  Since nginx will be running,
attempts to renew the certificate will fail.

Install a temporary self-signed certificate, just to allow nginx to
start, and then follow up (after applying puppet to start nginx) with
the call to setup-certbot, which will use the webroot authenticator.

The `setup-certbot --method=standalone` option is left intact, for use
in development environments.

Fixes part of #20593; it does not address installs which were
previously improperly configured with `authenticator = standalone`.

(cherry picked from commit 76ce8631c0)
2022-01-24 20:14:23 +00:00
Alex Vandiver
c3adbcea13 docs: Mention Camo does not use a local Smokescreen in the proxies docs.
This documents the new behaviour in d328d3dd4d.

(cherry picked from commit be1c4c2bd8)
2022-01-21 16:21:15 -08:00
Alex Vandiver
e088b343b3 puppet: Document that upgrades from Git require 3GB.
The step of rebuilding static assets using webpack requires more than
2G of RAM.

(cherry picked from commit 5f237cb34e)
2022-01-19 12:37:55 -08:00
Alex Vandiver
1d559bbffa puppet: Allow routing camo requests through an outgoing proxy.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary.  However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.

Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be.  Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.

Fixes: #20550.
(cherry picked from commit d328d3dd4d)
2022-01-11 15:13:09 -08:00
Alex Vandiver
cb24f93bba puppet: Make zulipconf() understand coercion to bool.
If the default is a bool, coerce the value into a bool as well.  For
backwards compatibility, this does not adjust any existing callsites.

`queue_workers_multiprocess` is the only setting which is passed a
bool default, but it was already documented to take `true` or `false`;
simplify it to no longer add the now-unnecessary Boolean conversion.

(cherry picked from part of commit 2c5fc1827c)
2022-01-11 15:13:09 -08:00
Tim Abbott
868180a25d upgrade-zulip-from-git: Improve webpack failure error handling.
We've had a number of unhappy reports of upgrades failing due to
webpack requiring too much memory.  While the previous commit will
likely fix this issue for everyone, it's worth improving the error
message for failures here.

We avoid doing the stop+retry ourselves, because that could cause an
outage in a production system if webpack fails for another reason.

Fixes #20105.
2022-01-07 11:47:05 -08:00
Tim Abbott
20fc1f651a upgrade-zulip-from-git: Require more memory to run webpack.
Since the upgrade to Webpack 5, we've been seeing occasional reports
that servers with roughly 4GiB of RAM were getting OOM kills while
running webpack.

Since we can't readily optimize the memory requirements for webpack
itself, we should raise the RAM requirements for doing the
lower-downtime upgrade strategy.

Fixes #20231.
2022-01-07 11:47:05 -08:00
Emilio López
0d79d6735a docs: Clarify use of loadbalancer.ips when using a reverse proxy.
When Zulip is run behind one or more reverse proxies, you must
configure `loadbalancer.ips` so that Zulip respects the client IP
addresses found in the `X-Forwarded-For` header. This is not
immediately clear from the documentation, so this commit makes it more
clear and augments the existing examples to showcase this need.

Fixes: #19073
(cherry picked from commit baea14ee57)
2022-01-07 11:44:41 -08:00
Anders Kaseorg
45568a08c0 reindex-textual-data: Reindex textual functional indexes too.
This catches nine functional indexes that the previous query didn’t:

upper_preregistration_email_idx
upper_stream_name_idx
upper_subject_idx
upper_userprofile_email_idx
zerver_message_recipient_upper_subject
zerver_mutedtopic_stream_topic
zerver_stream_realm_id_name_uniq
zerver_userprofile_realm_id_delivery_email_uniq
zerver_userprofile_realm_id_email_uniq

Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit 1cc1de82cd)
2022-01-07 10:37:38 -08:00
Alex Vandiver
22152a0662 Revert "puppet: Do not assume amd64 architecture."
This reverts commit 859d88f76c.  It does
not work, since the sha256 hashes are different for different
architectures.

arm64 support exists in `main`.
2022-01-04 15:00:39 -08:00
Alya Abbott
9bbb336441 developer docs: Tweak ToS for push notifications wording. 2021-12-14 14:47:20 -08:00
Sahil Batra
3d966f1af9 message: Check wildcard mention restrictions while editing message.
This commit adds code to check whether a user is allowed to use
wildcard mention in a large stream or not while editing a message
based on the realm settings.

Previously this was only checked while sending message, thus user
was easily able to use wildcard mention by first sending a normal
message and then using a wildcard mention by editing it.

(cherry picked from commit b68ebf5a22)
2021-12-14 11:55:18 -08:00
Alex Vandiver
ab98f3801f setup-certbot: Reinstate nginx reload after installation.
If nginx was already installed, and we're using the webroot method of
initializing certbot, nginx needs to be reloaded.  Hooks in
`/etc/letsencrypt/renewal-hooks/deploy/` do not run during initial
`certbot certonly`, so an explicit reload is required.

(cherry picked from commit f6520a97cd)
2021-12-13 10:30:00 -08:00
Alex Vandiver
ddca8a7f9a puppet: Use certbot package timer, not our own cron job.
The certbot package installs its own systemd timer (and cron job,
which disabled itself if systemd is enabled) which updates
certificates.  This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.

Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.

Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`.  `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx.  The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flog.

(cherry picked from commit 01e8f752a8)
2021-12-09 13:48:20 -08:00
Tim Abbott
c1c3dfced5 scripts: Fix running compare-settings-to-template from any CWD.
This matches the number of dirname() calls for other files in its
directory.

Fixes #20489.
2021-12-07 14:47:27 -08:00
Alex Vandiver
2d3f505505 puppet: Install camo on Docker.
Now that go-camo runs within supervisor, it can be run in Docker
simply.

Fixes #20101.
Fixes zulip/docker-zulip#179.

(cherry picked from commit f31bf3f06c)
2021-12-06 19:33:31 +00:00
Alex Vandiver
d3573af95c puppet: Read camo secret at startup time, not at puppet-apply time.
Writing the secret to the supervisor configuration file makes changes
to the secret requires a zulip-puppet-apply to take hold.  The Docker
image is constructed to avoid having to run zulip-puppet-apply on
startup, and indeed cannot run zulip-puppet-apply after having
configured secrets, as it has replaced the zulip.conf file with a
symlink, for example.  This means that camo gets the static secret
that was built into the image, and not the one regenerated on first
startup.

Read the camo secret at process startup time.  Because this pattern is
likely common with "12-factor" applications which can read from
environment variables, write a generic tool to map secrets to
environment variables before exec'ing a binary, and use that for Camo.

(cherry picked from commit 358a7fb0c6)
2021-12-06 19:33:31 +00:00
Alex Vandiver
859d88f76c puppet: Do not assume amd64 architecture.
(cherry picked from commit 7db146d0a9)
2021-12-06 11:10:37 -08:00
Alex Vandiver
9a0fb497a4 changelog: Fix lint issues. 2021-12-01 23:39:28 +00:00
Alex Vandiver
7ea4ad75af version: Update version after 4.8 release. 2021-12-01 23:37:49 +00:00
Alex Vandiver
ae000bfdba Release Zulip Server 4.8 4.8 2021-12-01 23:17:46 +00:00
Mateusz Mandera
551b387164 CVE-2021-43791: Validate confirmation keys in /accounts/register/ codepath.
A confirmation link takes a user to the check_prereg_key_and_redirect
endpoint, before getting redirected to POST to /accounts/register/. The
problem was that validation was happening in the check_prereg_key_and_redirect
part and not in /accounts/register/ - meaning that one could submit an
expired confirmation key and be able to register.

We fix this by moving validation into /accouts/register/.
2021-12-01 23:13:11 +00:00
Mateusz Mandera
720d16e809 confirmation: Use error status codes for confirmation link error pages. 2021-12-01 20:28:51 +00:00
Alex Vandiver
f338ff64c3 puppet: Use sysv status command, not supervisorctl status.
Since Supervisor 4, which is installed on Ubuntu 20.04 and Debian 11,
`supervisorctl status` returns exit code 3 if any of the
supervisor-controlled processes are not running.

Using `supervisorctl status` as the Puppet `status` command for
Supervisor leads to unnecessarily trying to "start" a Supervisor
process which is already started, but happens to have one or more of
its managed processes stopped.  This is an unnecessary no-op in
production environments, but in docker-init enviroments, such as in
CI, attempting to start the process a second time is an error.

Switch to checking if supervisor is running by way of sysv init.  This
fixes the potential error in CI, as well as eliminates unnecessary
"starts" of supervisor when it was already running -- a situation
which made zulip-puppet-apply not idempotent:

```
root@alexmv-prod:~# supervisorctl status
process-fts-updates                                             STOPPED   Nov 10 12:33 AM
smokescreen                                                     RUNNING   pid 1287280, uptime 0:35:32
zulip-django                                                    STOPPED   Nov 10 12:33 AM
zulip-tornado                                                   STOPPED   Nov 10 12:33 AM
[...]

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.32 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.91 seconds

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.35 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.92 seconds
```

(cherry picked from commit 7af2fa2e92)
2021-12-01 12:19:30 -08:00
Tim Abbott
98610c984c i18n: Add Sinhala translation. 2021-11-30 15:09:31 -08:00
Tim Abbott
ab965e5892 i18n: Update translation dat from Transifex. 2021-11-30 15:08:05 -08:00
PIG208
7a03827047 integrations: Add V3 support for PagerDuty. 2021-11-30 14:43:03 -08:00
PIG208
5954e622bc doc: Change supported extension type to reflect the change. 2021-11-30 14:42:57 -08:00
PIG208
687db48ea8 integrations: Change format of templates for PagerDuty V3.
Because the payload of V3 will no longer include the description,
We replace the ":" by "." in the message and create the new string
template for trigger messages.
2021-11-30 14:42:31 -08:00
Alex Vandiver
399391c3aa puppet: Default go-camo to listening on localhost for standalone deploys.
The default in the previous commit, inherited from camo, was to bind
to 0.0.0.0:9292.  In standalone deployments, camo is deployed on the
same host as the nginx reverse proxy, and as such there is no need to
open it up to other IPs.

Make `zulip::camo` take an optional parameter, which allows overriding
it in puppet, but skips a `zulip.conf` setting for it, since it is
unlikely to be adjust by most users.

(cherry picked from commit c514feaa22)
2021-11-19 17:51:08 -08:00
Alex Vandiver
cd5eec5eea camo: Replace with go-camo implementation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner.  Additionally, it has
a number of limitations:
 - It is installed as a sysinit service, which does not run under
   Docker
 - It does not prevent access to internal IPs, like 127.0.0.1
 - It does not respect standard `HTTP_proxy` environment variables,
   making it unable to use Smokescreen to prevent the prior flaw
 - It occasionally just crashes, and thus must have a cron job to
   restart it.

Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring not changes to Django code, but is more
maintained.  Additionally, it resolves all of the above complaints.

go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space.  It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.

go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.

Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.

[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo

(cherry picked from commit b982222e03)
2021-11-19 17:50:47 -08:00
Alex Vandiver
e7d48c0c10 puppet: Default to installing smokescreen on application frontends.
This is an additional security hardening step, to make Zulip default
to preventing SSRF attacks.  The overhead of running Smokescreen is
minimal, and there is no reason to force deployments to take
additional steps in order to secure themselves against SSRF attacks.

Deployments which already have a different external proxy configured
will not gain a local Smokescreen installation, and running without
Smokescreen is supported by explicitly unsetting the `host` or `port`
values in `/etc/zulip/zulip.conf`.

(cherry picked from commit c33562f0a8)
2021-11-19 17:49:37 -08:00
Alex Vandiver
023dfc01ba puppet: Split smokescreen into a non-profile version.
In a subsequent commit, we intend to include it from
`zulip::app_frontend_base`, which is a layering violation if it only
exists in the form of a profile.

(cherry picked from commit 44f1ea6bae)
2021-11-19 17:49:22 -08:00
Alex Vandiver
5d9285fff3 puppet: Remove unused smokescreen symlink.
(cherry picked from commit c2ed3c22b5)
2021-11-19 17:48:38 -08:00
Alex Vandiver
53f353ec26 puppet: Tidy old smokescreen binaries.
(cherry picked from commit 47e16a5d41)
2021-11-19 17:48:38 -08:00
Alex Vandiver
245c87c567 puppet: Embed golang version into binary path, to rebuild on new golang.
This will cause the output binary path to be sensitive to golang
version, causing it to be rebuilt on new golang, and an updated
supervisor config file written out, and thus supervisor also
restarted.

(cherry picked from commit 239ac8413e)
2021-11-19 17:48:38 -08:00
Alex Vandiver
26aa4d57e3 puppet: Factor out smokescreen binary path.
(cherry picked from commit 216eeba2dd)
2021-11-19 17:48:37 -08:00
Alex Vandiver
bee225782a puppet: Switch smokescreen to using zulip::external_dep, so it tidies.
(cherry picked from commit 3a7cef6582)
2021-11-19 17:48:37 -08:00
Alex Vandiver
4a6e69357a puppet: Move /srv/smokescreen-src to /srv/zulip-smokescreen-src.
As with the previous commit for `/srv/golang`, we have the custom of
namespacing things under `/srv` with `zulip-` to help ensure that we
play nice with anything else that happens to be on the host.

(cherry picked from commit ea08111d60)
2021-11-19 17:48:37 -08:00