Commit Graph

1041 Commits

Author SHA1 Message Date
Anders Kaseorg
646a4d19a3 puppet: Remove quotes for enumerable values.
https://puppet.com/docs/puppet/7/style_guide.html#style_guide_module_design-quoting
“If a string is a value from an enumerable set of options, such as
present and absent, it SHOULD NOT be enclosed in quotes at all.”

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-29 22:06:46 -07:00
Alex Vandiver
c97162e485 puppet: Check that certbot certs are in use before fixing them.
It is possible to have previously installed certbot, but switched back
to using self-signed certificates -- in which case renewing them using
certbot may fail.

Verify that the certificate is a symlink into certbot's output
directory before running `fix-standalone-certbot`.
2022-04-27 16:01:15 -07:00
Anders Kaseorg
098a514599 python: Use Python 3.8 shlex.join function.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-27 12:57:49 -07:00
Anders Kaseorg
a7e6cb7705 puppet: ‘supervisorctl stop all’ before restarting Supervisor.
This fixes a failure of the 3.4 upgrade test running on Ubuntu 20.04
with Supervisor 4.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-26 16:32:02 -07:00
Alex Vandiver
e5548ecba0 puppet: Upgrade external dependencies. 2022-04-21 13:54:14 -07:00
Alex Vandiver
1151118cc8 puppet: Upgrade Grafana to 8.4.6. 2022-04-12 16:41:45 -07:00
Alex Vandiver
572443edc6 puppet: Remove memcached SASL workaround.
https://bugs.launchpad.net/ubuntu/+source/memcached/+bug/1878721 was
fixed and released in Focal in 2020-06-24.

We don't bother with an `ensure => absent` because leaving this
in-place for existing installs does no harm.
2022-04-08 14:59:45 -07:00
Anders Kaseorg
935cb605a5 puppet: Do not ensure Chrony is running.
Commit f6d27562fa (#21564) tried to
ensure Chrony is running, which fails in containers where Chrony
doesn’t have permission to update the host clock.

The Debian package should still attempt to start it, and Puppet should
still restart it when chrony.conf is modified.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-03-30 11:37:54 -07:00
Alex Vandiver
f6d27562fa puppet: Configure chrony to use AWS-local NTP sources.
This prevents hosts from spewing traffic to random hosts across the
Internet.
2022-03-25 17:07:53 -07:00
Alex Vandiver
5e128e7cad puppet: Extract the wal-g configuration from the backups.
This will allow it to be used for monitoring, to check the state in S3
rather than just trusting the backups when they said they ran.
2022-03-25 17:05:30 -07:00
Alex Vandiver
d7b59c86ce puppet: Build wal-g from source for aarch64.
Since wal-g does not provide binaries for aarch64, build them from
source.  While building them from source for arm64 would better ensure
that build process is tested, the build process takes 7min and 700M of
temp files, which is an unacceptable cost; we thus only build on
aarch64.

Since the wal-g build process uses submodules, which are not in the
Github export, we clone the full wal-g repository.  Because the
repository is relatively small, we clone it anew on each new version,
rather than attempt to manage the remotes.

Fixes #21070.
2022-03-22 15:02:35 -07:00
Alex Vandiver
4d4c320a07 puppet: Switch from ntp to chrony.
Chrony is the recommended time server for Ubuntu since 18.04[1], and
is the default on Redhat; it is more accurate, and has lower-memory
usage, than ntp, which is only getting best-effort security
maintenance.

See:
- https://wiki.ubuntu.com/BionicBeaver/ReleaseNotes#Chrony
- https://chrony.tuxfamily.org/comparison.html
- https://engineering.fb.com/2020/03/18/production-engineering/ntp-service/
2022-03-22 13:07:27 -07:00
Alex Vandiver
a2c8be9cd5 puppet: Increase download timeout from 5m to 10m.
The default timeout for `exec` commands in Puppet is 5 minutes[1].  On
slow connections, this may not be sufficient to download larger
downloads, such as the ~135MB golang tarball.

Increase the timeout to 10 minutes; this is a minimum download speed
of is ~225kB/s.

Fixes #21449.

[1]: https://puppet.com/docs/puppet/5.5/types/exec.html#exec-attribute-timeout
2022-03-21 15:47:04 -07:00
Alex Vandiver
9e850b08f3 puppet: Fix the PostgreSQL paths to recovery.conf / standby.conf. 2022-03-20 16:16:04 -07:00
Alex Vandiver
1bd5723cd2 puppet: Add a prometheus monitor for tornado processes. 2022-03-20 16:12:11 -07:00
Alex Vandiver
6489c832a3 puppet: Upgrade third-party package versions. 2022-03-17 11:44:05 -07:00
Alex Vandiver
d17006da55 puppet: Support setting an ssl_mode verification level. 2022-03-15 12:43:50 -07:00
Alex Vandiver
253bef27f5 puppet: Support password-based PostgreSQL replication. 2022-03-15 12:43:50 -07:00
Sahil Batra
f0606b34ad user_groups: Add cron job for adding users to full members system group.
This commit adds a cron job which runs every hour to add the users to
full members system group if user is promoted to a full member.

This should ensure that full member status is available no more than
an hour after configuration suggests it should be.
2022-03-14 18:53:47 -07:00
Alex Vandiver
6f5ae8d13d puppet: wal-g backups are required for replication.
Previously, it was possible to configure `wal-g` backups without
replication enabled; this resulted in only daily backups, not
streaming backups.  It was also possible to enable replication without
configuring the `wal-g` backups bucket; this simply failed to work.

Make `wal-g` backups always streaming, and warn loudly if replication
is enabled but `wal-g` is not configured.
2022-03-11 10:09:35 -08:00
Alex Vandiver
6496d43148 puppet: Only s3_backups_bucket is required for backups.
`s3_backups_key` / `s3_backups_secret_key` are optional, as the
permissions could come from the EC2 instance's role.
2022-03-11 10:09:35 -08:00
Alex Vandiver
19beed2709 puppet: Default s3_region to the current ec2 region. 2022-03-11 10:09:35 -08:00
Anders Kaseorg
1629d6bfb3 python: Reformat with Black 22 (stable).
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-18 18:03:13 -08:00
Alex Vandiver
c656d933fa puppet: Switch from $::memorysize_mb to non-legacy $::memory. 2022-02-15 12:04:37 -08:00
Alex Vandiver
f2f4462e71 puppet: Switch from $::fqdn to non-legacy $::networking. 2022-02-15 12:04:37 -08:00
Alex Vandiver
bb4c0799cc puppet: Switch to the canonical case for $::os['family'].
The == operator in Puppet is case-insensitive for ASCII
characters[1], which is potentially surprising.  Switch to the
canonical case that `$::os['family']` returns.

[1] https://puppet.com/docs/puppet/5.5/lang_expressions.html#string-encoding-and-comparisons
2022-02-15 12:04:37 -08:00
Alex Vandiver
d4eefbbeea puppet: Switch from $::osfamily to non-legacy $::os. 2022-02-15 12:04:37 -08:00
Alex Vandiver
a787ebe0e2 puppet: Switch from $::architecture to non-legacy $::os. 2022-02-15 12:04:37 -08:00
Alex Vandiver
d7e8733705 puppet: Use goarch for wal-g.
wal-g does not currently provide pre-built binaries for
arm64/aarch64 (see #21070) but if they begin to, it will likely be
with the goarch names.
2022-02-15 12:04:37 -08:00
Alex Vandiver
abdbe4ca83 puppet: Use goarch for go-camo. 2022-02-15 12:04:37 -08:00
Alex Vandiver
be2f2a5bde puppet: Use goarch for golang.
Fixes: #21051.
2022-02-15 12:04:37 -08:00
Alex Vandiver
788daa953b puppet: Factor out $::architecture case statement for golang. 2022-02-15 12:04:37 -08:00
Anders Kaseorg
45f4db9702 puppet: Remove unused $release_name.
It would confuse a future Debian 15.10 release with Ubuntu 15.10, it
relies on the legacy fact $::operatingsystemrelease, the modern fact
$::os provides this information without extra logic, and it’s unused
as of commit 03bffd3938.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Alex Vandiver
291c5e87b6 puppet: Upgrade prometheus to 2.33.1. 2022-02-09 20:32:24 -08:00
Alex Vandiver
2d538c2356 puppet: Upgrade grafana to 8.3.6. 2022-02-09 20:32:24 -08:00
Alex Vandiver
f2e66c0b20 puppet: Upgrade go-camo to 2.4.0. 2022-02-09 20:32:24 -08:00
Alex Vandiver
51a516384d puppet: Upgrade golang to 1.17.6. 2022-02-09 20:32:24 -08:00
Alex Vandiver
b3900bec7e puppet: Upgrade Grafana to 8.3.5.
https://grafana.com/docs/grafana/latest/release-notes/release-notes-8-3-5/
2022-02-08 11:13:40 -08:00
Alex Vandiver
a46f6df91e CVE-2021-43799: Write rabbitmq configuration before starting.
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.

The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.

The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.

Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost.  This does not mitigate existing installs, since
it does not force a rabbitmq restart.

[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
2022-01-25 01:48:05 +00:00
Alex Vandiver
43d63bd5a1 puppet: Always set the RabbitMQ nodename to zulip@localhost.
This is required in order to lock down the RabbitMQ port to only
listen on localhost.  If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.

Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.

The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed.  However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
2022-01-25 01:48:02 +00:00
Alex Vandiver
3bfcfeac24 puppet: Run configure-rabbitmq on nodename change.
`/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the
nodename changes, the backing database changes, and this requires
re-creating the rabbitmq users and permissions.

Trigger this in puppet by running configure-rabbitmq after the file
changes.
2022-01-25 01:46:51 +00:00
Alex Vandiver
694c4dfe8f puppet: Admit we leave epmd port 4369 open on all interfaces.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports.  This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.

`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on.  While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.

Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation.  Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:

 - On Focal, an `epmd` service exists and is activated, which uses
   systemd's configuration to choose which interfaces to bind on, and
   thus `ERL_EPMD_ADDRESS` is irrelevant.

 - On Bionic (and Focal, due to a broken dependency from
   `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
   the explicit `epmd` service losing a race), `epmd` is started by
   `rabbitmq-server` when it does not detect a running instance.
   Unfortunately, only `/etc/init.d/rabbitmq-server` would respects
   `/etc/default/rabbitmq-server` -- and it defers the actual startup
   to using systemd, which does not pass the environment variable
   down.  Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.

We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:

 - Manually starting `epmd` with `-address 127.0.0.1` silently fails
   to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
   [2]).

 - The dependencies of the systemd `rabbitmq-server` service can be
   fixed to include the `epmd` service, and systemd can be made to
   bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
   the above bug.  However, the startup of this service is not
   guaranteed, because it races with other sources of `epmd` (see
   below).

 - Any process that runs `rabbitmqctl` results in `epmd` being started
   if one is not currently running; these instances do not respect any
   environment variables as to which addresses to bind on.  This is
   also triggered by `service rabbitmq-server status`, as well as
   various Zulip cron jobs which inspect the rabbitmq queues.  As
   such, it is difficult-to-impossible to ensure that some other
   `epmd` process will not win the race and open the port on all
   interfaces.

Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.

[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
2022-01-25 01:46:51 +00:00
Alex Vandiver
2713e90eaf puppet: Remove rabbitmq_mochiweb configuration.
mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin
is not enabled.  Nor does this control the management interface, which
would listen on port 15672.
2022-01-25 01:46:51 +00:00
Alex Vandiver
a3adaf4aa3 puppet: Fix standalone certbot configurations.
This addresses the problems mentioned in the previous commit, but for
existing installations which have `authenticator = standalone` in
their configurations.

This reconfigures all hostnames in certbot to use the webroot
authenticator, and attempts to force-renew their certificates.
Force-renewal is necessary because certbot contains no way to merely
update the configuration.  Let's Encrypt allows for multiple extra
renewals per week, so this is a reasonable cost.

Because the certbot configuration is `configobj`, and not
`configparser`, we have no way to easily parse to determine if webroot
is in use; additionally, `certbot certificates` does not provide this
information.  We use `grep`, on the assumption that this will catch
nearly all cases.

It is possible that this will find `authenticator = standalone`
certificates which are managed by Certbot, but not Zulip certificates.
These certificates would also fail to renew while Zulip is running, so
switching them to use the Zulip webroot would still be an improvement.

Fixes #20593.
2022-01-24 12:13:44 -08:00
Anders Kaseorg
97e4e9886c python: Replace universal_newlines with text.
This is supported in Python ≥ 3.7.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-23 22:16:01 -08:00
Anders Kaseorg
a58a71ef43 Remove Ubuntu 18.04 support.
As a consequence:

• Bump minimum supported Python version to 3.7.
• Move Vagrant environment to Debian 10, which has Python 3.7.
• Move CI frontend tests to Debian 10.
• Move production build test to Debian 10.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-01-21 17:26:14 -08:00
Alex Vandiver
fb4d9764fa puppet: Bump Grafana version, for 8.3.4. security release. 2022-01-18 12:33:02 -08:00
Alex Vandiver
434bda01c7 puppet: Enable camo prometheus metrics.
Doing so requires protecting /metrics from direct access when proxied
through nginx.  If camo is placed on a separate host, the equivalent
/metrics URL may need to be protected.

See https://github.com/cactus/go-camo#metrics for details on the
statistics so reported.  Note that 5xx responses are _expected_ from
go-camo's statistics, as it returns 502 status code when the remote
server responds with 500/502/503/504, or 504 when the remote host
times out.
2022-01-13 14:19:18 -08:00
Alex Vandiver
d328d3dd4d puppet: Allow routing camo requests through an outgoing proxy.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary.  However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.

Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be.  Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.

Fixes: #20550.
2022-01-07 12:08:10 -08:00
Alex Vandiver
2c5fc1827c puppet: Standardize what values are bools, and what true is.
For `no_serve_uploads`, `http_only`, which previously specified
"non-empty" to enable, this tightens what values are true.  For
`pgroonga` and `queue_workers_multiprocess`, this broadens the
possible values from `enabled`, and `true` respectively.
2022-01-07 12:08:10 -08:00