1209 Commits

Author SHA1 Message Date
Alex Vandiver
5784bdd0ed puppet: Use goarch for wal-g.
wal-g does not currently provide pre-built binaries for
arm64/aarch64 (see #21070) but if they begin to, it will likely be
with the goarch names.
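
As a rough illustration of the naming difference, the sketch below maps
an OS-reported architecture name to the Go-style name.  The table
contents and the tarball file-name layout are assumptions made for the
example, not the values used in the puppet manifests.

```python
# Hypothetical mapping from an OS-reported architecture name (such as
# the value of Puppet's $::architecture fact) to Go's GOARCH naming.
GOARCH_BY_ARCHITECTURE = {
    "amd64": "amd64",
    "x86_64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def goarch(architecture: str) -> str:
    """Return the Go-style architecture name, or fail loudly."""
    try:
        return GOARCH_BY_ARCHITECTURE[architecture]
    except KeyError:
        raise ValueError(f"Unsupported architecture: {architecture}") from None


def release_tarball(name: str, version: str, architecture: str) -> str:
    # The file-name layout is an assumption for illustration only.
    return f"{name}-{version}.linux-{goarch(architecture)}.tar.gz"
```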

(cherry picked from commit d7e8733705)
2022-02-15 15:57:00 -08:00
Alex Vandiver
e10ea15aa9 puppet: Use goarch for go-camo.
(cherry picked from commit abdbe4ca83)
2022-02-15 15:57:00 -08:00
Alex Vandiver
d860242220 puppet: Use goarch for golang.
Fixes: #21051.
(cherry picked from commit be2f2a5bde)
2022-02-15 15:57:00 -08:00
Alex Vandiver
b2c3f5e510 puppet: Include go version in go-camo release information. 2022-02-15 15:57:00 -08:00
Alex Vandiver
232fe495be puppet: Factor out $::architecture case statement for golang.
(cherry picked from commit 788daa953b)
2022-02-15 15:57:00 -08:00
Alex Vandiver
c20afad828 puppet: Add aarch64 build hashes to external dependencies.
wal-g does not ship aarch64 binaries, currently; the compilation
process([1]) is somewhat complicated, so we defer the decision about
how to support wal-g for aarch64 until a later date.

[1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing

(cherry picked from commit c094867a74)
2022-02-15 15:57:00 -08:00
Alex Vandiver
3fad49a9c1 puppet: Centralize versions and sha256 hashes of external dependencies.
This will make it easier to update versions of these dependencies.
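
A minimal sketch of the idea, assuming a single table keyed by
dependency name that pins a version and one sha256 per architecture.
The dependency name, version, and hashes here are placeholders, and
`tarball_is_valid` is a hypothetical helper, not code from the
manifests.

```python
import hashlib

# Hypothetical central table: the name, version, and hashes are
# placeholders, not values from the Zulip puppet manifests.
EXTERNAL_DEPS = {
    "example-tool": {
        "version": "1.2.3",
        "sha256": {
            "amd64": hashlib.sha256(b"amd64 tarball bytes").hexdigest(),
            "aarch64": hashlib.sha256(b"aarch64 tarball bytes").hexdigest(),
        },
    },
}


def tarball_is_valid(name: str, architecture: str, data: bytes) -> bool:
    """Check downloaded bytes against the pinned hash for this architecture."""
    expected = EXTERNAL_DEPS[name]["sha256"][architecture]
    return hashlib.sha256(data).hexdigest() == expected
```

Keeping the hashes per architecture is what makes a version bump a
one-place change while still failing closed on a mismatched download.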

(cherry picked from commit f166f9f7d6)
2022-02-15 15:57:00 -08:00
Alex Vandiver
cc95aac176 puppet: Move wal-g to external_dep, in /srv/zulip-wal-g-*. 2022-02-15 15:57:00 -08:00
Alex Vandiver
1b27ec9fae puppet: Stop making resources for external binaries and directories.
In the event that extracting doesn't produce the binary we expected it
to, all this will do is create an _empty_ file where we expect the
binary to be.  This will likely muddle debugging.

Since the only reason the resource was made in the first place was to
make dependencies clear, switch to depending on the External_Dep
itself, when such a dependency is needed.

(cherry picked from commit 1e4e6a09af)
2022-02-15 15:57:00 -08:00
Alex Vandiver
ebd74239a2 puppet: Move slash out of $dir by convention.
(cherry picked from commit 3c163a7d5e)
2022-02-15 15:57:00 -08:00
Alex Vandiver
8dcb1e489d puppet: Adjust wal-g release version and SHA256.
wal-g apparently removed the 1.1.1 release; replace it with the
equivalent rc.

(cherry picked from commit d2a78bac7e)
2022-02-15 15:57:00 -08:00
Alex Vandiver
1c819208d0 setup: Merge multiple setup-apt-repo scripts into one.
This moves the `.asc` files into subdirectories, and writes out the
corresponding `.list` files into them.  It moves from templates to
written-out `.list` files for clarity and ease of
implementation (Debian and Ubuntu need different templates for
`zulip`), and as a way of making explicit which releases are supported
for each list.  For the special-case of the PGroonga signing key, we
source an additional file within the directory.

This simplifies the process for adding another class of `.list` file.

(cherry picked from commit f3eea72c2a)
2022-02-14 15:05:06 -08:00
Alex Vandiver
134a8d4301 CVE-2021-43799: Write rabbitmq configuration before starting.
Zulip writes a `rabbitmq.config` configuration file which locks down
RabbitMQ to listen only on localhost:5672, as well as the RabbitMQ
distribution port, on localhost:25672.

The "distribution port" is part of Erlang's clustering configuration;
while it is documented that the protocol is fundamentally
insecure ([1], [2]) and can result in remote arbitrary execution of
code, by default the RabbitMQ configuration on Debian and Ubuntu
leaves it publicly accessible, with weak credentials.

The configuration file that Zulip writes, while effective, is only
written _after_ the package has been installed and the service
started, which leaves the port exposed until RabbitMQ or system
restart.

Ensure that rabbitmq's `/etc/rabbitmq/rabbitmq.config` is written
before rabbitmq is installed or starts, and that changes to that file
trigger a restart of the service, such that the ports are only ever
bound to localhost.  This does not mitigate existing installs, since
it does not force a rabbitmq restart.

[1] https://www.erlang.org/doc/apps/erts/erl_dist_protocol.html
[2] https://www.erlang.org/doc/reference_manual/distributed.html#distributed-erlang-system
2022-01-25 01:34:17 +00:00
Alex Vandiver
a07f64a463 puppet: Always set the RabbitMQ nodename to zulip@localhost.
This is required in order to lock down the RabbitMQ port to only
listen on localhost.  If the nodename is `rabbit@hostname`, in most
circumstances the hostname will resolve to an external IP, which the
rabbitmq port will not be bound to.

Installs which used `rabbit@hostname`, due to RabbitMQ having been
installed before Zulip, would not have functioned if the host or
RabbitMQ service was restarted, as the localhost restrictions in the
RabbitMQ configuration would have made rabbitmqctl (and Zulip cron
jobs that call it) unable to find the rabbitmq server.

The previous commit ensures that configure-rabbitmq is re-run after
the nodename has changed.  However, rabbitmq needs to be stopped
before `rabbitmq-env.conf` is changed; we use an `onlyif` on an `exec`
to print the warning about the node change, and let the subsequent
config change and notify of the service and configure-rabbitmq to
complete the re-configuration.
2022-01-25 01:33:27 +00:00
Alex Vandiver
e9af26df6e puppet: Run configure-rabbitmq on nodename change.
`/etc/rabbitmq/rabbitmq-env.conf` sets the nodename; anytime the
nodename changes, the backing database changes, and this requires
re-creating the rabbitmq users and permissions.

Trigger this in puppet by running configure-rabbitmq after the file
changes.
2022-01-24 23:09:02 +00:00
Alex Vandiver
d95fb34ba7 puppet: Admit we leave epmd port 4369 open on all interfaces.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports.  This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.

`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on.  While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.

Regardless, this lack of `ERL_EPMD_ADDRESS` variable only controls
`epmd`'s startup upon first installation.  Upon reboot, there are two
ways in which `epmd` might be started, neither of which respect
`ERL_EPMD_ADDRESS`:

 - On Focal, an `epmd` service exists and is activated, which uses
   systemd's configuration to choose which interfaces to bind on, and
   thus `ERL_EPMD_ADDRESS` is irrelevant.

 - On Bionic (and Focal, due to a broken dependency from
   `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
   the explicit `epmd` service losing a race), `epmd` is started by
   `rabbitmq-server` when it does not detect a running instance.
   Unfortunately, only `/etc/init.d/rabbitmq-server` would respect
   `/etc/default/rabbitmq-server` -- and it defers the actual startup
   to using systemd, which does not pass the environment variable
   down.  Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.

We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:

 - Manually starting `epmd` with `-address 127.0.0.1` silently fails
   to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
   [2]).

 - The dependencies of the systemd `rabbitmq-server` service can be
   fixed to include the `epmd` service, and systemd can be made to
   bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
   the above bug.  However, the startup of this service is not
   guaranteed, because it races with other sources of `epmd` (see
   below).

 - Any process that runs `rabbitmqctl` results in `epmd` being started
   if one is not currently running; these instances do not respect any
   environment variables as to which addresses to bind on.  This is
   also triggered by `service rabbitmq-server status`, as well as
   various Zulip cron jobs which inspect the rabbitmq queues.  As
   such, it is difficult-to-impossible to ensure that some other
   `epmd` process will not win the race and open the port on all
   interfaces.

Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.

[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
2022-01-24 23:09:02 +00:00
Alex Vandiver
5ff759d35c puppet: Remove rabbitmq_mochiweb configuration.
mochiweb was renamed to web_dispatch in RabbitMQ 3.8.0, and the plugin
is not enabled.  Nor does this control the management interface, which
would listen on port 15672.
2022-01-24 23:09:02 +00:00
Alex Vandiver
2e1e2b08f1 puppet: Fix standalone certbot configurations.
This addresses the problems mentioned in the previous commit, but for
existing installations which have `authenticator = standalone` in
their configurations.

This reconfigures all hostnames in certbot to use the webroot
authenticator, and attempts to force-renew their certificates.
Force-renewal is necessary because certbot contains no way to merely
update the configuration.  Let's Encrypt allows for multiple extra
renewals per week, so this is a reasonable cost.

Because the certbot configuration is `configobj`, and not
`configparser`, we have no way to easily parse to determine if webroot
is in use; additionally, `certbot certificates` does not provide this
information.  We use `grep`, on the assumption that this will catch
nearly all cases.
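
The check can be sketched as a line-anchored pattern match over the
text of a renewal configuration file.  This is an illustration of the
`grep` approach, not the script itself; the exact pattern and the
function name are assumptions.

```python
import re

# Matches a whole line of the form "authenticator = standalone",
# tolerating surrounding whitespace; commented-out lines do not match.
STANDALONE_RE = re.compile(
    r"^\s*authenticator\s*=\s*standalone\s*$", re.MULTILINE
)


def uses_standalone(renewal_conf_text: str) -> bool:
    """Return True if the configuration text selects the standalone authenticator."""
    return STANDALONE_RE.search(renewal_conf_text) is not None
```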

It is possible that this will find `authenticator = standalone`
certificates which are managed by Certbot, but not Zulip certificates.
These certificates would also fail to renew while Zulip is running, so
switching them to use the Zulip webroot would still be an improvement.

Fixes #20593.

(cherry picked from commit a3adaf4aa3)
2022-01-24 20:14:23 +00:00
Alex Vandiver
1d559bbffa puppet: Allow routing camo requests through an outgoing proxy.
Because Camo includes logic to deny access to private subnets, routing
its requests through Smokescreen is generally not necessary.  However,
it may be necessary if Zulip has configured a non-Smokescreen exit
proxy.

Default Camo to using the proxy only if it is not Smokescreen, with a
new `proxy.enable_for_camo` setting to override this behaviour if need
be.  Note that that setting is in `zulip.conf` on the host with Camo
installed -- not the Zulip frontend host, if they are different.
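
The resulting decision can be sketched as follows.  The function and
its parameters are hypothetical; in particular, how the manifests
detect that the configured proxy is Smokescreen is not shown here and
is represented by a plain boolean.

```python
from typing import Optional


def camo_uses_proxy(
    proxy_host: Optional[str],
    proxy_is_smokescreen: bool,
    enable_for_camo: Optional[bool] = None,
) -> bool:
    """Decide whether Camo should send its requests through the proxy."""
    if not proxy_host:
        # No outgoing proxy is configured at all.
        return False
    if enable_for_camo is not None:
        # An explicit proxy.enable_for_camo setting overrides the default.
        return enable_for_camo
    # Default: only use the proxy when it is not Smokescreen.
    return not proxy_is_smokescreen
```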

Fixes: #20550.
(cherry picked from commit d328d3dd4d)
2022-01-11 15:13:09 -08:00
Alex Vandiver
cb24f93bba puppet: Make zulipconf() understand coercion to bool.
If the default is a bool, coerce the value into a bool as well.  For
backwards compatibility, this does not adjust any existing callsites.

`queue_workers_multiprocess` is the only setting which is passed a
bool default, but it was already documented to take `true` or `false`;
simplify it to no longer add the now-unnecessary Boolean conversion.
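
A rough sketch of the coercion rule, written in Python rather than the
Ruby of the puppet function: the type of the default decides whether
the raw string is converted.  The set of strings treated as true is an
assumption for the example.

```python
from typing import Optional, Union

# Assumption: this list of "true" spellings is illustrative, and is
# not necessarily the list that zulipconf() accepts.
TRUTHY = {"1", "y", "t", "true", "yes", "enable", "enabled"}


def zulipconf_value(
    raw: Optional[str], default: Union[str, bool]
) -> Union[str, bool]:
    """Return the setting, coerced to bool only when the default is a bool."""
    if raw is None:
        return default
    if isinstance(default, bool):
        return raw.strip().lower() in TRUTHY
    return raw
```

Callers that pass a string default keep receiving the raw string, which
is why existing callsites are unaffected.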

(cherry picked from part of commit 2c5fc1827c)
2022-01-11 15:13:09 -08:00
Alex Vandiver
22152a0662 Revert "puppet: Do not assume amd64 architecture."
This reverts commit 859d88f76c.  It does
not work, since the sha256 hashes are different for different
architectures.

arm64 support exists in `main`.
2022-01-04 15:00:39 -08:00
Alex Vandiver
ddca8a7f9a puppet: Use certbot package timer, not our own cron job.
The certbot package installs its own systemd timer (and cron job,
which disables itself if systemd is enabled) which updates
certificates.  This process races with the cron job which Zulip
installs -- the only difference being that Zulip respects the
`certbot.auto_renew` setting, and that it passes the deploy hook.
This means that occasionally nginx would not be reloaded, when the
systemd timer caught the expiration first.

Remove the custom cron job and `certbot-maybe-renew` script, and
reconfigure certbot to always reload nginx after deploying, using
certbot directory hooks.

Since `certbot.auto_renew` can't have an effect, remove the setting.
In turn, this removes the need for `--no-zulip-conf` to
`setup-certbot`.  `--deploy-hook` is similarly removed, as running
deploy hooks to restart nginx is now the default; pass
`--no-directory-hooks` in standalone mode to not attempt to reload
nginx.  The other property of `--deploy-hook`, of skipping symlinking
into place, is given its own flag.

(cherry picked from commit 01e8f752a8)
2021-12-09 13:48:20 -08:00
Alex Vandiver
2d3f505505 puppet: Install camo on Docker.
Now that go-camo runs within supervisor, it can be run in Docker
simply.

Fixes #20101.
Fixes zulip/docker-zulip#179.

(cherry picked from commit f31bf3f06c)
2021-12-06 19:33:31 +00:00
Alex Vandiver
d3573af95c puppet: Read camo secret at startup time, not at puppet-apply time.
Writing the secret to the supervisor configuration file means that changes
to the secret require a zulip-puppet-apply to take hold.  The Docker
image is constructed to avoid having to run zulip-puppet-apply on
startup, and indeed cannot run zulip-puppet-apply after having
configured secrets, as it has replaced the zulip.conf file with a
symlink, for example.  This means that camo gets the static secret
that was built into the image, and not the one regenerated on first
startup.

Read the camo secret at process startup time.  Because this pattern is
likely common with "12-factor" applications which can read from
environment variables, write a generic tool to map secrets to
environment variables before exec'ing a binary, and use that for Camo.
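
The shape of such a tool might look like the sketch below.  The
`ENV_NAME=secret_name` argument syntax, the `--` separator, and the
function names are assumptions made for illustration; they are not
necessarily how the actual tool is invoked.

```python
import configparser
import os
from typing import Dict, List

# Zulip's secrets file; the "secrets" section name and the argument
# syntax handled below are assumptions for this sketch.
SECRETS_FILE = "/etc/zulip/zulip-secrets.conf"


def env_from_secrets(secrets_text: str, mappings: List[str]) -> Dict[str, str]:
    """Turn ENV_NAME=secret_name pairs into an environment dictionary."""
    parser = configparser.RawConfigParser()
    parser.read_string(secrets_text)
    env = {}
    for mapping in mappings:
        env_name, secret_name = mapping.split("=", 1)
        env[env_name] = parser.get("secrets", secret_name)
    return env


def main(argv: List[str]) -> None:
    # Hypothetical usage: ENV_NAME=secret_name ... -- binary args...
    split = argv.index("--")
    mappings, command = argv[:split], argv[split + 1:]
    with open(SECRETS_FILE) as f:
        env = env_from_secrets(f.read(), mappings)
    # Replace this process with the target binary, secrets in its environment.
    os.execvpe(command[0], command, {**os.environ, **env})
```

Because the secrets are read immediately before the `exec`, a secret
regenerated on first startup is picked up without re-running puppet.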

(cherry picked from commit 358a7fb0c6)
2021-12-06 19:33:31 +00:00
Alex Vandiver
859d88f76c puppet: Do not assume amd64 architecture.
(cherry picked from commit 7db146d0a9)
2021-12-06 11:10:37 -08:00
Alex Vandiver
f338ff64c3 puppet: Use sysv status command, not supervisorctl status.
Since Supervisor 4, which is installed on Ubuntu 20.04 and Debian 11,
`supervisorctl status` returns exit code 3 if any of the
supervisor-controlled processes are not running.

Using `supervisorctl status` as the Puppet `status` command for
Supervisor leads to unnecessarily trying to "start" a Supervisor
process which is already started, but happens to have one or more of
its managed processes stopped.  This is an unnecessary no-op in
production environments, but in docker-init environments, such as in
CI, attempting to start the process a second time is an error.

Switch to checking if supervisor is running by way of sysv init.  This
fixes the potential error in CI, as well as eliminates unnecessary
"starts" of supervisor when it was already running -- a situation
which made zulip-puppet-apply not idempotent:

```
root@alexmv-prod:~# supervisorctl status
process-fts-updates                                             STOPPED   Nov 10 12:33 AM
smokescreen                                                     RUNNING   pid 1287280, uptime 0:35:32
zulip-django                                                    STOPPED   Nov 10 12:33 AM
zulip-tornado                                                   STOPPED   Nov 10 12:33 AM
[...]

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.32 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.91 seconds

root@alexmv-prod:~# ~zulip/deployments/current/scripts/zulip-puppet-apply --force
Notice: Compiled catalog for alexmv-prod.zulipdev.org in environment production in 2.35 seconds
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
Notice: Applied catalog in 0.92 seconds
```

(cherry picked from commit 7af2fa2e92)
2021-12-01 12:19:30 -08:00
Alex Vandiver
399391c3aa puppet: Default go-camo to listening on localhost for standalone deploys.
The default in the previous commit, inherited from camo, was to bind
to 0.0.0.0:9292.  In standalone deployments, camo is deployed on the
same host as the nginx reverse proxy, and as such there is no need to
open it up to other IPs.

Make `zulip::camo` take an optional parameter, which allows overriding
it in puppet, but skips a `zulip.conf` setting for it, since it is
unlikely to be adjusted by most users.

(cherry picked from commit c514feaa22)
2021-11-19 17:51:08 -08:00
Alex Vandiver
cd5eec5eea camo: Replace with go-camo implementation.
The upstream of the `camo` repository[1] has been unmaintained for
several years, and is now archived by the owner.  Additionally, it has
a number of limitations:
 - It is installed as a sysvinit service, which does not run under
   Docker
 - It does not prevent access to internal IPs, like 127.0.0.1
 - It does not respect standard `HTTP_proxy` environment variables,
   making it unable to use Smokescreen to prevent the prior flaw
 - It occasionally just crashes, and thus must have a cron job to
   restart it.

Swap camo out for the drop-in replacement go-camo[2], which has the
same external API, requiring no changes to Django code, but is more
maintained.  Additionally, it resolves all of the above complaints.

go-camo is not configured to use Smokescreen as a proxy, because its
own private-IP filtering prevents using a proxy which lies within that
IP space.  It is also unclear if the addition of Smokescreen would
provide any additional protection over the existing IP address
restrictions in go-camo.

go-camo has a subset of the security headers that our nginx reverse
proxy sets, and which camo set; provide the missing headers with `-H`
to ensure that go-camo, if exposed from behind some other non-nginx
load-balancer, still provides the necessary security headers.

Fixes #18351 by moving to supervisor.
Fixes zulip/docker-zulip#298 also by moving to supervisor.

[1] https://github.com/atmos/camo
[2] https://github.com/cactus/go-camo

(cherry picked from commit b982222e03)
2021-11-19 17:50:47 -08:00
Alex Vandiver
e7d48c0c10 puppet: Default to installing smokescreen on application frontends.
This is an additional security hardening step, to make Zulip default
to preventing SSRF attacks.  The overhead of running Smokescreen is
minimal, and there is no reason to force deployments to take
additional steps in order to secure themselves against SSRF attacks.

Deployments which already have a different external proxy configured
will not gain a local Smokescreen installation, and running without
Smokescreen is supported by explicitly unsetting the `host` or `port`
values in `/etc/zulip/zulip.conf`.

(cherry picked from commit c33562f0a8)
2021-11-19 17:49:37 -08:00
Alex Vandiver
023dfc01ba puppet: Split smokescreen into a non-profile version.
In a subsequent commit, we intend to include it from
`zulip::app_frontend_base`, which is a layering violation if it only
exists in the form of a profile.

(cherry picked from commit 44f1ea6bae)
2021-11-19 17:49:22 -08:00
Alex Vandiver
5d9285fff3 puppet: Remove unused smokescreen symlink.
(cherry picked from commit c2ed3c22b5)
2021-11-19 17:48:38 -08:00
Alex Vandiver
53f353ec26 puppet: Tidy old smokescreen binaries.
(cherry picked from commit 47e16a5d41)
2021-11-19 17:48:38 -08:00
Alex Vandiver
245c87c567 puppet: Embed golang version into binary path, to rebuild on new golang.
This will cause the output binary path to be sensitive to golang
version, causing it to be rebuilt on new golang, and an updated
supervisor config file written out, and thus supervisor also
restarted.
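
The mechanism can be pictured with the sketch below: because the
golang version is part of the output path, a new toolchain yields a
new path, and everything derived from that path changes with it.  The
directory layout shown is hypothetical.

```python
def smokescreen_binary_path(smokescreen_version: str, golang_version: str) -> str:
    """Build an output path that changes whenever either version changes."""
    # Hypothetical layout; the real path used by the manifests may differ.
    return (
        f"/srv/zulip-smokescreen-{smokescreen_version}"
        f"-go-{golang_version}/smokescreen"
    )


def supervisor_command(smokescreen_version: str, golang_version: str) -> str:
    # The supervisor configuration embeds the path, so its content (and
    # thus the need to restart) follows the golang version as well.
    return f"command={smokescreen_binary_path(smokescreen_version, golang_version)}"
```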

(cherry picked from commit 239ac8413e)
2021-11-19 17:48:38 -08:00
Alex Vandiver
26aa4d57e3 puppet: Factor out smokescreen binary path.
(cherry picked from commit 216eeba2dd)
2021-11-19 17:48:37 -08:00
Alex Vandiver
bee225782a puppet: Switch smokescreen to using zulip::external_dep, so it tidies.
(cherry picked from commit 3a7cef6582)
2021-11-19 17:48:37 -08:00
Alex Vandiver
4a6e69357a puppet: Move /srv/smokescreen-src to /srv/zulip-smokescreen-src.
As with the previous commit for `/srv/golang`, we have the custom of
namespacing things under `/srv` with `zulip-` to help ensure that we
play nice with anything else that happens to be on the host.

(cherry picked from commit ea08111d60)
2021-11-19 17:48:37 -08:00
Anders Kaseorg
3e6d3810d4 puppet: Upgrade Smokescreen v0.0.2-59-gbfca45c to v0.0.2-63-gdc40301.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
(cherry picked from commit c64e1adb19)
2021-11-19 17:48:37 -08:00
Alex Vandiver
bc21dde235 puppet: Extract an external-tarball-dependency manifest.
(cherry picked from commit bb9d2df1ae)
2021-11-19 17:48:37 -08:00
Alex Vandiver
182ce488e2 puppet: Tidy old golang directories.
This relies on behavior which is only in Puppet 5.5.1 and above, which
means it must be skipped on Ubuntu 18.04.

(cherry picked from commit 3c8d7e2598)
2021-11-19 17:48:37 -08:00
Alex Vandiver
bd557a9a13 puppet: Move /srv/golang to /srv/zulip-golang.
We have the custom of namespacing things under `/srv` with `zulip-`
to help ensure that we play nice with anything else that happens
to be on the host.

(cherry picked from commit 2fc4acdf81)
2021-11-19 17:48:36 -08:00
Alex Vandiver
7e8ead7325 puppet: Switch dependency to the golang binary we need.
(cherry picked from commit 00a4abb642)
2021-11-19 17:48:36 -08:00
Alex Vandiver
8fa783f13d puppet: Stop making a /srv/golang symlink.
Nothing needs this extra directory.

(cherry picked from commit 2d5f813094)
2021-11-19 17:48:36 -08:00
Alex Vandiver
11924f4b66 puppet: Factor out golang variables.
(cherry picked from commit 93af6c7f06)
2021-11-19 17:48:36 -08:00
Alex Vandiver
f01cbba0ce puppet: Shorten golang version variable name.
(cherry picked from commit 21be36f15f)
2021-11-19 17:48:36 -08:00
Alex Vandiver
31050be173 puppet: Upgrade golang from 1.16.4 to 1.17.3.
(cherry picked from commit 6b9e74adee)
2021-11-19 17:48:35 -08:00
Alex Vandiver
56d857ca89 puppet: Split out golang toolchain into its own manifest.
(cherry picked from commit 514801c509)
2021-11-19 17:48:35 -08:00
Adam Benesh
6ba333c2ff puppet: Add WSGIApplicationGroup config to Apache SSO example.
Zulip apparently is now affected by a bad interaction between Apache's
WSGI using Python subinterpreters and C extension modules like `re2`
that are not designed for it.

The solution is apparently to set WSGIApplicationGroup to %{GLOBAL},
which disables Apache's use of Python subinterpreters.

See https://serverfault.com/questions/514242/non-responsive-apache-mod-wsgi-after-installing-scipy/514251#514251 for background.

Fixes #19924.
2021-10-08 15:08:14 -07:00
Alex Vandiver
5feba78939 upgrade-postgresql: Do not remove other supervisor configs.
We previously used `zulip-puppet-apply` with a custom config file,
with an updated PostgreSQL version but more limited set of
`puppet_classes`, to pre-create the basic settings for the new cluster
before running `pg_upgradecluster`.

Unfortunately, the supervisor config uses `purge => true` to remove
all supervisor configuration files that are not included in the puppet
configuration; this leads to it removing all other supervisor
processes during the upgrade, only to add them back and start them
during the second `zulip-puppet-apply`.

It also leads to `process-fts-updates` not being started after the
upgrade completes; this is the one supervisor config file which was
not removed and re-added, and thus the one that is not re-started due
to having been re-added.  This was not detected in CI because CI added
a `start-server` command which was not in the upgrade documentation.

Set a custom facter fact that prevents the `purge` behaviour of the
supervisor configuration.  We want to preserve that behaviour in
general, and using `zulip-puppet-apply` continues to be the best way
to pre-set-up the PostgreSQL configuration -- but we wish to avoid
that behaviour when we know we are applying a subset of the puppet
classes.

Since supervisor configs are no longer removed and re-added, this
requires an explicit start-server step in the instructions after the
upgrades complete.  This brings the documentation into alignment with
what CI is testing.
2021-08-24 19:02:24 -07:00
Alex Vandiver
5290f17adb puppet: Run the supervisor-restart step only after it is started.
In an initial install, the following is a potential rule ordering:
```
Notice: /Stage[main]/Zulip::Supervisor/File[/etc/supervisor/conf.d/zulip]/ensure: created
Notice: /Stage[main]/Zulip::Supervisor/File[/etc/supervisor/supervisord.conf]/content: content changed '{md5}99dc7e8a1178ede9ae9794aaecbca436' to '{md5}7ef9771d2c476c246a3ebd95fab784cb'
Notice: /Stage[main]/Zulip::Supervisor/Exec[supervisor-restart]: Triggered 'refresh' from 1 event
[...]
Notice: /Stage[main]/Zulip::App_frontend_base/File[/etc/supervisor/conf.d/zulip/zulip.conf]/ensure: defined content as '{md5}d98ac8a974d44efb1d1bb2ef8b9c3dee'
[...]
Notice: /Stage[main]/Zulip::App_frontend_once/File[/etc/supervisor/conf.d/zulip/zulip-once.conf]/ensure: defined content as '{md5}53f56ae4b95413bfd7a117e3113082dc'
[...]
Notice: /Stage[main]/Zulip::Process_fts_updates/File[/etc/supervisor/conf.d/zulip/zulip_db.conf]/ensure: defined content as '{md5}96092d7f27d76f48178a53b51f80b0f0'
Notice: /Stage[main]/Zulip::Supervisor/Service[supervisor]/ensure: ensure changed 'stopped' to 'running'
```

The last line is misleading -- supervisor was already started by the
`supervisor-restart` process on the third line.  As can be shown with
`zulip-puppet-apply --debug`, the last line just installs supervisor
to run on startup, using `systemctl`:
```
Debug: Executing: 'supervisorctl status'
Debug: Executing: '/usr/bin/systemctl unmask supervisor'
Debug: Executing: '/usr/bin/systemctl start supervisor'
```

This means the list of processes started by supervisor depends
entirely on which configuration files were successfully written out by
puppet before the initial `supervisor-restart` ran.  Since
`zulip_db.conf` is written later than the rest, the initial install
often fails to start the `process-fts-updates` process.  In this
state, an explicit `supervisorctl restart` or `supervisorctl reread &&
supervisorctl update` is required for the service to be found and
started.

Reorder the `supervisor-restart` exec to only run after the service is
started.  Because all supervisor configuration files have a `notify`
of the service, this forces the ordering of:

```
(package) -> (config files) -> (service) -> (optional restart)
```

On first startup, this will start and then immediately restart
supervisor, which is unfortunate but unavoidable -- and not terribly
relevant, since the database will not have been created yet, and thus
most processes will be in a restart loop for failing to connect to it.
2021-07-22 14:23:41 -07:00
Alex Vandiver
9824a9d7cf puppet: Work around sysvinit supervisor init bug.
The sysvinit script for supervisor has a long-standing bug where
`/etc/init.d/supervisor restart` stops but does not then start the
supervisor process.

Work around this by making restart then try to start, and return if it
is currently running.
2021-07-22 14:23:41 -07:00