477 Commits

Author SHA1 Message Date
Alex Vandiver
0d5356969e puppet: Reformat ipv4 iptables rules comments. 2020-10-15 15:18:35 -07:00
Alex Vandiver
fffea9612b puppet: Add an outgoing HTTP/HTTPS proxy server.
Use https://github.com/stripe/smokescreen to provide a server for an
outgoing proxy, run under supervisor.  This will allow centralized
blocking of internal metadata IPs, localhost, and so forth, as well as
providing default request timeouts (10s by default).
2020-10-15 15:18:35 -07:00
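Smokescreen itself is written in Go and has its own configurable ACLs. The following is only a minimal Python sketch of the kind of destination check described above. It assumes a deny-list made up of loopback, link-local (which covers 169.254.169.254, the metadata address used by several cloud providers), private, and unspecified addresses. `is_denied` is a hypothetical name.

```python
import ipaddress


def is_denied(addr: str) -> bool:
    """Return True if an outgoing proxy should refuse to connect to addr.

    Hypothetical policy for illustration; not smokescreen's actual ACL logic.
    """
    ip = ipaddress.ip_address(addr)
    # Loopback (127.0.0.0/8, ::1), link-local (169.254.0.0/16, home of the
    # cloud metadata endpoint), private ranges (10/8, 172.16/12, 192.168/16,
    # fc00::/7) and the unspecified address are all treated as internal.
    return ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_unspecified
```

A real proxy has to apply such a check to the IP address it actually connects to, after DNS resolution. Otherwise a hostname that resolves to an internal address would slip past the filter.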
Anders Kaseorg
dfaea9df65 shfmt: Reformat shell scripts with shfmt.
https://github.com/mvdan/sh

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-15 15:16:00 -07:00
Alex Vandiver
f61ac4a28d puppet: Move frontend monitoring into its own file.
This allows it to be pulled in for deploys like czo, which don't use
the full `zulip_ops::app_frontend`, but which we wish to monitor.
2020-10-13 17:37:32 -07:00
Tim Abbott
7c2c82b190 nginx: Update nginx configuration for fhir/hl7 organization.
We should eventually add templating for the set of hosts here, but
it's worth merging this change to remove the deleted hostname and
replace it with the current one.
2020-10-13 16:50:26 -07:00
Anders Kaseorg
723d285e46 nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 16:49:23 -07:00
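The `{www.,}zulipchat.com` notation is shell-style brace expansion: it stands for both `www.zulipchat.com` and `zulipchat.com`. The actual change is nginx configuration. The Python sketch below only models the resulting host mapping, and it assumes that the redirect keeps the request path and query string and targets HTTPS. The commit message does not state either of these. The names `REDIRECT_HOSTS` and `redirect_target` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hosts that should permanently redirect to the canonical domain.
REDIRECT_HOSTS = {"zulipchat.com", "www.zulipchat.com", "www.zulip.com"}
CANONICAL_HOST = "zulip.com"


def redirect_target(url: str) -> Optional[str]:
    """Return the redirect URL for url, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.hostname not in REDIRECT_HOSTS:
        return None
    # Only the scheme and host change; the path and query are preserved.
    return urlunsplit(("https", CANONICAL_HOST, parts.path, parts.query, ""))
```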
Alex Vandiver
c8df9a150e puppet: Drop all log2zulip configuration.
Disabled on webservers in 047817b6b0, it has since lingered in
configuration, as well as running (to no effect) every minute on the
loadbalancer.

Remove the vestiges of its configuration.
2020-10-13 11:00:50 -07:00
Alex Vandiver
b431b1b021 puppet: Remove misleading motd.
This banner shows on lb1, advertising itself as lb0.  There is no
compelling reason for a custom motd, especially one which needs to
be reconfigured for each host.
2020-10-13 11:00:36 -07:00
Alex Vandiver
45c9c3cc30 queue: Monitor user_activity queue, now that it has a consumer.
Since this was previously using repeated individual get() calls, it
could not be monitored for having a consumer.  Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.

Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
2020-10-11 14:19:42 -07:00
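In AMQP, polling with `basic.get` does not register a consumer. Only `basic.consume` does that. This is why a queue drained with individual get() calls always looks consumer-less. Below is a minimal sketch of a consumer check that assumes input in the tab-separated form printed by `rabbitmqctl list_queues name consumers`. The function name is hypothetical, and this is not Zulip's actual Nagios plugin.

```python
def queues_without_consumers(output: str, monitored: set[str]) -> list[str]:
    """Return the monitored queues that currently have no consumer."""
    consumers: dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split("\t")
        # Skip banner and header lines; data rows look like "<name>\t<count>".
        if len(fields) != 2 or not fields[1].strip().isdigit():
            continue
        consumers[fields[0]] = int(fields[1])
    # A queue missing from the listing entirely also has no consumer.
    return sorted(q for q in monitored if consumers.get(q, 0) == 0)
```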
Alex Vandiver
4fd7df4e8c puppet: Remove absent of check-apns-tokens.
This was marked as ensure absent in d02101a401, in v1.7.0 in 2017.
2020-09-29 18:17:08 -07:00
Alex Vandiver
872a349508 puppet: Remove absent of log2zulip.
This was marked as ensure absent in 047817b6b0, in v2.0.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver
57d88eedd8 puppet: Only install rabbitmq cron jobs via zulip_ops.
The rabbitmq cron jobs exist in order to call rabbitmqctl as root and
write the output to files that nagios can consume, since nagios is not
allowed to run rabbitmqctl.

In systems which do not have nagios configured, these every-minute
cron jobs add non-negligible load, to no effect.  Move their
installation into `zulip_ops`.  In doing so, also combine the cron.d
files into a single file; this allows us to `ensure => absent` the old
filenames, removing them from existing systems.  Leave the resulting
combined cron.d file in `zulip`, since it is still of general utility.
2020-09-29 17:44:44 -07:00
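The cron jobs above write command output to files that a less-privileged nagios process later reads. A common way to do this safely is write-then-rename, which ensures a check never reads a half-written file. The commit does not say whether Zulip's cron jobs work this way. The sketch below is illustrative only, and `write_state_file` is a hypothetical name.

```python
import os
import subprocess
import tempfile


def write_state_file(command: list[str], path: str) -> None:
    """Run command and atomically replace path with its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # os.replace() is a rename within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(result.stdout)
        # mkstemp creates files as 0600; make it readable by the nagios user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```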
Anders Kaseorg
ab120a03bc python: Replace unnecessary intermediate lists with generators.
Mostly suggested by the flake8-comprehensions plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
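The pattern this commit removes is sketched below with illustrative values. A list comprehension passed straight into a function that only iterates over it builds a throwaway list. A generator expression produces the same values one at a time.

```python
words = ["zulip", "puppet", "nagios"]

# Builds a full intermediate list in memory, only to iterate it once.
total_with_list = sum([len(w) for w in words])

# A generator expression feeds sum() one value at a time.
total_with_generator = sum(len(w) for w in words)

# With any()/all() a generator can also stop early at the first match.
has_long = any(len(w) > 5 for w in words)
```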
Tim Abbott
5a1243db3c puppet: Use correct scope for zulip_ops::munin_plugin. 2020-07-15 21:49:45 -07:00
Alex Vandiver
48c3c33d10 puppet: Fully-qualify the munin-plugin name 2020-07-14 17:58:51 -07:00
Alex Vandiver
31d80a77d4 puppet: Update nagios check_postgres_replication_lag to be on DB hosts
7d4a370a57 attempted to move the replication check to run on the
PostgreSQL hosts.  While it updated the _check_ to assume it was
running and talking to a local PostgreSQL instance, the configuration
and installation for the check were not updated.  As such, the check
ran on the nagios host for each DB host, and produced no output.

Start distributing the check to all appdb hosts, and configure nagios
to use the SSH tunnel to get there.
2020-07-14 16:27:18 -07:00
Alex Vandiver
6c27f07c1d puppet: Move PostgreSQL backups to their own class.
wal-g was used in `puppet/zulip` by env-wal-g, but only installed in
`puppet/zulip_ops`.

Merge all of the dependencies of doing backups using wal-g (wal-g
installation, the pg_backup_and_purge job, the nagios plugin that
verifies it happens) into a common base class in `puppet/zulip`, since
it is generally useful.
2020-07-14 00:40:25 -07:00
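The bundle above includes a Nagios plugin that verifies backups happen. Below is a minimal sketch of the freshness logic such a plugin might use. It assumes the time of the most recent backup has already been obtained, for instance from the output of `wal-g backup-list`. The thresholds and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_backup_age(
    last_backup: datetime, now: datetime, warn: timedelta, crit: timedelta
) -> tuple[int, str]:
    """Classify the age of the newest backup against warn/crit thresholds."""
    age = now - last_backup
    if age >= crit:
        return CRITICAL, f"CRITICAL: last backup was {age} ago"
    if age >= warn:
        return WARNING, f"WARNING: last backup was {age} ago"
    return OK, f"OK: last backup was {age} ago"
```

A plugin built around this would print the message and exit with the returned status code, which is how Nagios learns the result.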
Anders Kaseorg
15483c09cb puppet: Add missing trailing commas.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-13 15:36:06 -07:00
Alex Vandiver
3691a94efe puppet: Configure munin and nagios under apache with puppet.
This swaps in the actually-in-use munin configuration file;
otherwise, it is an implementation of the configuration as it exists
on the machine.
2020-07-13 13:23:11 -07:00
Alex Vandiver
4e42164b4a munin: Add plugins to prod hosts. 2020-07-13 13:23:11 -07:00
Alex Vandiver
2a14212b27 munin: Add a helper resource definition for munin plugins. 2020-07-13 12:49:28 -07:00
Alex Vandiver
7c7b5fcd6f munin: Deal with spaces in the channel names. 2020-07-13 12:49:28 -07:00
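The commit message does not say how spaces are handled. Munin field names must start with `[A-Za-z_]` and contain only `[A-Za-z0-9_]`, so one common approach, assumed in the sketch below, is to map each channel name to a safe field name and keep the original text for the field's `label`. `munin_field_name` is a hypothetical helper.

```python
import re


def munin_field_name(channel: str) -> str:
    """Turn an arbitrary channel name into a valid Munin field name."""
    # Replace every character outside [A-Za-z0-9_], including spaces, with "_".
    name = re.sub(r"[^A-Za-z0-9_]", "_", channel)
    # Field names may not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    return name
```

Different channel names can collapse to the same field name (for example "a b" and "a-b"), so a real plugin would need to detect or disambiguate such collisions.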
Alex Vandiver
eda2c4b8e2 puppet: Split munin-node from munin-server.
No plugins are installed inside the /usr/local/munin/lib directory that
this creates in munin-node, nor are they symlinked into
/etc/munin/plugins, so no non-default plugins are added by this.
2020-07-13 12:49:28 -07:00
Alex Vandiver
ddc7bb5a45 munin: Fix the path to check_send_receive_time. 2020-07-13 12:49:28 -07:00
Alex Vandiver
8be544e7eb munin: Rename monitoring plugin to use zulip name, not humbug. 2020-07-13 12:49:28 -07:00
Alex Vandiver
8cff27f67d puppet: Pull hosts from zulip.conf, not hardcoded list.
The one complexity is that hosts_fullstack are treated differently, as
they are not currently found in the manual `hosts` list, and as such
do not get munin monitoring.
2020-07-10 00:14:09 -07:00
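`zulip.conf` is an INI-style file, and in the puppet code the host lists are read from it. The Python sketch below shows the same idea with the standard `configparser` module. It assumes the hosts are stored as a comma-separated value. The section and key names used in the example (`nagios`, `hosts_appdb`, `hosts_fullstack`) are hypothetical.

```python
import configparser


def read_host_list(conf_text: str, section: str, key: str) -> list[str]:
    """Parse a comma-separated host list from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # A missing section or key yields an empty list instead of an exception.
    raw = parser.get(section, key, fallback="")
    return [host.strip() for host in raw.split(",") if host.strip()]
```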
Alex Vandiver
24383a5082 puppet: Rename hosts_domain so hosts_prefix can be grepped for. 2020-07-10 00:14:09 -07:00
Alex Vandiver
a4e7c7a27e nagios: Remove check_memcached.
check_memcached does not support memcached authentication even in its
latest release (it’s in a TODO item comment, and that’s it), and was
never particularly useful.
2020-07-10 00:12:48 -07:00
Anders Kaseorg
9900298315 zthumbor: Remove Python 2 residue.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 18:44:58 -07:00
Alex Vandiver
a21a086f5c puppet: nagios-plugins-basic is replaced by monitoring-plugins-basic.
In Bionic, nagios-plugins-basic is a transitional package which
depends on monitoring-plugins-basic.  In Focal, it is a virtual
package, which means that every time puppet runs, it tries to
re-install the nagios-plugins-basic package.

Switch all instances to referring to `$zulip::common::nagios_plugins`,
and repoint that to monitoring-plugins-basic.
2020-06-29 14:58:01 -07:00
Alex Vandiver
876ee4a8ed installer: Remove code specific to stretch or xenial.
Support for Xenial and Stretch was removed (5154ddafca, 0f4b1076ad,
8944e0ad53, 79acd5ae40, 1219a2e854), but not all codepaths were
updated to remove their conditionals on them.

Remove all code predicated on Xenial or Stretch.  debathena support
was migrated to Bionic, since that appears to be the current state of
existing debathena servers.
2020-06-24 12:57:38 -07:00
Alex Vandiver
7250d41bf7 puppet: Fix the path to install-wal-g 2020-06-17 15:23:18 -07:00
Tim Abbott
26396c5e25 puppet: Fix exceptions with multiple certbot declarations.
Since 9e8f1aacb3, zulip_ops machines
might have two Package declarations for `certbot`, which doesn't work
in puppet.

The fix is, as usual, to use our `zulip::safepackage` wrapper instead.
2020-06-15 18:21:33 -07:00
Alex Vandiver
f8fc3a16eb puppet: Use "primary" / "replica" consistently in comments.
The style guide for Zulip is to always use "primary" and "replica"
when describing database replication.  Adjust a few comments under
`puppet/` that do not adhere to this.

Unfortunately, some references still remain to the insensitive and
inaccurate "master" / "slave" terminology.  However, these are only in
files which we are attempting to preserve as close to the upstream
versions they are derived from (e.g. postgresql.conf,
postfix/master.cf).
2020-06-15 16:18:07 -07:00
Alex Vandiver
5f433d6eeb puppet: Remove vestigial check_postgres.pl.
65774e1c4f switched from using the bundled check_postgres.pl to using
the version from packages; the file itself remained, however.

Remove it, and clean up references to it.

Fixes #15389.
2020-06-15 16:18:07 -07:00
Alex Vandiver
7d4a370a57 puppet: Move monitoring of pg replication to the pg hosts.
Instead of SSH'ing around to them, run directly on the database hosts.
This means that the replicas do not know how many bytes behind they
are in _receiving_ the WAL; thus, the monitoring also extends to
the primary database, which knows that information for each replica.
This also allows for detecting when there are too few active replicas.
2020-06-15 16:18:07 -07:00
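On the primary, `pg_stat_replication` has one row per connected replica, including its `replay_lsn` (PostgreSQL 10 and later). `pg_current_wal_lsn()` gives the primary's own position. In SQL, the byte difference is usually computed with `pg_wal_lsn_diff()`. The minimal Python sketch below reproduces the same arithmetic and the "too few replicas" check. It assumes the LSN strings have already been fetched. The function names and thresholds are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def check_replicas(
    current_lsn: str, replica_lsns: dict[str, str], max_lag_bytes: int, min_replicas: int
) -> list[str]:
    """Return a list of problems; an empty list means replication looks healthy."""
    problems = []
    if len(replica_lsns) < min_replicas:
        problems.append(f"only {len(replica_lsns)} active replica(s), expected {min_replicas}")
    current = lsn_to_int(current_lsn)
    for name, lsn in sorted(replica_lsns.items()):
        lag = current - lsn_to_int(lsn)
        if lag > max_lag_bytes:
            problems.append(f"{name} is {lag} bytes behind")
    return problems
```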
Anders Kaseorg
74c17bf94a python: Convert more percent formatting to Python 3.6 f-strings.
Generated by pyupgrade --py36-plus.

Now including %d, %i, %u, and multi-line strings.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-14 23:27:22 -07:00
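This is an illustrative example of the kind of rewrite described above, including a multi-line string. The variable names are made up. Adjacent string literals are joined before `%` is applied, so the old form formats the whole concatenated string.

```python
count, name = 3, "zulip"
queue, consumers, messages = "user_activity", 1, 0

# Percent formatting (before).
old = "%d messages for %s" % (count, name)
old_multiline = (
    "Queue %s has %d consumers\n"
    "and %d messages" % (queue, consumers, messages)
)

# f-strings (after).
new = f"{count} messages for {name}"
new_multiline = (
    f"Queue {queue} has {consumers} consumers\n"
    f"and {messages} messages"
)
```

One subtlety is that `%d` truncates a float argument (`"%d" % 2.7` gives `'2'`), while `f"{x}"` would print `2.7`. The two forms are therefore only equivalent when the value is already an integer.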
Anders Kaseorg
91a86c24f5 python: Replace None defaults with empty collections where appropriate.
Use read-only types (List ↦ Sequence, Dict ↦ Mapping, Set ↦
AbstractSet) to guard against accidental mutation of the default
value.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-13 15:31:27 -07:00
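The before/after shape of this change is sketched below with hypothetical function names. The read-only annotations do not freeze anything at runtime. They let a type checker such as mypy reject calls like `ids.append(...)`, which keeps the single shared default object from being mutated.

```python
from typing import AbstractSet, List, Mapping, Optional, Sequence


# Before: a None sentinel plus a conditional to create a fresh list.
def count_old(ids: Optional[List[int]] = None) -> int:
    if ids is None:
        ids = []
    return len(ids)


# After: empty-collection defaults typed as read-only interfaces, so a
# type checker flags any attempt to mutate the shared default objects.
def count_new(
    ids: Sequence[int] = [],
    extra: Mapping[str, int] = {},
    tags: AbstractSet[str] = set(),
) -> int:
    return len(ids) + len(extra) + len(tags)
```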
Alex Vandiver
97b9308781 puppet: Merge multiple postgres roles in zulip_ops.
All differences between the primary and replica roles having been
merged, fold the `postgres_common`, `postgres_master`, and
`postgres_slave` roles into just `postgres_appdb`.
2020-06-12 14:57:46 -07:00
Alex Vandiver
55bd31721d puppet: Remove custom vm.dirty_ratio and vm.dirty_background_ratio.
These values differed between the primary and secondary database
hosts, for unclear reasons.  The differences date back to their
introduction in 387f63deaa.  As the comment in the replica
configuration notes, settings of `vm.dirty_ratio = 10` and
`vm.dirty_background_ratio = 5` matched the kernel defaults for
"newer" kernels; however, kernel 2.6.30 bumped those to 20 and 10,
respectively[1], as a fix for underlying logic now being more correct.

Remove these overrides; they should at the very least be consistent
across roles, and the previous values look to be an attempt to tune for
a very much older version of the Linux kernel, which used a different,
buggier algorithm under the hood.

[1] 1b5e62b42b
2020-06-12 14:57:46 -07:00
Alex Vandiver
f39816e768 puppet: Stop distributing recovery.conf file.
This file controls streaming replication, and recovery using wal-g on
the secondary.  The `primary_conninfo` data needs to change on short
notice when database failover happens, in a way that is not suitable
for being controlled by puppet.

PostgreSQL 12, in fact, removes the use of the `recovery.conf` file[1];
the `primary_conninfo` and `restore_command` information goes into the
main `postgresql.conf` file, and the standby status is controlled by
the presence or absence of an empty `standby.signal` file.

Remove the puppet control of the `recovery.conf` file.

[1] https://pgstef.github.io/2018/11/26/postgresql12_preview_recovery_conf_disappears.html
2020-06-12 14:57:46 -07:00
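In PostgreSQL 12 and later, two signal files in the data directory control startup. `standby.signal` starts the server as a standby, and `recovery.signal` starts targeted recovery. If both are present, standby mode takes precedence. The minimal sketch below inspects the files only. For a running server, the usual query is `SELECT pg_is_in_recovery()`. `server_mode` is a hypothetical name.

```python
from pathlib import Path


def server_mode(data_dir: str) -> str:
    """Report the startup mode implied by signal files in data_dir."""
    d = Path(data_dir)
    # The files' contents are ignored; only their presence matters.
    if (d / "standby.signal").exists():
        return "standby"
    if (d / "recovery.signal").exists():
        return "recovery"
    return "normal"
```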
Alex Vandiver
316498a169 puppet: Remove unnecessary nagios authentication setup.
Since the nagios authentication is stored _in the database_, it is
unnecessary to run if the database is simply a replica of the
production database.  The only case in which this statement would have
an effect is if the postgres node contains a _different_ (or empty)
database, which `setup_disks` now effectively prevents.

Remove the unnecessary step.
2020-06-11 21:01:49 -07:00
Alex Vandiver
0774f54c1b puppet: Move setup_disks to postgres_common.
The tooling should now be run regardless of whether the node is a
primary or replica.
2020-06-11 21:01:49 -07:00
Alex Vandiver
6f6a0e890a puppet: Run setup_disks based on symlink; remove mdadm dependency.
481613a344 updated the `setup_disks` script to no longer reference
`mdadm`, since we no longer set up RAID on servers.

Update the puppet that would call it to remove the `mdadm` dependency,
and run only if the state is not what it produces -- namely, a symlink
for `/var/lib/postgresql`, which must point to an existent
`/srv/postgresql` directory.
2020-06-11 21:01:49 -07:00
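In puppet, a guard like this usually lives in an `exec` resource's `unless` attribute. The Python below is only a sketch of the condition itself. It uses the commit's paths as defaults, and `needs_setup` is a hypothetical name. Comparing `realpath` on both sides also tolerates the target path itself going through a symlink.

```python
import os


def needs_setup(link: str = "/var/lib/postgresql", target: str = "/srv/postgresql") -> bool:
    """Return True unless link is a symlink to an existing target directory."""
    done = (
        os.path.islink(link)
        and os.path.realpath(link) == os.path.realpath(target)
        and os.path.isdir(target)
    )
    return not done
```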
Alex Vandiver
1dc2de5026 puppet: Update setup-disks to be idempotent.
The end state it produces is _either_:

 - `/srv/postgresql` already existed, which was symlinked into
   `/var/lib/postgresql`; postgres is left untouched.  This is the
   situation if `setup_disks` is run on the database primary, or a
   replica which was correctly configured.

 - An empty `/srv/postgresql` now exists, symlinked into
   `/var/lib/postgresql`, and postgres is stopped.  This is the
   situation if `puppet` was just run on a new host, or a
   previously-configured host was rebooted (clearing the temporary
   disk in `/dev/nvme0`).

In the latter case, where `/srv/postgresql` is now empty, any previous
contents of `/var/lib/postgresql` are placed under `/root`,
timestamped for uniqueness.

In either case, the tool should now be idempotent.
2020-06-11 21:01:49 -07:00
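The two end states listed above can be sketched as a single idempotent function. This is a simplified illustration under several assumptions. It does not stop postgres. It moves any real (non-symlink) `/var/lib/postgresql` aside whenever one is found, and its timestamps only have one-second resolution. The paths are parameters so it can run against a scratch directory. `ensure_postgres_symlink` is a hypothetical name, not the real `setup_disks` script.

```python
import os
import shutil
import time


def ensure_postgres_symlink(link: str, target: str, backup_root: str) -> str:
    """Make link a symlink to target; safe to run repeatedly."""
    if (
        os.path.isdir(target)
        and os.path.islink(link)
        and os.path.realpath(link) == os.path.realpath(target)
    ):
        return "already configured"
    fresh = not os.path.isdir(target)
    if fresh:
        os.makedirs(target)
    if os.path.lexists(link):
        if os.path.islink(link):
            os.unlink(link)
        else:
            # Keep previous contents, timestamped so backups do not collide.
            stamp = time.strftime("%Y%m%d-%H%M%S")
            shutil.move(link, os.path.join(backup_root, f"postgresql-{stamp}"))
    os.symlink(target, link)
    return "created" if fresh else "linked"
```

A second run finds the symlink already in place and changes nothing, which is the idempotence property the commit describes.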
Alex Vandiver
16c4cea951 puppet: Pull postgres config directory into postgres_appdb_base.
As with the previous commit, this is currently only used in tuning,
but is a property of the whole postgres configuration; move it there,
as just the directory, not the file.

Use this directory consistently in the erb templates.  Since we
produce a `pg_hba.conf`, it makes sense to point to the path that we
know we explicitly wrote to, for instance.
2020-06-11 20:56:55 -07:00
Anders Kaseorg
365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg
69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Alex Vandiver
b114eb2f10 puppet: Rename env-wal-e to env-wal-g.
It runs wal-g now, not wal-e; make its name respect that.
2020-06-11 15:52:43 -07:00
Alex Vandiver
4fe0444108 puppet: Install wal-g, not wal-e. 2020-06-11 15:52:43 -07:00