Commit Graph

429 Commits

Author SHA1 Message Date
Alex Vandiver
4c17f2bccc nagios: The frontends hostgroup now includes prod and staging frontends.
This lets the config file remove some repetition.
2022-06-22 12:07:38 -07:00
Alex Vandiver
1e81775fa0 nagios: Drop unhelpful hostgroup comment. 2022-06-22 12:07:38 -07:00
Alex Vandiver
7b584401ac nagios: Reformat hostgroups. 2022-06-22 12:07:38 -07:00
Alex Vandiver
93bcb86345 nagios: Reorder service checks. 2022-06-22 12:07:38 -07:00
Alex Vandiver
eaaa2fbff8 nagios: Use canonical "hostgroup_name" consistently. 2022-06-22 12:07:38 -07:00
Alex Vandiver
e8996b53a5 nagios: Remove unused has_swap hostgroup. 2022-06-22 12:07:38 -07:00
Alex Vandiver
33472ee9ff nagios: Remove unused stats host set. 2022-06-22 12:07:38 -07:00
Alex Vandiver
bc4f4b4862 nagios: Make the pageable/not/flaky tri-state clearer. 2022-06-22 12:07:38 -07:00
Alex Vandiver
c74f195fba nagios: Split AWS and non-AWS hosts, for ntp checks.
The non-AWS hosts cannot use the AWS ntp server for their check.
2022-06-22 12:07:38 -07:00
Alex Vandiver
872efdee58 nagios: Fold single- and multitornado_frontends back into frontends.
5abf4dee92 made this distinction, then multitornado_frontends was
never used; the singletornado_frontends alerting worked even for the
multiple-Tornado instances.

Remove the useless and misleading distinction.
2022-06-22 12:07:38 -07:00
Alex Vandiver
3741c1c034 puppet: Switch to checking time against the AWS timeserver.
Since this is what chrony is sync'ing to, it lessens the chance of
spurious firings of this alert.

See https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
2022-05-31 22:57:32 -07:00
Alex Vandiver
7f6a77da31 puppet: Add a redis exporter. 2022-05-03 17:13:44 -07:00
Anders Kaseorg
e9ba9b0e0d zulip-ec2-configure-interfaces: Remove.
Our current EC2 systems don’t have an interface named ‘eth0’, and if
they did, this script would do nothing but crash with ImportError
because we have never installed boto.utils for Python 3.

(The message of commit 2a4d851a7c made
an effort to document for future researchers why this script should
not have been blindly converted to Python 3.  However, commit
2dc6d09c2a (#14278) was evidently
unresearched and untested.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-05-03 02:25:59 -07:00
Anders Kaseorg
646a4d19a3 puppet: Remove quotes for enumerable values.
https://puppet.com/docs/puppet/7/style_guide.html#style_guide_module_design-quoting
“If a string is a value from an enumerable set of options, such as
present and absent, it SHOULD NOT be enclosed in quotes at all.”

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-29 22:06:46 -07:00
Alex Vandiver
35db1ee435 puppet: Only include "app_service" section if there are apps.
This works around gravitational/teleport#12256, but also produces config
files that are slightly cleaner.
2022-04-26 16:36:13 -07:00
Alex Vandiver
f6d27562fa puppet: Configure chrony to use AWS-local NTP sources.
This prevents hosts from spewing traffic to random hosts across the
Internet.
2022-03-25 17:07:53 -07:00
Alex Vandiver
1bd5723cd2 puppet: Add a prometheus monitor for tornado processes. 2022-03-20 16:12:11 -07:00
Alex Vandiver
6b91652d9a puppet: Open the grok_exporter port.
The complete grok_exporter configuration is not ready to be committed,
but this at least prepares the way for it.
2022-03-20 16:12:11 -07:00
Alex Vandiver
6558655fc6 puppet: Add rabbitmq prometheus plugin, and open the firewall. 2022-03-20 16:12:11 -07:00
Alex Vandiver
bdd2f35d05 puppet: Switch czo to using zulip_ops::app_frontend_monitoring.
This was clearly intended in f61ac4a28d
but never executed.
2022-03-20 16:12:11 -07:00
Alex Vandiver
17699bea44 puppet: postgresql_backups is auto-included if s3_backups_bucket is set.
Since 6496d43148.
2022-03-20 16:12:11 -07:00
Alex Vandiver
bedc7c2986 puppet: Smokescreen is now auto-included in standalone.
Since c33562f0a8.
2022-03-20 16:12:11 -07:00
Anders Kaseorg
b3260bd610 docs: Use Debian and Ubuntu version numbers over development codenames.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 12:04:24 -08:00
Alex Vandiver
788daa953b puppet: Factor out $::architecture case statement for golang. 2022-02-15 12:04:37 -08:00
Anders Kaseorg
f6a701090c setup-apt-repos: Don’t install lsb_release.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-14 16:38:53 -08:00
Alex Vandiver
e032b38661 puppet: Fix typo in uwsgi exporter dependency. 2022-02-08 15:17:17 -08:00
Alex Vandiver
3bbe5c1110 puppet: Put comments on iptables lines.
In addition to documenting the rules.v4 and rules.v6 files slightly,
these comments show up in `iptables -L`:

```
root@hostname:~# iptables -L INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
LOGDROP    all  --  anywhere             localhost/8
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh /* ssh */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:3000 /* grafana */
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:9100 /* node_exporter */
LOGDROP    all  --  anywhere             anywhere
```
2022-01-21 16:46:14 -08:00
Alex Vandiver
6bc5849ea8 puppet: Remove now-unused debathena apt repository. 2022-01-18 14:13:28 -08:00
Alex Vandiver
b3f07cc98d puppet: Replace debathena zephyr package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver
a6d7539571 puppet: Replace debathena krb5 package with equivalent puppet file. 2022-01-18 14:13:28 -08:00
Alex Vandiver
75224ea5de puppet: python-dev is now purely virtual; install python2.7-dev. 2022-01-18 14:13:28 -08:00
Alex Vandiver
fc1adef28a puppet: Fix server_name of internal staging server. 2022-01-18 12:36:56 -08:00
Alex Vandiver
7e630b81f8 puppet: Switch to using snakeoil certs for staging.
This parallels ba3b88c81b, but for the
staging host.
2022-01-18 12:36:56 -08:00
Alex Vandiver
0b8a6a51b8 puppet: Remove all parts of AWS kernels.
Otherwise, we just uninstall the meta-package, and still restart into
the installed AWS kernel.
2022-01-12 15:52:19 -08:00
Alex Vandiver
4d7e6b26df puppet: Provide more attributes to teleport on ssh nodes. 2022-01-12 14:15:45 -08:00
Alex Vandiver
339e70671c puppet: Switch Grafana to Grafana 8 Unified Alerting. 2022-01-11 14:27:11 -08:00
Alex Vandiver
6a7eecee9a puppet: Increase load paging thresholds. 2022-01-11 09:38:31 -08:00
Alex Vandiver
1e80b844f4 puppet: Disable apparmor profile for msmtp.
As the nagios user, we want to read the msmtp configuration from
~nagios, which apparmor's profile does not allow msmtp to do.
2022-01-11 09:38:31 -08:00
Alex Vandiver
3c95ad82c6 puppet: Upgrade to nagios4.
This updates the puppeted nagios configuration file for the Nagios4
defaults.
2022-01-11 09:38:31 -08:00
Alex Vandiver
4a95967a33 puppet: Gather uwsgi stats from chat.zulip.org. 2022-01-03 21:26:57 -08:00
Alex Vandiver
8a5be972d2 puppet: Add a uwsgi exporter for monitoring.
This allows investigation of how many workers are busy, and to track
"harikari" terminations.
2022-01-03 15:25:58 -08:00
Anders Kaseorg
82748d45d8 install-yarn: Use test -ef in case /srv is a symlink.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-12-30 13:42:07 -08:00
Alex Vandiver
c094867a74 puppet: Add aarch64 build hashes to external dependencies.
wal-g does not ship aarch64 binaries, currently; the compilation
process([1]) is somewhat complicated, so we defer the decision about
how to support wal-g for aarch64 until a later date.

[1]: https://github.com/wal-g/wal-g/blob/master/docs/PostgreSQL.md#installing
2021-12-29 16:35:15 -08:00
Alex Vandiver
f166f9f7d6 puppet: Centralize versions and sha256 hashes of external dependencies.
This will make it easier to update versions of these dependencies.
2021-12-29 16:35:15 -08:00
Alex Vandiver
57662689a9 puppet: Provide a constant homedir for grafana user.
The homedir of a user cannot be changed if any processes are running
as them, so having it change over time as upgrades happen will break
puppet application, as the old grafana process under supervisor will
effectively lock changes to the user's homedir.

Unfortunately, that means that this change will thus fail to
puppet-apply unless `supervisorctl stop grafana` is run first, but
there's no way around that.
2021-12-29 16:35:15 -08:00
Alex Vandiver
6e55e52694 puppet: Pull out grafana $data_dir. 2021-12-29 16:35:15 -08:00
Alex Vandiver
1e4e6a09af puppet: Stop making resources for external binaries and directories.
In the event that extracting doesn't produce the binary we expected it
to, all this will do is create an _empty_ file where we expect the
binary to be.  This will likely muddle debugging.

Since the only reason the resourfce was made in the first place was to
make dependencies clear, switch to depending on the External_Dep
itself, when such a dependency is needed.
2021-12-29 16:35:15 -08:00
Alex Vandiver
3c163a7d5e puppet: Move slash out of $dir by convention. 2021-12-29 16:35:15 -08:00
Alex Vandiver
bb5a2c8138 puppet: Move prometheus to external_dep. 2021-12-29 16:35:15 -08:00
Alex Vandiver
2d6c096904 puppet: Move node_exporter to external_dep. 2021-12-29 16:35:15 -08:00