Commit Graph

1554 Commits

Author SHA1 Message Date
Alex Vandiver
ca971ebc59 puppet: Remove empty zulip_ops class. 2020-10-22 11:30:53 -07:00
Alex Vandiver
16af05758d puppet: Move zulip_org into zulip_ops.
This class is not of general interest.
2020-10-22 11:30:53 -07:00
Alex Vandiver
ad566c491d puppet: Drop now-unused zulip_ops:::git class. 2020-10-22 11:30:53 -07:00
Alex Vandiver
50e9e2ed20 puppet: Make zulip::base include zulip::apt_repository.
There was likely more dependency complexity prior to 97766102df, but
there is now no reason to require that consumers explicitly include
zulip::apt_repository.
2020-10-22 11:30:53 -07:00
Alex Vandiver
2dc6d26ec6 puppet: Fix included monitoring class name. 2020-10-19 22:30:20 -07:00
Alex Vandiver
7a1132d605 puppet: Switch golang and smokescreen to use /srv.
/srv and /opt have very similar usages; but we should be internally
consistent.

Move these two (the only usages of /opt) to match the rest in /srv.
2020-10-16 13:00:06 -07:00
Alex Vandiver
78b92a51cc puppet: Allow access to smokescreen port via iptables. 2020-10-15 15:18:35 -07:00
Alex Vandiver
0d5356969e puppet: Reformat ipv4 iptables rules comments. 2020-10-15 15:18:35 -07:00
Alex Vandiver
fffea9612b puppet: Add an outgoing HTTP/HTTPS proxy server.
Use https://github.com/stripe/smokescreen to provide a server for an
outgoing proxy, run under supervisor.  This will allow centralized
blocking of internal metadata IPs, localhost, and so forth, as well as
providing default request timeouts (10s by default).
2020-10-15 15:18:35 -07:00
Anders Kaseorg
dfaea9df65 shfmt: Reformat shell scripts with shfmt.
https://github.com/mvdan/sh

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-15 15:16:00 -07:00
Alex Vandiver
f61ac4a28d puppet: Move frontend monitoring into its own file.
This allows it to be pulled in for deploys like czo, which don't use
the full `zulip_ops::app_frontend`, but we wish to monitor.
2020-10-13 17:37:32 -07:00
Tim Abbott
7c2c82b190 nginx: Update nginx configuration for fhir/hl7 organization.
We should eventually add templating for the set of hosts here, but
it's worth merging this change to remove the deleted hostname and
replace it with the current one.
2020-10-13 16:50:26 -07:00
Anders Kaseorg
723d285e46 nginx: Redirect {www.,}zulipchat.com, www.zulip.com to zulip.com.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-13 16:49:23 -07:00
Alex Vandiver
c8df9a150e puppet: Drop all log2zulip configuration.
Disabled on webservers in 047817b6b0, it has since lingered in
configuration, as well as running (to no effect) every minute on the
loadbalancer.

Remove the vestiges of its configuration.
2020-10-13 11:00:50 -07:00
Alex Vandiver
b431b1b021 puppet: Remove misleading motd.
This banner shows on lb1, advertising itself as lb0.  There is no
compelling reason for a custom motd, especially one which needs to
be reconfigured for each host.
2020-10-13 11:00:36 -07:00
Alex Vandiver
45c9c3cc30 queue: Monitor user_activity queue, now that it has a consumer.
Since this was using repead individual get() calls previously, it
could not be monitored for having a consumer.  Add it in, by marking
it of queue type "consumer" (the default), and adding Nagios lines for
it.

Also adjust missedmessage_emails to be monitored; it stopped using
LoopQueueProcessingWorker in 5cec566cb9, but was never added back
into the set of monitored consumers.
2020-10-11 14:19:42 -07:00
Alex Vandiver
4fd7df4e8c puppet: Remove absent of check-apns-tokens.
This was marked as ensure absent in d02101a401, in v1.7.0 in 2017.
2020-09-29 18:17:08 -07:00
Alex Vandiver
872a349508 puppet: Remove absent of log2zulip.
This was marked as ensure absent in 047817b6b0, in v2.0.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver
0137772fdb puppet: Remove absent of calculate-first-visible-message-id.
This was marked as ensure absent in dc7d44a245, in v1.9.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver
966c8dc23d puppet: Remove absent of email-mirror cron job.
This was marked as ensure absent in 24f8492236, in v1.3.0 in 2014.
2020-09-29 18:17:08 -07:00
Alex Vandiver
430d3b8554 puppet: Remove absent of libapache2-mod-wsgi.
This was marked as ensure absent in 89b97e7480, in v1.7.0 in 2017,
though it did not take effect until 6e55aa2ce6, in v1.9.0 in 2018.
2020-09-29 18:17:08 -07:00
Alex Vandiver
12085552d5 puppet: Tidy indentation. 2020-09-29 17:44:44 -07:00
Alex Vandiver
57d88eedd8 puppet: Only install rabbitmq cron jobs via zulip_ops.
The rabbitmq cron jobs exist in order to call rabbitmqctl as root and
write the output to files that nagios can consume, since nagios is not
allowed to run rabbitmqctl.

In systems which do not have nagios configured, these every-minute
cron jobs add non-insignificant load, to no effect.  Move their
installation into `zulip_ops`.  In doing so, also combine the cron.d
files into a single file; this allows us to `ensure => absent` the old
filenames, removing them from existing systems.  Leave the resulting
combined cron.d file in `zulip`, since it is still of general utility
and note.
2020-09-29 17:44:44 -07:00
Alex Vandiver
79931051bd puppet: Permit outgoing mail from postfix.
The configuration change made in 1c17583ad5 only allowed delivery to
those specific Zulip addresses.  However, they also prevent the
mailserver from being used as an outgoing email relay from Zulip,
since all mail that passed through the mailserver (from any
originator) was required to have a `RCPT TO` that matched those
regexes.

Allow mail originating from `mynetworks` to have an arbitrary
addresses in `RCPT TO`.
2020-09-25 15:09:27 -07:00
Alex Vandiver
36ea307fbf puppet: Depend other changes on sharding.py validation.
Use the validation of the tornado sharding config that
`stage_updated_sharding` does, by depending on it.  This ensures that
we don't write out a supervisor or nginx config based on a
bad (e.g. non-sequential) list of tornado ports.
2020-09-25 10:52:40 -07:00
Alex Vandiver
c0e240277b tornado: Remove fingerprinting, write out .tmp files always.
Fingerprinting the config is somewhat brittle -- it requires either
custom bootstrapping for old (fingerprint-less) configs, and may have
false-positives.

Since generating the config is lightweight, do so into the .tmp files,
and compare the output to the originals to determine if there are
changes to apply.

In order to both surface errors, as well as notify the user in case a
restart is necessary, we must run it twice.  The `onlyif`
functionality cannot show configuration errors to the user, only
determine if the command runs or not.  We thus run the command once,
judging errors as "interesting" enough to run the actual command,
whose failure will be verbose in Puppet and halt any steps that depend
on it.

Removing the `onlyif` would result in `stage_updated_sharding` showing
up in the output of every Puppet run, which obscures the important
messages it displays when an update to sharding is necessary.
Removing the `command` (e.g. making it an `echo`) would result in
removing the ability to report configuration errors.  We thus have no
choice but to run it twice; this is thankfully low-overhead.
2020-09-25 10:52:40 -07:00
Alex Vandiver
2a12fedcf1 tornado: Remove explicit tornado_processes setting; compute it.
We can compute the intended number of processes from the sharding
configuration.  In doing so, also validate that all of the ports are
contiguous.

This removes a discrepancy between `scripts/lib/sharding.py` and other
parts of the codebase about if merely having a `[tornado_sharding]`
section is sufficient to enable sharding.  Having behaviour which
changes merely based on if an empty section exists is surprising.

This does require that a (presumably empty) `9800` configuration line
exist, but making that default explicit is useful.

After this commit, configuring sharding can be done by adding to
`zulip.conf`:

```
[tornado_sharding]
9800 =              # default
9801 = other_realm
```

Followed by running `./scripts/refresh-sharding-and-restart`.
2020-09-18 15:13:40 -07:00
Alex Vandiver
f638518722 tornado: Move default production port to 9800.
In development and test, we keep the Tornado port at 9993 and 9983,
respectively; this allows tests to run while a dev instance is
running.

In production, moving to port 9800 consistently removes an odd edge
case, when just one worker is on an entirely different port than if
two workers are used.
2020-09-18 15:13:40 -07:00
Alex Vandiver
ff94254598 tornado: Log to files by port number.
Without an explicit port number, the `stdout_logfile` values for each
port are identical.  Supervisor apparently decides that it will
de-conflict this by appending an arbitrary number to the end:

```
/var/log/zulip/tornado.log
/var/log/zulip/tornado.log.1
/var/log/zulip/tornado.log.10
/var/log/zulip/tornado.log.2
/var/log/zulip/tornado.log.3
/var/log/zulip/tornado.log.7
/var/log/zulip/tornado.log.8
/var/log/zulip/tornado.log.9
```

This is quite confusing, since most other files in `/var/log/zulip/`
use `.1` to mean logrotate was used.  Also note that these are not all
sequential -- 4, 5, and 6 are mysteriously missing, though they were
used in previous restarts.  This can make it extremely hard to debug
logs from a particular Tornado shard.

Give the logfiles a consistent name, and set them up to logrotate.
2020-09-14 22:17:51 -07:00
Alex Vandiver
efdaa58c24 supervisor: Use more specific process_name than "port-9800".
Making this include "zulip-tornado" makes it clearer in supervisor
logs.  Without this, one only sees:
```
2020-09-14 03:43:13,788 INFO waiting for port-9807 to stop
2020-09-14 03:43:14,466 INFO stopped: port-9807 (exit status 1)
2020-09-14 03:43:14,469 INFO spawned: 'port-9807' with pid 24289
2020-09-14 03:43:15,470 INFO success: port-9807 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```
2020-09-14 22:17:51 -07:00
Alex Vandiver
e9d0bdea65 puppet: Coerce uwsgi_listen_backlog_limit into an int before doing math. 2020-09-14 21:22:13 -07:00
Alex Vandiver
8adf530400 puppet: Generate sharding in puppet, then refresh-sharding-and-restart.
This supports running puppet to pick up new sharding changes, which
will warn of the need to finalize them via
`refresh-sharding-and-restart`, or simply running that directly.
2020-09-14 16:27:15 -07:00
Alex Vandiver
0de356c2df puppet: Move generation of tornado nginx upstreams into tornado_sharding.
This puts the creation of the upstreams referenced by
`nginx_sharding.conf` adjacent to their use.
2020-09-14 16:27:15 -07:00
Alex Vandiver
bf029d99f1 sharding: Also mark sharding.json 644 for consistency.
There is no reason to limit this to 640; mark it 644 for consistency
with the other file.
2020-09-14 16:27:15 -07:00
Alex Vandiver
1c17583ad5 puppet: Restrict postfix incoming addresses to postmaster and zulip.
This removes the possibility of local user enumeration via RCPT TO.
2020-09-11 18:49:22 -07:00
Alex Vandiver
482c964dd3 puppet: Logrotate for webhook exceptions. 2020-09-10 17:47:21 -07:00
Alex Vandiver
e38051736d puppet: Wrap and sort logrotate config. 2020-09-10 17:47:21 -07:00
Anders Kaseorg
75c59a820d python: Convert subprocess.Popen.communicate to run or check_output.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 17:42:35 -07:00
Anders Kaseorg
fbfd4b399d python: Elide action="store" for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg
1f2ac1962f python: Elide default=None for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 16:17:14 -07:00
Anders Kaseorg
d751e0cece puppet: Don’t install netcat.
It’s been unused since commit 0af22dad18
(#13239).

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-03 10:33:47 -07:00
Anders Kaseorg
ab120a03bc python: Replace unnecessary intermediate lists with generators.
Mostly suggested by the flake8-comprehension plugin.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:15:41 -07:00
Anders Kaseorg
a5dbab8fb0 python: Remove redundant dest for argparse arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-09-02 11:04:10 -07:00
Anders Kaseorg
dbdf67301b memcached: Switch from pylibmc to python-binary-memcached.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-08-06 12:51:14 -07:00
Casper Kvan Clausen
ed7a6d5e4d puppet: Support nginx_listen_port with http_only 2020-08-03 12:58:12 -07:00
Alex Vandiver
cd530d627b uwsgi: Stop generating IOError and SIGPIPE on client close.
Clients that close their socket to nginx suddenly also cause nginx to close
its connection to uwsgi.  When uwsgi finishes computing the response,
it thus tries to write to a closed socket, and generates either
IOError or SIGPIPE failures.

Since these are caused by the _client_ closing the connection
suddenly, they are not actionable by the server.  At particularly high
volumes, this could represent some sort of server-side failure;
however, this is better detected by examining status codes at the
loadbalancer.  nginx uses the error code 499 for this occurrence:
https://httpstatuses.com/499

Stop uwsgi from generating this family of exception entirely, using
configuration for uwsgi[1]; it documents these errors as "(annoying),"
hinting at their general utility."

[1] https://uwsgi-docs.readthedocs.io/en/latest/Options.html#ignore-sigpipe
2020-07-31 10:40:09 -07:00
Alex Vandiver
ceb909dbc5 puppet: Increase backlogged socket count based on uwsgi backlog.
Increasing the uwsgi listen backlog is intended to allow it to handle
higher connection rates during server restart, when many clients may
be trying to connect.  The kernel, in turn, needs to have a
proportionally increased somaxconn soas to not refuse the connection.

Set somaxconn to 2x the uwsgi backlog, but no lower than the
default (128).
2020-07-28 21:16:26 -07:00
Alex Vandiver
38d01cd4db puppet: Generalize install-wal-g to be arbitrary tarballs. 2020-07-24 17:24:57 -07:00
Tim Abbott
5a1243db3c puppet: Use correct scope for zulip_ops::munin_plugin. 2020-07-15 21:49:45 -07:00
Alex Vandiver
48c3c33d10 puppet: Fully-qualify the munin-plugin name 2020-07-14 17:58:51 -07:00