Commit Graph

657 Commits

Author SHA1 Message Date
Tim Abbott
d98be2f19f puppet: Only run analytics Nagios checks on machine running cron.
Running this on additional machines would be redundant; additionally,
the FillState checker cron job runs only on cron systems, so this will
crash on other app frontends.
2018-03-06 13:38:27 -08:00
Tim Abbott
8e8faab006 puppet: Move clearsessions cron job to app_frontend_once.
While this is a different system than I'd written up in #8004, I think
this is a better solution to the general problem of cron jobs to run
on just one server.

Fixes #8004.
2018-03-06 13:35:51 -08:00
Tim Abbott
9a74ef5056 puppet: Move some cron jobs to app_frontend_once.pp.
Several cron jobs had incorrectly ended up in the app_frontend.pp
template, and thus would only run on voyager instances.
2018-03-06 13:35:51 -08:00
Tim Abbott
3ae645ed12 puppet: Rename analytics.pp to app_frontend_once.pp. 2018-03-06 13:35:51 -08:00
Tim Abbott
ad7f38ab3b puppet: Move analytics cron job to analytics.pp.
This better groups it with the related code.
2018-03-06 13:35:51 -08:00
Tim Abbott
24b6106c9c puppet: Dsiable checking for evictions in memcached nagios.
Zulip's caching model for message history is such that it is normal
and healthy for there to eventually be a nontrivial volume of
evictions.
2018-03-06 13:34:02 -08:00
Greg Price
4475950ddf queue: Restore prematurely-cut upgrade path.
Revert c8f034e9a "queue: Remove missedmessage_email_senders code."
As the comment in the code says, it ensures a smooth upgrade path
from 1.7.x; we can delete it in master after 1.8.0 is released.
The removal commit was merged early due to a communication failure.
2018-02-28 11:15:53 -08:00
Tim Abbott
65767e5226 localhost_sso: Fix missing enabling of mod_wsgi.
This is apparently required on Ubuntu Xenial, at least.
2018-02-22 10:09:29 -08:00
Umair Khan
c8f034e9a0 queue: Remove missedmessage_email_senders code.
After 68513952fb, all emails are sent through email_senders queue.
This commit removes code related to the legacy queue.
2018-02-21 16:43:56 -08:00
Aditya Bansal
efe8545303 local-uploads: Start running authentication checks on file requests.
From here on we start to authenticate uploaded file request before
serving this files in production. This involves allowing NGINX to
pass on these file requests to Django for authentication and then
serve these files by making use on internal redirect requests having
x-accel-redirect field. The redirection on requests and loading
of x-accel-redirect param is handled by django-sendfile.

NOTE: This commit starts to authenticate these requests for Zulip
servers running platforms either Ubuntu Xenial (16.04) or above.

Fixes: #320 and #291 partially.
2018-02-16 05:06:37 +05:30
Greg Price
20c734c90a puppet: Fix type error in new Nagios check for analytics state. 2018-02-09 17:46:46 -08:00
Tim Abbott
005b0fb566 puppet: Clean up ssh authorized_keys configuration rules. 2018-02-09 16:37:03 -08:00
Tim Abbott
aca25b6f0a puppet: Move ssh configuration to use notify.
This handles more correctly the case where we're using the upstream
sshd_config file.
2018-02-09 16:37:03 -08:00
Tim Abbott
486de8abfc puppet: Edit some rules to support chat.zulip.org.
This should make it possible to use the zulip_ops base rules
successfully on chat.zulip.org.  Many of the changes in this commit
are hacks and probably can be cleaned up later, but given that we plan
to drop trusty support soon, it's likely that most of them will simply
be deleted then.
2018-02-09 16:37:03 -08:00
Rishi Gupta
1d581a9c6e nagios: Add nagios check for analytics state.
This should help us detect issues where the analytics cron jobs aren't
running properly.

The cron/nagios part of the implementation done by tabbott.
2018-02-09 16:36:05 -08:00
Greg Price
7df29e7a7c puppet: Only use those "modern" options when on xenial.
On trusty, we of course have an older version -- 1.4.14 -- and it is
not so modern, so this just gives an error.
2018-02-08 18:11:52 -08:00
Greg Price
23e6a2e579 puppet: Update memcached config to turn on this decade's technology.
We've been running this change on zulipchat.com for a couple of months
now.  Before then, we used to regularly get exceptions like this:

     File "./zerver/views/messages.py", line 749, in get_messages_backend
       setter=stringify_message_dict)
     File "./zerver/lib/cache.py", line 275, in generic_bulk_cached_fetch
       cache_set_many(items_for_remote_cache)
     File "./zerver/lib/cache.py", line 215, in cache_set_many
       get_cache_backend(cache_name).set_many(items, timeout=timeout)
     File "/home/zulip/deployments/2017-09-28-21-04-12/zulip-py3-venv/lib/python3.5/site-packages/django/core/cache/backends/memcached.py", line 150, in set_many
       self._cache.set_multi(safe_data, self.get_backend_timeout(timeout))
   pylibmc.Error: error 48 from memcached_set_multi

This error means memcached was unable to find space for the new value.
You might think that because memcached provides an LRU cache, this
shouldn't happen because it would just evict something... but in fact
  * memcached splits its data into "slabs" by object size, and
  * until recently, once a 1MiB "chunk" is allocated to a given "slab"
    i.e. size class, it wouldn't be reclaimed to allocate to another.

So once the cache has been filled up with objects of some distribution
of sizes, if some objects come in that would go in a different size
class, we have no chunks for that size class / slab, and can't get one.
And that's exactly what was happening on zulipchat.com.

Useful background can be found in
  https://github.com/memcached/memcached/wiki/ServerMaint#slab-imbalance
  https://github.com/memcached/memcached/wiki/ReleaseNotes1411
  https://github.com/memcached/memcached/wiki/ReleaseNotes1425
  https://github.com/memcached/memcached/wiki/ReleaseNotes150
We're already running v1.4.25, which provides an "automover" that should
be well equipped to fix this; v1.5.0 turns it on by default.

With this commit, adopt the "modern start line" recommended in the
release notes for our v1.4.25, including turning on the automover.
2018-02-08 16:34:49 -08:00
Vishnu Ks
bf2961418b puppet: Remove comment about period of soft deactivate users.
This often becomes wrong over time as it is currently.
2018-01-24 17:15:08 -08:00
Vishnu Ks
a11b742984 messages: Calculate value of first visible message ID using cron job.
[greg: Fixed buggy time conversion in estimate_recent_messages.]
2018-01-24 17:15:08 -08:00
Tim Abbott
9ed2a94b8c nagios: Add configuration designed for full-stack servers.
This doesn't yet pass all Nagios checks correctly, and still has a few
flaws:
* The ideal setup code for the `nagios` user in the database isn't included.
* Some of the other details are a bit off; we need to split some host roles.

But it's better than nothing, and we can iterate from here.
2018-01-24 14:16:03 -08:00
Aditya Bansal
dd0e6c8025 reminders: Fix issue with log file permissions in production. 2018-01-24 03:33:40 +05:30
Tim Abbott
2365b13b68 puppet: Move postgres Nagios plugin to main postgres-common.
This plugins package is required in order to use Nagios checks to
verify the Zulip postgres database, and thus belongs in the default
package set.
2018-01-23 10:31:48 -08:00
Aditya Bansal
ec1297c1e8 schedulemessages: Add delivery system for scheduled message. 2018-01-10 09:18:02 -05:00
Umair Khan
68513952fb email-worker: Create EmailSendingWorker.
This commit just copies all the code from MissedMessageSendingWorker
class to a new EmailSendingWorker class. All the logic to send an email
through a queue was already there. This commit only makes the logic
generic. It does so by creating a special purpose queue called
'email_senders' to send any type of email. To make
MissedMessageSendingWorker still work we derive it from
EmailSendingWorker. All the tests that were testing
MissedMessageSendingWorker now run against EmailSendingWorker.
2017-12-20 19:36:27 -08:00
Tim Abbott
f423dc4930 check_send_receive_time: Fix parsing bug.
This was a regression introduced with the argparse migration.
2017-11-27 14:01:30 -08:00
rht
e55898850a Replace optparse with argparse in remaining tools.
Tweaked by tabbott to fix various bugs with the usage output.
2017-11-21 21:34:38 -08:00
Vishnu Ks
766511e519 actions: Mark all messages as read when user unsubscribes from stream.
This fixes a bug where, when a user is unsubscribed from a stream,
they might have unread messages on that stream leak.  While it might
seem to be a minor problem, it can cause significant problems for
computing the `unread_msgs` data structures, since it means we need to
add an extra filter for whether the user is still subscribed, either
in the backend or in the UI.

Fixes #7095.
2017-11-21 20:09:17 -08:00
Greg Price
ae901309fc certbot: Control auto-renew with a zulip.conf setting.
This causes the cron job to run only when a Zulip-managed certbot
install is actually set up.

Inside `install`, zulip.conf doesn't yet exist when we run
setup-certbot, so we write the setting later.  But we also give
setup-certbot the ability to write the setting itself, so that we
can recommend it in instructions for adopting certbot in an
existing Zulip installation.
2017-11-15 21:50:41 -08:00
Greg Price
dacf65b301 certbot: Move verification webroot under /var/lib/zulip .
If we were making an old-fashioned webroot where hand-written static
HTML files went, somewhere under `/srv` would be most appropriate.
Here, this webroot is really more of an implementation detail of the
certbot set up by the Zulip installer/packaging, containing transient
state.  So someplace under `/var` is appropriate, and specifically
under `/var/lib/zulip` in order to properly namespace it.

For background on `/var/www` and friends, see the top couple of answers
on
  https://unix.stackexchange.com/questions/47436/why-web-server-var-www
2017-11-15 21:50:41 -08:00
Tim Abbott
2afc3b9e50 certbot: Move path to /usr/local/sbin.
[greg: fixed typo bug]
2017-11-15 21:50:41 -08:00
rht
97ec56276c certbot: Add certbot renew cron job to puppet.
Tweaked by tabbott to use the proper command.
2017-11-15 21:50:41 -08:00
Tim Abbott
94554c65da certbot: Modify nginx configuration to support automated renewal. 2017-11-08 12:32:26 -08:00
Tim Abbott
62bb465896 puppet: Modify lb0 nginx configuration. 2017-11-08 12:32:26 -08:00
rht
ccf2792c1c refactor: Remove six.moves.configparser import. 2017-11-07 10:51:44 -08:00
rht
549a26860f refactor: Remove six.moves.range import. 2017-11-07 10:46:42 -08:00
Tim Abbott
acb0b6ee43 process_fts_updates: Fix pgroonga search in development.
For some reason, we have the USING_PGROONGA setting on in development
right now.  I'm going to disable that in another commit to match what
we're doing in production, but we'll still want that setting to work
in development.

The problem here was that process_fts_updates only attempted to read
the USING_PGROONGA setting from a /etc/zulip/zulip.conf source, and
thus would just not be updating the index in development.
2017-10-30 11:44:04 -07:00
Tim Abbott
0d1194811f mypy: Remove ignores for a few typeshed bugs fixed upstream. 2017-10-27 17:09:00 -07:00
Tim Abbott
89b97e7480 python3: Fix REMOTE_USER Apache configuration for Python 3.
We were previously still installing the Python 2 version of mod_wsgi,
which of course doesn't work and can't use the Zulip virtualenv.
2017-10-24 11:48:14 -07:00
Tim Abbott
15f3d5f714 nginx: Fix some buggy gzip compression configuration.
We weren't compressing SVG, while at the same time were incorrectly
compressing octet-stream (Which meant downloading .tar.gz files in
Chrome would get double-compressed).
2017-10-20 11:01:28 -07:00
Tim Abbott
540cae19a8 puppet: Remove obsolete sparkle configuration.
Sparkle was the auto-update system used by the legacy desktop app.  We
haven't been capable of using it for auto-update in years, so there's
no reason to keep around the configuration.

The new Electron app uses a different system anyway.
2017-10-19 16:35:55 -07:00
rht
b57289aacd py3: Remove all `from __future__ import print_function.
Except for these files:
- tools/linter_lib/*
- tools/lib
- tools/lister.py
2017-10-18 12:07:19 -07:00
rht
2f3ae84e5a py3: Remove all __future__ import division. 2017-10-17 23:09:12 -07:00
Tim Abbott
6a5cb0e48c puppet: Make problems with Zephyr mirroring pageable.
Generally this indicates sending messages is completely broken.
2017-10-12 00:16:32 -07:00
rht
de30400fc5 pg_backup_and_purge.py: Remove .py extension. 2017-10-08 15:32:43 -07:00
Tim Abbott
47c5aae5b2 log2zulip: Enforce using python 3 in cron job.
We aren't guaranteed to have the Zulip dependencies installed on
Python 2.
2017-10-06 16:37:17 -07:00
Tim Abbott
0f2e4a55c0 soft deactivation: Shorten management command name.
This command is really for soft deactivation; there's just an undo
feature.
2017-10-06 08:48:43 -07:00
Tim Abbott
f2055397c1 nagios: Update apache configuration to be generated.
Since this is basically just stock Apache configuration for Nagios
with a hostname put in, we can just fetch the hostname from our
configuration.
2017-10-05 21:51:29 -07:00
Tim Abbott
3af01bed85 puppet: Simplify zulip_ops nginx configuration.
Whatever dist/ functionality this had in 2014 is now served by
zulip.org, and since this serves as a sample, it should be as simple
as possible.

Previously, this was more cluttered than it needed to be.
2017-10-05 21:17:57 -07:00
Tim Abbott
e6e7bcf6e1 nagios: Move camo_check_url into configuration. 2017-10-05 21:09:24 -07:00
Tim Abbott
82cee4fde9 check_worker_memory: Increase limits for what leaking means.
The old limits were such that these would sometimes oscillated too
high and page erroneously.  The purpose of this check is to prevent
large memory leaks, and will still achieve that with a higher limit.
2017-10-05 20:54:03 -07:00