mirror of
https://github.com/zulip/zulip.git
synced 2025-11-02 04:53:36 +00:00
docs: Add a document explaining email/push notifications.
With various fixes by Mateusz Mandera.
This commit is contained in:
160
docs/subsystems/notifications.md
Normal file
160
docs/subsystems/notifications.md
Normal file
@@ -0,0 +1,160 @@
|
||||
# Notifications in Zulip
|
||||
|
||||
This is a design document aiming to provide context for developers
|
||||
working on Zulip's email notifications and mobile push notifications
|
||||
code paths. We recommend first becoming familiar with [sending
|
||||
messages](../subsystems/sending-messages.md); this document expands on
|
||||
the details of the email/mobile push notifications code path.
|
||||
|
||||
## Important corner cases
|
||||
|
||||
Here we name a few corner cases worth understanding in designing this
|
||||
sort of notifications system:
|
||||
|
||||
* The **Idle Desktop Problem**: We don't want the presence of a
|
||||
desktop computer at the office to eat all notifications because the
|
||||
user has an "online" client that they may not have used in 3 days.
|
||||
* The **Hard Disconnect Problem**: A client can lose its connection to
|
||||
the Internet (or be suspended, or whatever) at any time, and this
|
||||
happens routinely. We want to ensure that races where a user closes
|
||||
their laptop shortly after a notifiable message is sent does not
|
||||
result in the user never receiving a notification about a message
|
||||
(due to the system thinking that client received it).
|
||||
|
||||
## The mobile/email notifications flow
|
||||
|
||||
As a reminder, the relevant part of the flow for sending messages is
|
||||
as follows:
|
||||
* `do_send_messages` is the synchronous message-sending code path,
|
||||
and passing the following data in its `send_event` call:
|
||||
* Data about the message's content (E.g. mentions, wildcard
|
||||
mentions, and alert words) and encodes it into the `UserMessage`
|
||||
table's `flags` structure, which is in turn passed into
|
||||
`send_event` for each user receiving the message.
|
||||
* Data about user configuration relevant to the message, such as
|
||||
`push_notify_user_ids` and `stream_notify_user_ids`, are included
|
||||
alongside `flags` in the per-user data structure.
|
||||
* The `presence_idle_user_ids` set, containing the subset of
|
||||
recipient users who are mentioned, are PM recipients, have alert
|
||||
words, or otherwise would normally get a notification, but have not
|
||||
interacted with a Zulip client in the last few minutes. (Users who
|
||||
have generally will not receive a notification unless the
|
||||
`enable_online_push_notifications` flag is enabled). This data
|
||||
structure ignores users for whom the message is not notifiable,
|
||||
which is important to avoid this being thousands of `user_ids` for
|
||||
messages to large streams with few currently active users.
|
||||
* The Tornado [event queue system](../subsystems/events-system.md)
|
||||
processes that data, as well as data about each user's active event
|
||||
queues, to (1) push an event to each queue needing that message and
|
||||
(2) for notifiable messages, pushing an event onto the
|
||||
`missedmessage_mobile_notifications` and/or `missedmessage_emails`
|
||||
queues. This important message-processing logic has notable extra
|
||||
logic not present when processing normal events, both for details
|
||||
like splicing `flags` to customize event payloads per-user, as well.
|
||||
* The Tornado system determines whether the user is "offline/idle".
|
||||
Zulip's email notifications are designed to not fire when the user
|
||||
is actively using Zulip to avoid spam, and this is where those
|
||||
checks are implemented.
|
||||
* Users in `presence_idle_user_ids` are always considered idle:
|
||||
the variable name means "users who are idle because of
|
||||
presence". This is how we solve the Idle Desktop Problem; users
|
||||
with an idle desktop are treated the same as users who aren't
|
||||
logged in for this check.
|
||||
* However, that check does not handle the Hard Disconnect Problem:
|
||||
if a user was present 1 minute before a message was sent, and then
|
||||
closed their laptop, the user will not be in
|
||||
`presence_idle_user_ids`, and so without an additional mechanism,
|
||||
messages sent shortly after a user leaves would never trigger a
|
||||
notification (!).
|
||||
* We solve that problem by also notifying if
|
||||
`receiver_is_off_zulip` returns `True`, which checks whether the user has any
|
||||
current events system clients registered to receive `message`
|
||||
events. This check is done immediately (handling soft disconnects,
|
||||
where E.g. the user closes their last Zulip tab and we get the
|
||||
`DELETE /events/{queue_id}` request).
|
||||
* The `receiver_is_off_zulip` check is effectively repeated when
|
||||
event queues are garbage-collected (in `missedmessage_hook`) by
|
||||
looking for whether the queue being garbage-collectee was the only
|
||||
one; this second check solves the Hard Disconnect Problem, resulting in
|
||||
notifications for these hard-disconnect cases usually coming 10
|
||||
minutes late.
|
||||
* The message-edit code path has parallel logic in
|
||||
`maybe_enqueue_notifications_for_message_update` for triggering
|
||||
notifications in cases like a mention added during message
|
||||
editing.
|
||||
* The business logic for all these notification decisions made
|
||||
inside Tornado has extensive automated test suites; e.g.
|
||||
`test_message_edit_notifications.py` covers all the cases around
|
||||
editing a message to add/remove a mention.
|
||||
* We may in the future want to add some sort of system for letting
|
||||
users see past notifications, to help with explaining and
|
||||
debugging this system, since it has so much complexity.
|
||||
* Desktop notifications are the simplest; they are implemented
|
||||
client-side by the web/desktop app's logic
|
||||
(`static/js/notifications.js`) inspecting the `flags` fields that
|
||||
were spliced into `message` events by the Tornado system, as well as
|
||||
the user's notification settings.
|
||||
* The queue processors for those queues make the final determination
|
||||
for whether to send a notification, and do the work to generate an
|
||||
email (`zerver/lib/email_notifications.py`) or mobile
|
||||
(`zerver/lib/push_notifications.py`) notification. We'll detail
|
||||
this process in more detail for each system below, but it's
|
||||
important to know that it's normal for a message to sit in these
|
||||
queues for minutes (and in the future, possibly hours).
|
||||
* Both queue processor code paths do additional filtering before
|
||||
sending a notification:
|
||||
* Messages that have already been marked as read by the user before
|
||||
the queue processor runs never trigger a notification.
|
||||
* Messages that were already deleted never trigger a notification.
|
||||
* The user-level settings for whether email/mobile notifications are
|
||||
disabled are rechecked, as the user may have disabled one of these
|
||||
settings during the queuing period.
|
||||
* The **Email notifications queue processor**, `MissedMessageWorker`,
|
||||
takes care to wait for 2 minutes (hopefully in the future this will be a
|
||||
configuration setting) and starts a thread to batch together multiple
|
||||
messages into a single email. These features are unnecessary
|
||||
for mobile push notifications, because we can live-update those
|
||||
details with a future notification, whereas emails cannot be readily
|
||||
updated once sent. Zulip's email notifications are styled similarly
|
||||
to GitHub's email notifications, with a clean, simple design that
|
||||
makes replying from an email client possible (using the [incoming
|
||||
email integration](../production/email-gateway.md)).
|
||||
* The **Push notifications queue processor**,
|
||||
`PushNotificationsWorker`, is a simple wrapper around the
|
||||
`push_notifications.py` code that actually sends the
|
||||
notification. This logic is somewhat complicated by having to track
|
||||
the number of unread push notifications to display on the mobile
|
||||
apps' badges, as well as using the [Mobile Push Notifications
|
||||
Service](../production/mobile-push-notifications.md) for self-hosted
|
||||
systems.
|
||||
|
||||
The following important constraints are worth understanding about the
|
||||
structure of the system, when thinking about changes to it:
|
||||
|
||||
* **Bulk database queries** are much more efficient for checking
|
||||
details from the database like "which users receiving this message
|
||||
are online".
|
||||
* **Thousands of users**. Zulip supports thousands of users, and we
|
||||
want to avoid `send_event()` pushing large amounts of per-user data
|
||||
to Tornado via RabbitMQ for scalability reasons.
|
||||
* **Tornado doesn't do database queries**. Because the Tornado system
|
||||
is an asynchronous event-driven framework, and our Django database
|
||||
library is synchronous, database queries are very expensive. So
|
||||
these queries need to be done in either `do_send_messages` or the
|
||||
queue processor logic. (For example, this means `presence` data
|
||||
should be checked in either `do_send_messages` or the queue
|
||||
processors, not in Tornado).
|
||||
* **Future configuration**. Notification settings are an area that we
|
||||
expect to only expand with time, with upcoming features like
|
||||
following a topic (to get notifications for messages only within
|
||||
that topic in a stream). There are a lot of different workflows
|
||||
possible with Zulip's threading, and it's important to make it easy
|
||||
for users to setup Zulip's notification to fit as many of those
|
||||
workflows as possible.
|
||||
* **Message editing**. Zulip supports editing messages, and that
|
||||
interacts with notifications in ways that require careful handling:
|
||||
Notifications should have
|
||||
the latest edited content (users often fix typos 30 seconds after
|
||||
sending a message), adding a mention when editing a message should
|
||||
send a notification to the newly mentioned user(s), and deleting a
|
||||
message should cancel any unsent notifications for it.
|
||||
Reference in New Issue
Block a user