mirror of
https://github.com/zulip/zulip.git
synced 2025-11-06 06:53:25 +00:00
docs: Add a document explaining email/push notifications.
With various fixes by Mateusz Mandera.
@@ -161,6 +161,9 @@ Server (which stores queues in memory) were to crash and lose
|
|||||||
its data, clients would recover, just as if they had lost Internet
|
its data, clients would recover, just as if they had lost Internet
|
||||||
access briefly (there is some DoS risk to manage, though).
|
access briefly (there is some DoS risk to manage, though).
|
||||||
|
|
||||||
|
Note that the garbage-collection system has hooks that are important
|
||||||
|
for the implementation of [notifications](../subsystems/notifications.md).
|
||||||
|
|
||||||
(The Event Queue Server is designed to save any event queues to disk
|
(The Event Queue Server is designed to save any event queues to disk
|
||||||
and reload them when the server is restarted, and catches exceptions
|
and reload them when the server is restarted, and catches exceptions
|
||||||
carefully, so such incidents are very rare, but it's nice to have a
|
carefully, so such incidents are very rare, but it's nice to have a
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ Subsystems documentation
|
|||||||
html-css
|
html-css
|
||||||
events-system
|
events-system
|
||||||
sending-messages
|
sending-messages
|
||||||
|
notifications
|
||||||
queuing
|
queuing
|
||||||
custom-apps
|
custom-apps
|
||||||
pointer
|
pointer
|
||||||
|
|||||||
160
docs/subsystems/notifications.md
Normal file
@@ -0,0 +1,160 @@
|
|||||||
|
# Notifications in Zulip
|
||||||
|
|
||||||
|
This is a design document aiming to provide context for developers
|
||||||
|
working on Zulip's email notifications and mobile push notifications
|
||||||
|
code paths. We recommend first becoming familiar with [sending
|
||||||
|
messages](../subsystems/sending-messages.md); this document expands on
|
||||||
|
the details of the email/mobile push notifications code path.
|
||||||
|
|
||||||
|
## Important corner cases
|
||||||
|
|
||||||
|
Here we name a few corner cases worth understanding in designing this
|
||||||
|
sort of notifications system:
|
||||||
|
|
||||||
|
* The **Idle Desktop Problem**: We don't want the presence of a
|
||||||
|
desktop computer at the office to eat all notifications because the
|
||||||
|
user has an "online" client that they may not have used in 3 days.
|
||||||
|
* The **Hard Disconnect Problem**: A client can lose its connection to
|
||||||
|
the Internet (or be suspended, or whatever) at any time, and this
|
||||||
|
happens routinely. We want to ensure that races where a user closes
|
||||||
|
their laptop shortly after a notifiable message is sent do not
|
||||||
|
result in the user never receiving a notification about a message
|
||||||
|
(due to the system thinking that client received it).
|
||||||
|
|
||||||
|
## The mobile/email notifications flow
|
||||||
|
|
||||||
|
As a reminder, the relevant part of the flow for sending messages is
|
||||||
|
as follows:
|
||||||
|
* `do_send_messages` is the synchronous message-sending code path,
|
||||||
|
and passes the following data in its `send_event` call:
|
||||||
|
* Data about the message's content (E.g. mentions, wildcard
|
||||||
|
mentions, and alert words), encoded into the `UserMessage`
|
||||||
|
table's `flags` structure, which is in turn passed into
|
||||||
|
`send_event` for each user receiving the message.
|
||||||
|
* Data about user configuration relevant to the message, such as
|
||||||
|
`push_notify_user_ids` and `stream_notify_user_ids`, are included
|
||||||
|
alongside `flags` in the per-user data structure.
|
||||||
|
* The `presence_idle_user_ids` set, containing the subset of
|
||||||
|
recipient users who are mentioned, are PM recipients, have alert
|
||||||
|
words, or otherwise would normally get a notification, but have not
|
||||||
|
interacted with a Zulip client in the last few minutes. (Users who
|
||||||
|
have done so generally will not receive a notification unless the
|
||||||
|
`enable_online_push_notifications` flag is enabled). This data
|
||||||
|
structure ignores users for whom the message is not notifiable,
|
||||||
|
which is important to avoid this being thousands of `user_ids` for
|
||||||
|
messages to large streams with few currently active users.
|
||||||
|
* The Tornado [event queue system](../subsystems/events-system.md)
|
||||||
|
processes that data, as well as data about each user's active event
|
||||||
|
queues, to (1) push an event to each queue needing that message and
|
||||||
|
(2) for notifiable messages, push an event onto the
|
||||||
|
`missedmessage_mobile_notifications` and/or `missedmessage_emails`
|
||||||
|
queues. This important message-processing logic has notable extra
|
||||||
|
logic not present when processing normal events, including details
|
||||||
|
like splicing `flags` to customize event payloads per-user.
|
||||||
|
* The Tornado system determines whether the user is "offline/idle".
|
||||||
|
Zulip's email notifications are designed to not fire when the user
|
||||||
|
is actively using Zulip to avoid spam, and this is where those
|
||||||
|
checks are implemented.
|
||||||
|
* Users in `presence_idle_user_ids` are always considered idle:
|
||||||
|
the variable name means "users who are idle because of
|
||||||
|
presence". This is how we solve the Idle Desktop Problem; users
|
||||||
|
with an idle desktop are treated the same as users who aren't
|
||||||
|
logged in for this check.
|
||||||
|
* However, that check does not handle the Hard Disconnect Problem:
|
||||||
|
if a user was present 1 minute before a message was sent, and then
|
||||||
|
closed their laptop, the user will not be in
|
||||||
|
`presence_idle_user_ids`, and so without an additional mechanism,
|
||||||
|
messages sent shortly after a user leaves would never trigger a
|
||||||
|
notification (!).
|
||||||
|
* We solve that problem by also notifying if
|
||||||
|
`receiver_is_off_zulip` returns `True`, which checks whether the user has any
|
||||||
|
current events system clients registered to receive `message`
|
||||||
|
events. This check is done immediately (handling soft disconnects,
|
||||||
|
where E.g. the user closes their last Zulip tab and we get the
|
||||||
|
`DELETE /events/{queue_id}` request).
|
||||||
|
* The `receiver_is_off_zulip` check is effectively repeated when
|
||||||
|
event queues are garbage-collected (in `missedmessage_hook`) by
|
||||||
|
looking for whether the queue being garbage-collected was the only
|
||||||
|
one; this second check solves the Hard Disconnect Problem, resulting in
|
||||||
|
notifications for these hard-disconnect cases usually coming 10
|
||||||
|
minutes late.
|
||||||
|
* The message-edit code path has parallel logic in
|
||||||
|
`maybe_enqueue_notifications_for_message_update` for triggering
|
||||||
|
notifications in cases like a mention added during message
|
||||||
|
editing.
|
||||||
|
* The business logic for all these notification decisions made
|
||||||
|
inside Tornado has extensive automated test suites; e.g.
|
||||||
|
`test_message_edit_notifications.py` covers all the cases around
|
||||||
|
editing a message to add/remove a mention.
|
||||||
|
* We may in the future want to add some sort of system for letting
|
||||||
|
users see past notifications, to help with explaining and
|
||||||
|
debugging this system, since it has so much complexity.
|
||||||
|
* Desktop notifications are the simplest; they are implemented
|
||||||
|
client-side by the web/desktop app's logic
|
||||||
|
(`static/js/notifications.js`) inspecting the `flags` fields that
|
||||||
|
were spliced into `message` events by the Tornado system, as well as
|
||||||
|
the user's notification settings.
|
||||||
|
* The queue processors for those queues make the final determination
|
||||||
|
for whether to send a notification, and do the work to generate an
|
||||||
|
email (`zerver/lib/email_notifications.py`) or mobile
|
||||||
|
(`zerver/lib/push_notifications.py`) notification. We'll detail
|
||||||
|
this process further for each system below, but it's
|
||||||
|
important to know that it's normal for a message to sit in these
|
||||||
|
queues for minutes (and in the future, possibly hours).
|
||||||
|
* Both queue processor code paths do additional filtering before
|
||||||
|
sending a notification:
|
||||||
|
* Messages that have already been marked as read by the user before
|
||||||
|
the queue processor runs never trigger a notification.
|
||||||
|
* Messages that were already deleted never trigger a notification.
|
||||||
|
* The user-level settings for whether email/mobile notifications are
|
||||||
|
disabled are rechecked, as the user may have disabled one of these
|
||||||
|
settings during the queuing period.
|
||||||
|
* The **Email notifications queue processor**, `MissedMessageWorker`,
|
||||||
|
takes care to wait for 2 minutes (hopefully in the future this will be a
|
||||||
|
configuration setting) and starts a thread to batch together multiple
|
||||||
|
messages into a single email. These features are unnecessary
|
||||||
|
for mobile push notifications, because we can live-update those
|
||||||
|
details with a future notification, whereas emails cannot be readily
|
||||||
|
updated once sent. Zulip's email notifications are styled similarly
|
||||||
|
to GitHub's email notifications, with a clean, simple design that
|
||||||
|
makes replying from an email client possible (using the [incoming
|
||||||
|
email integration](../production/email-gateway.md)).
|
||||||
|
* The **Push notifications queue processor**,
|
||||||
|
`PushNotificationsWorker`, is a simple wrapper around the
|
||||||
|
`push_notifications.py` code that actually sends the
|
||||||
|
notification. This logic is somewhat complicated by having to track
|
||||||
|
the number of unread push notifications to display on the mobile
|
||||||
|
apps' badges, as well as using the [Mobile Push Notifications
|
||||||
|
Service](../production/mobile-push-notifications.md) for self-hosted
|
||||||
|
systems.
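
The idle checks in the flow above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in `zerver/tornado/event_queue.py`: `ClientQueue`, `queues_by_user`, `is_idle`, and `should_notify_on_gc` are hypothetical names invented for this sketch, and the `receiver_is_off_zulip` shown here is a stand-in that takes its data from an in-memory dict rather than the real client descriptors.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ClientQueue:
    """Hypothetical stand-in for one event queue registered by a client."""

    queue_id: str
    accepts_messages: bool


# Hypothetical in-memory registry: user id -> that user's event queues.
queues_by_user: Dict[int, List[ClientQueue]] = {}


def receiver_is_off_zulip(user_id: int) -> bool:
    # A user with no message-receiving queues has no open session.
    queues = queues_by_user.get(user_id, [])
    return not any(queue.accepts_messages for queue in queues)


def is_idle(user_id: int, presence_idle_user_ids: Set[int]) -> bool:
    # Check made when the message is processed: idle by presence
    # (the Idle Desktop Problem) or no registered client at all
    # (soft disconnects).
    return user_id in presence_idle_user_ids or receiver_is_off_zulip(user_id)


def should_notify_on_gc(user_id: int, expiring_queue_id: str) -> bool:
    # Check repeated when a queue is garbage-collected (the Hard
    # Disconnect Problem): notify only if no other message-receiving
    # queue would remain for this user.
    remaining = [
        queue
        for queue in queues_by_user.get(user_id, [])
        if queue.queue_id != expiring_queue_id and queue.accepts_messages
    ]
    return not remaining
```

In this sketch, a user who closed their laptop still has a registered queue, so `is_idle` returns `False` at send time; the notification is only triggered later, when `should_notify_on_gc` runs for that last queue.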
|
||||||
|
|
||||||
|
The following important constraints are worth understanding about the
|
||||||
|
structure of the system, when thinking about changes to it:
|
||||||
|
|
||||||
|
* **Bulk database queries** are much more efficient for checking
|
||||||
|
details from the database like "which users receiving this message
|
||||||
|
are online".
|
||||||
|
* **Thousands of users**. Zulip supports thousands of users, and we
|
||||||
|
want to avoid `send_event()` pushing large amounts of per-user data
|
||||||
|
to Tornado via RabbitMQ for scalability reasons.
|
||||||
|
* **Tornado doesn't do database queries**. Because the Tornado system
|
||||||
|
is an asynchronous event-driven framework, and our Django database
|
||||||
|
library is synchronous, database queries are very expensive. So
|
||||||
|
these queries need to be done in either `do_send_messages` or the
|
||||||
|
queue processor logic. (For example, this means `presence` data
|
||||||
|
should be checked in either `do_send_messages` or the queue
|
||||||
|
processors, not in Tornado).
|
||||||
|
* **Future configuration**. Notification settings are an area that we
|
||||||
|
expect to only expand with time, with upcoming features like
|
||||||
|
following a topic (to get notifications for messages only within
|
||||||
|
that topic in a stream). There are a lot of different workflows
|
||||||
|
possible with Zulip's threading, and it's important to make it easy
|
||||||
|
for users to set up Zulip's notifications to fit as many of those
|
||||||
|
workflows as possible.
|
||||||
|
* **Message editing**. Zulip supports editing messages, and that
|
||||||
|
interacts with notifications in ways that require careful handling:
|
||||||
|
Notifications should have
|
||||||
|
the latest edited content (users often fix typos 30 seconds after
|
||||||
|
sending a message), adding a mention when editing a message should
|
||||||
|
send a notification to the newly mentioned user(s), and deleting a
|
||||||
|
message should cancel any unsent notifications for it.
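
To illustrate the first three constraints together, here is a minimal sketch of computing a small idle set in the sending code path, so that Tornado needs no database access. Everything here is an assumption for illustration: `compute_presence_idle_user_ids`, the `last_active_by_user` dict (standing in for the result of one bulk presence query), and the 3-minute `idle_after` default are hypothetical and not taken from Zulip's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set


def compute_presence_idle_user_ids(
    notifiable_user_ids: Set[int],
    last_active_by_user: Dict[int, datetime],
    now: datetime,
    idle_after: timedelta = timedelta(minutes=3),  # assumed threshold
) -> Set[int]:
    """Return the notifiable users with no recent client activity.

    Only users for whom the message is notifiable are considered, so the
    result stays small even for large streams.
    """
    idle: Set[int] = set()
    for user_id in notifiable_user_ids:
        last_active: Optional[datetime] = last_active_by_user.get(user_id)
        # No presence record at all counts as idle.
        if last_active is None or now - last_active > idle_after:
            idle.add(user_id)
    return idle


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
last_active = {
    1: now - timedelta(seconds=30),  # recently active
    2: now - timedelta(days=3),      # idle desktop at the office
}
idle_ids = compute_presence_idle_user_ids({1, 2, 3}, last_active, now)
```

The design point is that the presence data is fetched once, in bulk, before the event is handed to Tornado, and only the resulting set of user IDs travels with the event.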
|
||||||
@@ -51,15 +51,16 @@ This section details the ways in which it is different:
|
|||||||
* There is significant custom code inside the `process_message_event`
|
* There is significant custom code inside the `process_message_event`
|
||||||
function in `zerver/tornado/event_queue.py`. This custom code has a
|
function in `zerver/tornado/event_queue.py`. This custom code has a
|
||||||
number of purposes:
|
number of purposes:
|
||||||
* Triggering email and mobile push notifications for any users who
|
* Triggering [email and mobile push
|
||||||
|
notifications](../subsystems/notifications.md) for any users who
|
||||||
do not have active clients and have settings of the form "push
|
do not have active clients and have settings of the form "push
|
||||||
notifications when offline". In order to avoid doing any real
|
notifications when offline". In order to avoid doing any real
|
||||||
computational work inside the Tornado codebase, this logic aims
|
computational work inside the Tornado codebase, this logic aims
|
||||||
to just do the check for whether a notification should be
|
to just do the check for whether a notification should be
|
||||||
generated, and then put an event into an appropriate
|
generated, and then put an event into an appropriate
|
||||||
[queue](../subsystems/queuing.md) to actually send the
|
[queue](../subsystems/queuing.md) to actually send the message.
|
||||||
message. See `maybe_enqueue_notifications` and related code for
|
See `maybe_enqueue_notifications` and related code for this part
|
||||||
this part of the logic.
|
of the logic.
|
||||||
* Splicing user-dependent data (E.g. `flags` such as when the user
|
* Splicing user-dependent data (E.g. `flags` such as when the user
|
||||||
was `mentioned`) into the events.
|
was `mentioned`) into the events.
|
||||||
* Handling the [local echo details](#local-echo).
|
* Handling the [local echo details](#local-echo).
|
||||||
@@ -373,11 +374,12 @@ it’ll arrive in the couple hundred milliseconds one would expect if
|
|||||||
the extra 4500 inactive subscribers didn’t exist.
|
the extra 4500 inactive subscribers didn’t exist.
|
||||||
|
|
||||||
There are a few details that require special care with this system:
|
There are a few details that require special care with this system:
|
||||||
* Email and mobile push notifications. We need to make sure these are
|
* [Email and mobile push
|
||||||
still correctly delivered to soft-deactivated users; making this
|
notifications](../subsystems/notifications.md). We need to make
|
||||||
work required careful work for those code paths that assumed a
|
sure these are still correctly delivered to soft-deactivated users;
|
||||||
`UserMessage` row would always exist for a message that triggers a
|
making this work required careful work for those code paths that
|
||||||
notification to a given user.
|
assumed a `UserMessage` row would always exist for a message that
|
||||||
|
triggers a notification to a given user.
|
||||||
* Digest emails, which use the `UserMessage` table extensively to
|
* Digest emails, which use the `UserMessage` table extensively to
|
||||||
determine what has happened in streams the user can see. We can use
|
determine what has happened in streams the user can see. We can use
|
||||||
the user's subscriptions to construct what messages they should have
|
the user's subscriptions to construct what messages they should have
|
||||||
|
|||||||
@@ -1,3 +1,5 @@
|
|||||||
|
# See https://zulip.readthedocs.io/en/latest/subsystems/notifications.html
|
||||||
|
|
||||||
import re
|
import re
|
||||||
from collections import defaultdict
|
from collections import defaultdict
|
||||||
from datetime import timedelta
|
from datetime import timedelta
|
||||||
|
|||||||
@@ -1,3 +1,5 @@
|
|||||||
|
# See https://zulip.readthedocs.io/en/latest/subsystems/notifications.html
|
||||||
|
|
||||||
import base64
|
import base64
|
||||||
import logging
|
import logging
|
||||||
import re
|
import re
|
||||||
|
|||||||
@@ -765,7 +765,7 @@ def missedmessage_hook(
|
|||||||
|
|
||||||
def receiver_is_off_zulip(user_profile_id: int) -> bool:
|
def receiver_is_off_zulip(user_profile_id: int) -> bool:
|
||||||
# If a user has no message-receiving event queues, they've got no open zulip
|
# If a user has no message-receiving event queues, they've got no open zulip
|
||||||
# session so we notify them
|
# session so we notify them.
|
||||||
all_client_descriptors = get_client_descriptors_for_user(user_profile_id)
|
all_client_descriptors = get_client_descriptors_for_user(user_profile_id)
|
||||||
message_event_queues = [
|
message_event_queues = [
|
||||||
client for client in all_client_descriptors if client.accepts_messages()
|
client for client in all_client_descriptors if client.accepts_messages()
|
||||||
@@ -789,7 +789,11 @@ def maybe_enqueue_notifications(
|
|||||||
) -> Dict[str, bool]:
|
) -> Dict[str, bool]:
|
||||||
"""This function has a complete unit test suite in
|
"""This function has a complete unit test suite in
|
||||||
`test_enqueue_notifications` that should be expanded as we add
|
`test_enqueue_notifications` that should be expanded as we add
|
||||||
more features here."""
|
more features here.
|
||||||
|
|
||||||
|
See https://zulip.readthedocs.io/en/latest/subsystems/notifications.html
|
||||||
|
for high-level design documentation.
|
||||||
|
"""
|
||||||
notified: Dict[str, bool] = {}
|
notified: Dict[str, bool] = {}
|
||||||
|
|
||||||
if (idle or always_push_notify) and (
|
if (idle or always_push_notify) and (
|
||||||
|
|||||||