Commit Graph

15 Commits

Author SHA1 Message Date
Alex Vandiver
c5200e8b05 deliver_scheduled_emails: Use a queue, instead of infinite retries.
`deliver_scheduled_emails` tries to deliver the email synchronously,
and if it fails, it retries after 10 seconds.  Since it does not track
retries, and always tries the earliest-scheduled-but-due message
first, the worker will not make forward progress if there is a
persistent failure with that message, and will retry indefinitely.
This can result in excessive network or email delivery charges from
the remote SMTP server.

Switch to delivering emails via a new queue worker.  The
`deliver_scheduled_emails` job now serves only to pull deferred jobs
out of the table once they are due, insert them into RabbitMQ, and
then delete them.  This limits the potential for head-of-queue
failures to failures inserting into RabbitMQ, which is more reasonable
than failures speaking to a complex external system we do not control.
Retries and any connections to the SMTP server are left to the
RabbitMQ consumer.

We build a new RabbitMQ queue, rather than use the existing
`email_senders` queue, because that queue is expected to be reasonably
low-latency, for things like missed message notifications.  The
`send_future_email` codepath which inserts into ScheduledEmails is
also (ab)used to digest emails, which are extremely bursty in their
frequency -- and a large burst could significantly delay emails behind
it in the queue.

The new queue is explicitly only for messages which were not initiated
by user actions (e.g., invitation reminders, digests, new account
follow-ups) which are thus not latency-sensitive.

Fixes: #32463.
2025-03-04 16:09:25 -08:00
Prakhar Pratyush
9c9866461a transaction: Add durable=True to the outermost db transactions.
This commit adds `durable=True` to the outermost db transactions
created in the following:
* confirm_email_change
* handle_upload_pre_finish_hook
* deliver_scheduled_emails
* restore_data_from_archive
* do_change_realm_subdomain
* do_create_realm
* do_deactivate_realm
* do_reactivate_realm
* do_delete_user
* do_delete_user_preserving_messages
* create_stripe_customer
* process_initial_upgrade
* do_update_plan
* request_sponsorship
* upload_message_attachment
* register_remote_server
* do_soft_deactivate_users
* maybe_send_batched_emails

It helps to avoid creating unintended savepoints in the future.

This is as a part of our plan to explicitly mark all the
transaction.atomic calls with either 'savepoint=False' or
'durable=True' as required.

* 'savepoint=True' is used in special cases.
2024-11-05 17:58:47 -08:00
Alex Vandiver
88be3246a0 management: Move commands to all use ZulipBaseCommand. 2024-05-24 10:30:16 -07:00
Anders Kaseorg
93198a19ed requirements: Upgrade Python requirements.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-01-29 10:41:54 -08:00
Anders Kaseorg
a50eb2e809 mypy: Enable new error explicit-override.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-10-12 12:28:41 -07:00
Lauryn Menard
dd59e83d54 welcome-emails: Make some code comments and docstrings more evergreen. 2023-07-11 14:15:52 -07:00
Anders Kaseorg
73c4da7974 ruff: Fix N818 exception name should be named with an Error suffix.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-17 16:52:00 -08:00
Alex Vandiver
dfe3727b6d deliver_scheduled_emails: Add a try/except KeyboardInterrupt.
This makes explicit ^C's to exit not spew a backtrace.
2022-02-18 14:04:56 -08:00
Alex Vandiver
c541276ab3 deliver_scheduled_emails: Ue a 10s sleep instead of 2s. 2022-02-18 14:04:56 -08:00
Alex Vandiver
335ef26126 deliver_scheduled_emails: Lock and send emails one-at-a-time.
When pulling batches out of the ScheduledEmail list in a single
transaction, an unexpected failure to send an email will result in the
whole batch getting retried.  This will result in infinite email
sending loops.

Pull a single row off at a time and send it.  We continue without
retries to the next email on EmailNotDeliveredException, but will
retry infinitely on other exceptions.

Fixes: #20943.
2022-02-18 11:13:14 -08:00
Alex Vandiver
4c518c2bba scheduled_email: Consistently lock users table.
Only clear_scheduled_emails previously took a lock on the users before
removing them; make deliver_scheduled_emails do so as well, by using
prefetch_related to ensure that the table appears in the SELECT.  This
is not necessary for correctness, since all accesses of
ScheduledEmailUser first access the ScheduledEmail and lock it; it is
merely for consistency.

Since SELECT ... FOR UPDATE takes an UPDATE lock on all tables
mentioned in the SELECT, merely doing the prefetch is sufficient to
lock both tables; no `on=(...)` is needed to `select_for_update`.

This also does not address the pre-existing potential deadlock from
these two use cases, where both try to lock the same ScheduledEmail
rows in opposite orders.
2021-08-19 00:44:33 -07:00
Alex Vandiver
4ffda1be87 send_email: Fix sleep logic.
This was broken in the refactor in 1e67e0f218.
2021-05-27 22:49:28 -07:00
Alex Vandiver
670c7e7ba4 settings: Remove now-unnecessary EMAIL_DELIVERER_DISABLED setting. 2021-05-18 12:39:28 -07:00
Alex Vandiver
1e67e0f218 deliver_scheduled_*: SELECT FOR UPDATE the relevant rows.
`deliver_scheduled_emails` and `deliver_scheduled_messages` use their
respective tables like a queue, but do not have guarantees that there
was only one consumer (besides the EMAIL_DELIVERER_DISABLED setting),
and could send duplicate messages if multiple consumers raced in
reading rows.

Use database locking to ensure that the database only feeds a given
ScheduledMessage or ScheduledEmail row to a single consumer.  A second
consumer, if it exists, will block until the first consumer commits
the transaction.
2021-05-18 12:39:28 -07:00
Alex Vandiver
0f1611286d management: Rename the deliver_email command to deliver_scheduled_email.
This makes it parallel with deliver_scheduled_messages, and clarifies
that it is not used for simply sending outgoing emails (e.g. the
`email_senders` queue).

This also renames the supervisor job to match.
2021-05-11 13:07:29 -07:00