Commit Graph

108 Commits

Author SHA1 Message Date
Anders Kaseorg
f24a0a6b81 ruff: Fix RUF059 Unpacked variable is never used.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2025-09-30 16:47:54 -07:00
Mateusz Mandera
4e3ec77a0f retention: Eliminate join with Recipient table when archiving DMs.
We can use the is_channel_message column instead of doing the join to
filter on recipient type.
2025-08-19 23:36:10 -07:00
Mateusz Mandera
a352d35660 retention: Add flag to ArchiveTransaction to prevent automatic deletion.
This adds an index non-concurrently, but the table should be small
enough for this to be fine.
2025-02-04 11:31:13 -08:00
Prakhar Pratyush
c5b3d2e434 custom_check: Add rule to avoid creating savepoints.
This commit adds a custom rule to check python files
and raise lint error if they have transaction.atomic used
without any argument or savepoint=True is used.

It helps to avoid creating unintended savepoints in the future.
2024-11-21 14:55:15 -08:00
Prakhar Pratyush
9c9866461a transaction: Add durable=True to the outermost db transactions.
This commit adds `durable=True` to the outermost db transactions
created in the following:
* confirm_email_change
* handle_upload_pre_finish_hook
* deliver_scheduled_emails
* restore_data_from_archive
* do_change_realm_subdomain
* do_create_realm
* do_deactivate_realm
* do_reactivate_realm
* do_delete_user
* do_delete_user_preserving_messages
* create_stripe_customer
* process_initial_upgrade
* do_update_plan
* request_sponsorship
* upload_message_attachment
* register_remote_server
* do_soft_deactivate_users
* maybe_send_batched_emails

It helps to avoid creating unintended savepoints in the future.

This is as a part of our plan to explicitly mark all the
transaction.atomic calls with either 'savepoint=False' or
'durable=True' as required.

* 'savepoint=True' is used in special cases.
2024-11-05 17:58:47 -08:00
Prakhar Pratyush
3d597bb9b0 delete_message_backend: Add durable=True to the outermost transaction.
This commit adds 'durable=True' to the outermost transaction
in 'delete_message_backend'.

It also adds 'savepoint=False' to inner transaction.atomic
decorator to avoid creating savepoint.

This is as a part of our plan to explicitly mark all the
transaction.atomic decorators with either 'savepoint=False' or
'durable=True' as required.

* 'savepoint=True' is used in special cases.
2024-11-01 16:41:15 -07:00
Mateusz Mandera
18fbb5d146 retention: Limit number of ids passed to db in delete messages query.
If do_delete_messages (and friends) are called for a massive number of
messages, the giant list of message ids is passed to Postgres even
though chunk_size makes all but the first chunk_size of message ids
useless.
2024-09-20 09:31:21 -07:00
Mateusz Mandera
ed7c330548 retention: Rename run_archiving_in_chunks to run_archiving. 2024-09-20 09:31:21 -07:00
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
roanster007
52692a6448 refactor: Rename huddle to direct_message_group in non API.
This commit performs a sweep on the first batch of non API
files to rename "huddle" to "direct_message_group`.

It also renames variables and methods of type -
"huddle_message" to "group_direct_message".

This is a part of #28640
2024-07-04 07:56:31 -07:00
Mateusz Mandera
224ea3aaed retention: Add .restored_timestamp to ArchiveTransaction.
This is just generally useful for tracking and debugging.
2024-05-09 10:54:44 -07:00
John Lu
a5cf0ec526 refactor: Replace HUDDLE with DIRECT_MESSAGE_GROUP.
Replaced HUDDLE attribute with DIRECT_MESSAGE_GROUP using VS Code search,
part of a general renaming of the object class.

Fixes part of #28640.

Co-authored-by: JohnLu2004 <JohnLu10212004@gmail.com>
2024-03-21 16:39:33 -07:00
Alex Vandiver
b94402152d models: Always search Messages with a realm_id or id limit.
Unless there is a limit on `id`, always provide a `realm_id` limit as
well.  We also notate which index is expected to be used in each
query.
2023-09-11 15:00:37 -07:00
Alex Vandiver
0f918d9071 retention: Do not archive attachments with scheduled messages. 2023-08-06 13:40:02 -07:00
Mateusz Mandera
414658fc8e scheduled_message: Handle attachments properly.
Fixes #25414.

We add Attachment.scheduled_messages relation to track ScheduledMessages
which reference the attachment.

The import bits can be done after merging this, by updating #25345.
2023-05-08 09:56:02 -07:00
Prakhar Pratyush
e45623fccc python: Update tuple handling pattern; returned by a delete() query.
This commit updates the pattern for dealing with tuples
returned by the delete() query.

The '(num_deleted, ignored) = ModelName.objects.filter().delete()'
pattern is preferred due to better readability.

We avoid the pattern '(num_deleted, _)' because Django uses _
for translation, which may lead to future bugs.
2023-03-27 16:18:23 -07:00
Anders Kaseorg
e1ed44907b ruff: Fix SIM118 Use key in dict instead of key in dict.keys().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Mateusz Mandera
760587450e retention: Remove two redundant comments.
These two identical comments don't contribute anything useful and seem
just out of place at this point.
2022-10-31 10:23:57 -07:00
Mateusz Mandera
7b13204e8f retention: Use Message.realm to simplify private message query.
We no longer need to do the inner joins to figure out the message's
realm and split up the cross-realm and regular case - now we just look
at zerver_message.realm directly.
2022-10-31 10:23:57 -07:00
Anders Kaseorg
b4b8691239 retention: Inline move_rows query arguments.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-07-30 06:46:34 -07:00
Zixuan James Li
ab1bbdda65 typing: Broaden type annotations for QuerySet compatibility.
To explain the rationale of this change, for example, there is
`get_user_activity_summary` which accepts either a `Collection[UserActivity]`,
where `QuerySet[T]` is not strictly `Sequence[T]` because its slicing behavior
is different from the `Protocol`, making `Collection` necessary.

Similarily, we should have `Iterable[T]` instead of `List[T]` so that
`QuerySet[T]` will also be an acceptable subtype, or `Sequence[T]` when we
also expect it to be indexed.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:42 -07:00
Zixuan James Li
f0e505d557 retention: Guarantee type-safeness when calling move_rows.
This ensures that all the keyword arguments in `move_rows`
have the correct types. Note that `returning_id` is supposed to be a
flag instead of a `Composable` `Literal`.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-06-28 16:01:35 -07:00
Mateusz Mandera
acfa55138e retention: Add docstring info on how archive cleaning works.
In particular, it's important to record the special treatment around
ArchivedAttachment rows not being deleted in this step.
2022-06-08 15:12:36 -07:00
Zixuan James Li
e338ada66c typing: Add none-checks for Recipient objects.
Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-05-31 09:43:55 -07:00
Anders Kaseorg
80def8d2c2 models: Manage indexes from migration 0001 with Django.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-02-23 11:59:45 -08:00
PIG208
254f706465 typing: Fix argument type for models in function signatures. 2021-08-20 05:54:19 -07:00
Anders Kaseorg
09564e95ac mypy: Add types-psycopg2.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-09 20:32:19 -07:00
Mateusz Mandera
8f588dcbab models: Pass realm to get_user_including_cross_realm calls. 2021-07-26 15:33:13 -07:00
Vishnu KS
acffc0ae0a populate_db: Use do_create_realm for creating zephyr realm. 2021-07-06 17:22:00 -07:00
Anders Kaseorg
6e4c3e41dc python: Normalize quotes with Black.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Anders Kaseorg
11741543da python: Reformat with Black, except quotes.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-02-12 13:11:19 -08:00
Mateusz Mandera
a9242d6dfc retention: Eliminate redundant recipient JOIN from cross-realm query.
Since recipient_id (id of the PERSONAL Recipient of the user) was
denormalized into the UserProfile model, this query can be simplified by
getting rid of the zerver_recipient JOIN.
2021-01-18 21:40:37 -08:00
Mateusz Mandera
e3be6db73a retention: Eliminate redundant userprofile JOIN from cross-realm query. 2021-01-18 21:40:37 -08:00
Anders Kaseorg
3885fdadce realm: Fix strict_optional errors.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-07-06 11:25:48 -07:00
Mateusz Mandera
890cafac11 retention: Use batch size of 100 for stream messages.
Streams can have lots of subscribers, meaning that the archiving process
will be moving tons of UserMessages per message. For that reason, using
a smaller batch size for stream messages is justified.

Some personal messages need to be added in test_scrub_realm to have
coverage of do_delete_messages_by_sender after these changes.
2020-06-24 10:41:00 -07:00
Mateusz Mandera
0c6497d43a retention: Add restore_retention_policy_deletions_for_stream function. 2020-06-24 10:40:38 -07:00
Mateusz Mandera
468f8cf488 retention: Improve logging of transactions. 2020-06-24 10:40:38 -07:00
Pragati Agrawal
1562ec758e org settings: Use 'forever' value instead of -1 for message_retention_days.
Currently, we use -1 as the Realm.message_retention_days value to retain
message forever unless specified at stream level for a particular stream,
that is, no policy set at the realm level. But this is incoherent with what
we use for Stream.message_retention_days where -1 means

> disable retention policy for this stream unconditionally

that can be confusing from an API standpoint.

So instead of trying some hack to reset the value to NULL or using some
other value like -2 for RETAIN_MESSAGE_FOREVER and use that for API. It is
much more intuitive to use a string like 'forever' that can be mapped to
RETAIN_MESSAGE_FOREVER at the backend. And this is similar to what we use
for streams settings as well.
2020-06-24 10:38:58 -07:00
Mateusz Mandera
7a03e2a7fe retention: Replace Realm.message_retention_days None value with -1.
To be more consistent with the meaning in the Stream model, and to make
it easier to have a reasonable settings API, we get rid of the None
value for Realm.message_retention_days in favor of the value -1 to
represent the "don't delete messages" default policy.
2020-06-24 10:33:21 -07:00
Tim Abbott
d503549f0b retention: Add some more system documentation. 2020-06-20 17:35:07 -07:00
Anders Kaseorg
365fe0b3d5 python: Sort imports with isort.
Fixes #2665.

Regenerated by tabbott with `lint --fix` after a rebase and change in
parameters.

Note from tabbott: In a few cases, this converts technical debt in the
form of unsorted imports into different technical debt in the form of
our largest files having very long, ugly import sequences at the
start.  I expect this change will increase pressure for us to split
those files, which isn't a bad thing.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-06-11 16:45:32 -07:00
Anders Kaseorg
69730a78cc python: Use trailing commas consistently.
Automatically generated by the following script, based on the output
of lint with flake8-comma:

import re
import sys

last_filename = None
last_row = None
lines = []

for msg in sys.stdin:
    m = re.match(
        r"\x1b\[35mflake8    \|\x1b\[0m \x1b\[1;31m(.+):(\d+):(\d+): (\w+)", msg
    )
    if m:
        filename, row_str, col_str, err = m.groups()
        row, col = int(row_str), int(col_str)

        if filename == last_filename:
            assert last_row != row
        else:
            if last_filename is not None:
                with open(last_filename, "w") as f:
                    f.writelines(lines)

            with open(filename) as f:
                lines = f.readlines()
            last_filename = filename
        last_row = row

        line = lines[row - 1]
        if err in ["C812", "C815"]:
            lines[row - 1] = line[: col - 1] + "," + line[col - 1 :]
        elif err in ["C819"]:
            assert line[col - 2] == ","
            lines[row - 1] = line[: col - 2] + line[col - 1 :].lstrip(" ")

if last_filename is not None:
    with open(last_filename, "w") as f:
        f.writelines(lines)

Signed-off-by: Anders Kaseorg <anders@zulipchat.com>
2020-06-11 16:04:12 -07:00
Mateusz Mandera
b234fe8ccb retention: Pass optional realm argument to move_messages_to_archive.
This allows having the realm field of ArchiveTransaction set instead of
NULL when using move_messages_to_archive.
2020-05-16 14:46:56 -07:00
Mateusz Mandera
7d8a3581a5 retention: Clarify the status of cross-realm huddles in a comment. 2020-05-16 14:42:40 -07:00
Tim Abbott
f10f2600e0 retention: Fix OOM issues when deleting large numbers of transactions.
For unknown reasons, deleting 10,000s of ArchiveTransaction objects
results in rapidly growing memory in the job making the request in the
Django process, eventually leading to an OOM kill.

I don't understand why Django behaves that way; I would have expected
the failure mode to instead be a serious load problem on the database
server, but perhaps the way Django's internal deletion logic handles
cascading the deletes to many millions of ArchiveMessages and other
ForeignKey objects requires tracking a lot of data in memory.

The solution is the same in any case, which is to batch the deletions
to execute a reasonable number of them at once.  Doing a single
ArchiveTransaction at a time would likely result in huge numbers of
database queries in a loop, which performs very poorly.  So we balance
by batching deletions in groups of 100 ArchiveTransactions; testing
this in production, I saw no spike of memory usage materially beyond
that of a normal Django process, and each bulk-deletion transaction
takes several seconds to process (meaning per-transaction overhead is
negligible).
2020-05-15 17:10:19 -07:00
Mateusz Mandera
812ac4714f retention: Optimize fetching of realms and streams with retention policy. 2020-05-07 16:28:05 -07:00
Anders Kaseorg
fd65511fe9 retention: Improve move_rows escaping correctness with psycopg2.sql.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-05-04 09:35:30 -07:00
Tim Abbott
341787a5e0 retention: Use logging API in a more standard way. 2020-05-03 10:57:23 -07:00