Commit Graph

241 Commits

Author SHA1 Message Date
Alex Vandiver
8804c1afaa users: Remove weird get_all_api_keys helper.
This implied by its name that users could have more than one key.
They cannot, currently; make the code clearer by switching to the
explicit column access.
2025-02-13 12:40:53 -08:00
Alex Vandiver
58bf2a7935 tornado: Limit the width of the user queries, when they're needed.
Tornado requests try hard to not make SQL queries -- and if they're
necessary, to minimize the number of them.  Specifically, both session
objects and user objects are cached in memcached, and we expect that
both of them will have been filled there by Django before any requests
are made to Tornado.

In the event that memcached is flushed, or data is otherwise evicted,
we perform two database queries -- one for the session, and one for
the user.  However, the *width* of the latter query has grown
significantly over time, as the Realm object grew more fields, and
recently with the addition of role groups, which require multiple
joins each.  This leads to a query which is over 12k of text long, and
results in 319 columns.  In the event of a memcached flush, this can
result in a *significant* amount of SQL traffic, as nearly every
active Tornado request will make that query.

We do not wish to narrow the default query for Django; we instead tag
the request in the REST wrapper, and use that to use a much narrower
user cache entry.  That narrower cache entry is filled before the
queue is created in Django; we also use it to explicitly set the log
data, so the second "half" of the continued Tornado request does not
need to fetch any user data either when writing its log line.

Because they use different cache keys, this only affects the
session-based `/json/events` endpoint, which caches by user-id; the
`/api/v1/events` endpoint, which uses an API-key cache, keeps its wide
user object.  The former is 50% of the total request volume, whereas
the latter is only 2%, so adding an additional cache for it is
unnecessary complexity.
2025-02-13 12:40:53 -08:00
Anders Kaseorg
19b8cde27f ruff: Fix PLC0206 Extracting value from dictionary without calling .items().
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-12-21 21:06:53 -08:00
Anders Kaseorg
91ade25ba3 python: Simplify with str.removeprefix, str.removesuffix.
These are available in Python ≥ 3.9.
https://docs.python.org/3/library/stdtypes.html#str.removeprefix

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-09-03 12:30:16 -07:00
Anders Kaseorg
8c0b2d14aa mypy: Remove use of ValuesQuerySet and QuerySetAny.
This was made unnecessary in django-stubs 5.0.1 and mypy 1.10.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-08-24 17:30:41 -07:00
Anders Kaseorg
e3a191b99b ruff: Fix FURB154 Use of repeated consecutive global, nonlocal.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-14 13:53:18 -07:00
Anders Kaseorg
0fa5e7f629 ruff: Fix UP035 Import from collections.abc, typing instead.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
531b34cb4c ruff: Fix UP007 Use X | Y for type annotations.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Anders Kaseorg
e08a24e47f ruff: Fix UP006 Use list instead of List for type annotation.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-07-13 22:28:22 -07:00
Alex Vandiver
1424a2e748 documentation: Move OpenGraph description updating out of middleware.
This middleware was highly-specific to a set of URLs, and pulled in a
beautifulsoup dependency for Tornado.  Move it closer to where it is
used, minimizing action at a distance, as well as trimming out a
dependency.
2024-04-16 10:41:45 -07:00
Mateusz Mandera
4a2a9176c2 realms: Add caching to the get_seat_count calculation for upload limit.
For simiplicty's sake, we can avoid trying to do cache invalidation in
the variety of events that can cause the seat count to change - since
having an up to 1 day delay between users being added and the upload
limit going up is quite reasonable.
2024-04-15 15:08:56 -07:00
Alex Vandiver
352dbf9387 caches: Only take the realm_id, not the Realm, as a cache key function.
This saves a hit to the database to fetch the Realm of a UserProfile
that we are trying to flush.
2024-03-04 16:35:57 -08:00
Anders Kaseorg
865febb307 ruff: Fix RUF021 Parenthesize when chaining and and or together.
This is a preview rule, not yet enabled by default.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-03-01 09:30:04 -08:00
Anders Kaseorg
570f3dd447 python: Reformat with Ruff formatter.
https://docs.astral.sh/ruff/formatter/

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2024-02-29 17:07:16 -08:00
Anders Kaseorg
4aa2d76bea models: Extract zerver.models.streams.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg
cd96193768 models: Extract zerver.models.realms.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg
45bb8d2580 models: Extract zerver.models.users.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg
e601d0ae7c models: Rename zerver/models.py to zerver/models/__init__.py.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-12-16 22:08:44 -08:00
Anders Kaseorg
562a79ab76 ruff: Fix PERF401 Use a list comprehension to create a transformed list.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-07 17:23:55 -07:00
Anders Kaseorg
c4748298bb ruff: Fix PERF102 Using only the keys/values of a dict.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-08-07 17:23:55 -07:00
Steve Howell
61a9f701bd cache: Use a single cache entry for cross-realm bots.
The cross-realm bots rarely change, and there are only
a few of them, so we just query them all at once and
put them in the cache.

Also, we put the dictionaries in the cache, instead of
the user objects, since there is nothing time-sensitive
about the dictionaries, and they are small. This saves
us a little time computing the avatar url and things
like that, not to mention marshalling costs.

This commit also fixes a theoretical bug where we would
have stale cache entries if somebody somehow modified
the cross-realm bots without bumping KEY_PREFIX.

Internally we no longer pre-fetch the realm objects for
the bots, but we don't get overly precise about picking
individual fields from UserProfile, since we rarely hit
the database and since we don't store raw ORM objects
in the cache.

The test diffs make it look like we are hitting the
cache an extra time, but the tests weren't counting
bulk fetches.  Now we only use a single key for all
bots rather a key per bot.
2023-07-25 23:08:52 -07:00
Sahil Batra
bb3945a32f models: Remove select_related call in get_active_users.
We do not use any related fields for the UserProfile objects
fetched by get_active_users, so we can simply remove the
select_related call.

The user object from get_active_users was used to get realm
but since get_active_users called from a realm object we can
directly use that realm object. This change also leads to
some changes in the cache code where we now pass the realm
to the function instead of selecting it from UserProfile object.
2023-07-20 10:44:39 -07:00
Steve Howell
3599b1662e cache: Eliminate transformed_bulk_cached_fetch.
Its two callers now just directly call
generic_bulk_cached_fetch with the explicit `lambda
obj: obj` helpers.
2023-07-19 11:07:33 -07:00
Steve Howell
d19c1f7438 message fetching: Avoid duplicate cache layers.
This code removes a lot of complexity with very likely
positive overall impact on system performance and
negligible downside.

We already cache display recipients on a per-user
level, so there's no need for another cache layer on
top of that that keys them with recipient ids.

We avoid strange things where Alice/Bob and Bob/Charlie
get put into the top layer cache and then we still have
a cache miss on Alice/Charlie despite the lower level
cache being able to support per-user lookups.

This change does introduce an extra database round trip
if any of our messages have a huddle, but the query is
extremely cheap, and we can always try to cache that
function more directly or try to re-use some of our
other huddle-based caches.

As part of this, we clean up the names for the
lower-level per-user cache of display recipients, and
we simplify the cache keys.

We also stop passing in a full Recipient object to the
`bulk_get_huddle_user_ids` functions.

The local impact of this change should be easy to
measure (at least approximately), since we use this
function every time a user gets messages via the
/messages endpoint.
2023-07-19 11:07:33 -07:00
Anders Kaseorg
052984bc14 utils: Remove make_safe_digest wrapper.
It’s unclear what was supposed to be “safe” about this wrapper.  The
hashlib API is fine without it, and we don’t want to encourage further
use of SHA-1.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-07-19 10:54:05 -07:00
Steve Howell
89381a8072 cache: Eliminate get-stream-by-name cache.
We remove the cache functionality for the
get_realm_stream function, and we also change it to
return a thin Stream object (instead of calling
select_related with no arguments).

The main goal here is to remove code complexity, as we
have been prone to at least one caching validation bug
related to how Realm and UserGroup interact. That
particular bug was more theoretical than practical in
terms of its impact, to be clear.

Even if we were to be perfectly disciplined about only
caching thin stream objects and always making sure to
delete cache entries when stream data changed, we would
still be prone to ugly situations like having
transactions get rolled back before we delete the cache
entry. The do_deactivate_stream is a perfect example of
where we have to consider the best time to unset the
cache. If you unset it too early, then you are prone to
races where somebody else churns the cache right before
you update the database. If you set it too late, then
you can have an invalid entry after a rollback or
deadlock situation. If you just eliminate the cache as
a moving part, that whole debate is moot.

As the lack of test changes here indicates, we rarely
fetch streams by name any more in critical sections of
our code.

The one place where we fetch by name is in loading the
home page, but that is **only** when you specify a
stream name. And, of course, that only causes about an
extra millisecond of time.
2023-07-11 13:45:40 -07:00
Ujjawal Modi
a361c23aac alert_words: Refactor the code to flush alert_words cache.
Subsequent commits will add "on_delete=models.RESTRICT"
relationships, which will result in the AlertWord
objects being deleted after Realm has been deleted from
the database.

In order to handle this, we update realm_alert_words_cache_key,
realm_alert_words_automaton_cache_key, and flush_realm_alert_words
functions to accept realm_id as parameter instead of realm
object, so that  the code for flushing the cache works even
after the realm is deleted. This change is fine because
eventually only realm_id is used by these functions and there
is no need of the complete realm object.
2023-06-28 18:03:32 -07:00
Ujjawal Modi
f7346f36fc attachments: Refactor code for flushing used_upload_space cache.
Subsequent commits will add "on_delete=models.RESTRICT"
relationships, which will result in the Attachment
objects being deleted after Realm has been deleted from
the database.

In order to handle this, we update
get_realm_used_upload_space_cache_key function to accept
realm_id as parameter instead of realm object, so that
the code for flushing the cache works even after the
realm is deleted. This change is fine because eventually
only realm_id is used by this function and there is no
need of the complete realm object.
2023-06-28 18:03:32 -07:00
Ujjawal Modi
535a088d0b bots: Refactor code for flushing bots cache.
Subsequent commits will add "on_delete=models.RESTRICT"
relationships, which will result in the UserProfile
objects being deleted after Realm has been deleted from
the database.

In order to handle this, we update bot_dicts_in_realm_cache_key
function to accept realm_id as parameter instead of realm
object, so that  the code for flushing the cache works even
after the realm is deleted. This change is fine because
eventually only realm_id is used by this function and there is
no need of the complete realm object.
2023-06-28 18:03:32 -07:00
Anders Kaseorg
9db3451333 Remove statsd support.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-04-25 19:58:16 -07:00
Zixuan James Li
24f24d236d cache: Use QuerySetAny for isinstance check.
Previously, `QuerySet` does not support isinstance check since it is
defined to be generic in django-stubs. In a recent update, such check is
possible by using `QuerySetAny`, a non-generic alias of `QuerySet`.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2023-03-17 08:38:20 -07:00
Anders Kaseorg
d3efd4c095 python: Import F, Q, QuerySet from their canonical module.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-03-05 14:46:28 -08:00
Sahil Batra
0ed5f76063 settings: Add backend code for using user email_address_visibility setting.
This commits update the code to use user-level email_address_visibility
setting instead of realm-level to set or update the value of UserProfile.email
field and to send the emails to clients.

Major changes are -

- UserProfile.email field is set while creating the user according to
RealmUserDefault.email_address_visbility.

- UserProfile.email field is updated according to change in the setting.

- 'email_address_visibility' is added to person objects in user add event
and in avatar change event.

- client_gravatar can be different for different users when computing
avatar_url for messages and user objects since email available to clients
is dependent on user-level setting.

- For bots, email_address_visibility is set to EVERYONE while creating
them irrespective of realm-default value.

- Test changes are basically setting user-level setting instead of realm
setting and modifying the checks accordingly.
2023-02-10 17:35:49 -08:00
Anders Kaseorg
df001db1a9 black: Reformat with Black 23.
Black 23 enforces some slightly more specific rules about empty line
counts and redundant parenthesis removal, but the result is still
compatible with Black 22.

(This does not actually upgrade our Python environment to Black 23
yet.)

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-02-02 10:40:13 -08:00
Anders Kaseorg
f7e97b1180 ruff: Fix PLW0602 Using global but no assignment is done.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2023-01-04 16:25:07 -08:00
Alex Vandiver
8f6f38c97c cache: Decline to store querysets, with an error.
As we have seen no further cases of this in production since #23215,
increase the severity to an error, and switch from returning a
list (which is not type-safe if the function declares a QuerySet
return) to returning the QuerySet without caching.

Failing to store the result in the cache, with an error, seems
superior to raising an exception; in both cases the next request will
redo the work, but we are guaranteed a worse user experience if we 500
the request.

Ref https://github.com/zulip/zulip/pull/23215#discussion_r994186493
2022-11-29 16:45:11 -08:00
Anders Kaseorg
73c4da7974 ruff: Fix N818 exception name should be named with an Error suffix.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-17 16:52:00 -08:00
Anders Kaseorg
46955da3a0 ruff: Fix ANN204 missing return type annotation for __init__.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-11-16 09:29:11 -08:00
Lauryn Menard
19a4c3907f decorator: Put back check for dev env in ignore_unhashable_lru_cache.
Prior to 53231aa, the `ignore_unhashable_lru_cache` decorator had
a check for the development environment so that changes could be
seen on refresh.

Puts that check back in IgnoreUnhashableLruCacheWrapper class.
2022-11-07 12:17:59 -08:00
Alex Vandiver
c328de3372 cache: Log a warning when attempting to store a whole QuerySet.
As noted in the previous commit, this causes bloat in memcached, for
no purpose.  Log a warning when `cache_with_key` sees a QuerySet
returned from the function it is decorating.
2022-10-12 22:25:48 -07:00
Alex Vandiver
204f1b58e8 cache: Drop realm_id from realm_user_dict_fields.
Storing this key is superfluous, as it will be the same for all users,
and definitionally already known to fetch the cache for the realm.  It
is also not currently used by the callsites that read rows from the
cache.
2022-10-12 22:25:48 -07:00
Zixuan James Li
6c7b2d621e typing: Avoid redefinition of incompatible QuerySets.
The pattern of using the same variable to apply filters
or alter the `QuerySet` in other ways might produce `QuerySet`s
with incompatible types. This behavior is not allowed by mypy.

Signed-off-by: Zixuan James Li <p359101898@gmail.com>
2022-07-07 11:27:43 -07:00
Anders Kaseorg
53231aa9d9 decorator: Type cache_info, cache_clear for ignore_unhashable_lru_cache.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-06-27 10:20:05 -07:00
Anders Kaseorg
d5fea08b8a cache: Remove needless monkey patching.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-06-07 09:37:43 -07:00
Anders Kaseorg
fd16f97d6b python: Excise None from pointlessly nullable booleans.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
2022-04-27 12:40:14 -07:00
Alex Vandiver
351bdfaf78 preview: Use cache only as a non-durable cache, not an IPC.
The `get_link_embed_data` / `link_embed_data_from_cache` pair as
introduced in c93f1d4eda uses the cache
as a temporary store inside of the `embed_links` worker; this means
that it must be durable storage, or the worker will stall and re-fetch
the same links to preview them.

Switch to plumbing through the fetched URL embed data as an parameter
to the Markdown evaluation which uses them, rather than using the
cache as an intermediary.  This frees up the cache to be merely a
non-durable cache.

As a side-effect, this removes get_cache_with_key, and
link_embed_data_from_cache which was its only callsite.
2022-04-15 14:48:12 -07:00
Alex Vandiver
aaa58a49db cache: Make the cache_name=None behaviour clearer.
`django.core.cache.cache` is equal to
`django.core.cache.caches["default"]`; the latter is more
understandable in context.
2022-04-15 14:48:12 -07:00
Zixuan James Li
f21746ba0b cache: Strength types of cache decorators with ParamSpec.
This demonstrates a way to resolve the long-standing issue
of typing higher-order identity functions without using
`cast` and in a type-safe manner for decorators in `cache.py`.

Signed-off-by: Zixuan James Li <359101898@qq.com>
2022-04-14 12:44:35 -07:00
Anders Kaseorg
ad5f0c05b5 python: Remove default "utf8" argument for encode(), decode().
Partially generated by pyupgrade.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2021-08-02 15:53:52 -07:00
Mateusz Mandera
d45f3eecaa models: Add optional realm_id argument to get_system_bot. 2021-07-26 15:31:10 -07:00