Commit Graph

1622 Commits

Author SHA1 Message Date
Steve Howell
4dce34ab8b refactor: Simplify call to bulk_get_subscriber_user_ids.
The way we were computing the dictionary was very
convoluted--all we need is a set of subscribed user
ids.
2020-10-18 14:27:31 -07:00
Steve Howell
b58152abda refactor: Introduce all_streams_map.
We replace all_streams_id with a map.

We also use it to populate never_subscribed_streams.

And all_streams_map is a superset of stream_hash,
which we will soon kill off as well.
2020-10-18 14:27:31 -07:00
Steve Howell
78384ebf1b minor: Remove confusing parens.
Apparently I put these parens in the code as
part of 73c30774cb
during 2017.

It looks like I extracted is_public during
the middle of my change and forgot to remove
the unnecessary parens.  (The code was correct,
but it makes it look like a tuple if you're
skimming it too quickly.)
2020-10-18 14:27:31 -07:00
Steve Howell
d60dd94168 refactor: Extract funcs from gather_subscriptions_helper.
This is a pure code move, apart from a little bit
of quote cleanup and renames:

    user_profile -> user
    stream_dict -> result
2020-10-18 14:27:31 -07:00
Steve Howell
79fcf78143 refactor: Exclude "active" from API_FIELDS.
We just need to make sure the relevant queries
get it for the triage process.
2020-10-18 14:27:31 -07:00
Steve Howell
c5769d31f2 minor: Move code for web_public_stream_ids. 2020-10-18 14:27:31 -07:00
Steve Howell
0ca07ffd3c peformance: Eliminate StreamRecipientMap.
That class is an artifact of when Stream
didn't have recipient_id.  Now it's simpler
to deal with stream subscriptions.

We also save a query during page load (and
other places where we get subscriber
info).
2020-10-18 14:27:31 -07:00
Steve Howell
1951d75796 performance: Avoid select_related("realm").
We also move this query up in the function
for some future refactorings.
2020-10-18 14:27:31 -07:00
Steve Howell
3685fcc701 refactor: Remove recipient arg for do_mute_topic. 2020-10-16 12:58:11 -07:00
Steve Howell
378062cc83 performance: Avoid call to access_stream_by_id.
We already trust ids that are put on our queue
for deferred work. For example, see the code for
"mark_stream_messages_as_read_for_everyone"

We now pass stream_recipient_id when we queue
up work for do_mark_stream_messages_as_read.

This generally saves about 3 queries per
user when we unsubscribe them from a stream.
2020-10-16 12:58:11 -07:00
Steve Howell
31eb97ddde performance: Fix do_mark_stream_messages_as_read.
This function no longer asks for data that it
doesn't need.
2020-10-16 12:58:11 -07:00
Steve Howell
6d1f9de7d3 performance: Use SubInfo when removing subscribers.
We get two speedups:

    * The query to get existing subscribers only
      gets the two fields we need.  We no longer
      need all the overhead of user_profile
      and recipient data being returned in the
      query.

    * We avoid Django making extra hops to the
      database to get user info.
2020-10-16 12:58:11 -07:00
Steve Howell
73982f6cc9 refactor: Move SubInfo to stream_subscription.py. 2020-10-16 12:58:11 -07:00
Tim Abbott
caa939d2d5 actions: Use transaction.atomic properly when removing subscriptions.
Previously, the transaction.atomic() was not properly scoped to ensure
that RealmAuditLog entries were created in the same transaction,
making it possible for state changes to not be properly recorded in
RealmAuditLog.
2020-10-15 15:12:05 -07:00
Steve Howell
0b91526f28 events: Remove "occupied" semantics for "streams".
When apps like mobile register for "streams", we
will now just use active streams as our baseline,
rather than "occupied" streams.

This means we will send a stream that is active,
even if it happens to have zero occupants.  It's
actually pretty rare that a stream has zero occupants,
and it's not exactly clear that we want to exclude
a non-occupied but otherwise active stream from
our list of streams.

It also happens to be fairly expensive to compute
whether a stream is occupied.

This change only affects API clients (including
possibly our mobile app).  The main webapp never
used the data from this codepath.
2020-10-15 15:12:01 -07:00
Steve Howell
b4346d0276 performance: Extract subscribers/peers in bulk.
We replace get_peer_user_ids_for_stream_change
with two bulk functions to get peers and/or
subscribers.

Note that we have three codepaths that care about
peers:

    subscribing existing users:
        we need to tell peers about new subscribers
        we need to tell subscribed user about old subscribers

    unsubscribing existing users:
        we only need to tell peers who unsubscribed

    subscribing new user:
        we only need to tell peers about the new user
        (right now we generate send_event
        calls to tell the new user about existing
        subscribers, but this is a waste
        of effort that we will fix soon)

The two bulk functions are this:

    bulk_get_subscriber_peer_info
    bulk_get_peers

They have some overlap in the implementation,
but there are some nuanced differences that are
described in the comments.

Looking up peers/subscribers in bulk leads to some
nice optimizations.

We will save some memchached traffic if you are
subscribing to multiple public streams.

We will save a query in the remove-subscriber
case if you are only dealing with private streams.
2020-10-15 15:12:01 -07:00
Steve Howell
94e41c71f9 refactor: Use set of ids for altered users. 2020-10-15 15:12:01 -07:00
Steve Howell
b894597fa3 refactor: Use sets of stream_ids for helper args. 2020-10-15 15:12:01 -07:00
Steve Howell
3889554977 refactor: Extract send_peer_remove_events. 2020-10-15 15:12:01 -07:00
Tim Abbott
bf66e9c4ab actions: Add transaction.atomic to bulk_add_subs_to_db_with_logging.
This will ensure that we always fully execute the database part of
modifying subscription objects.  In particular, this should prevent
invariant failures like #16347 where Subscription objects were created
without corresponding RealmAuditLog entries.

Fixes #16347.
2020-10-14 11:06:00 -07:00
Steve Howell
5728149e94 performance: Streamline query to add subscribers.
We don't need the select_related('user_profile')
optimization any more, because we just keep
track of user info in our own data structures.

In this codepath we are never actually modifying
users; we just occasionally need their ids or
emails.

This can be a pretty substantive improvement if
you are adding a bunch of users to a stream
who each have a bunch of their own subscriptions.

We could also limit the number of full rows in this
query by adding an extra hop to the DB just to
get colors (using values_list), and then only get
full sub info for the streams that we're adding, rather
than getting every single subscription, in full, for each user.

Apart from finding what colors the user has already
used, the only other reason we need all the columns
in Subscription here is to handle streams that
need to be reactivated.  Otherwise we could do
only("id", "active", "recipient_id", "user_profile_id")
or similar.  Fortunately, Subscription isn't
an overly wide table; it's mostly bool fields.

But by far the biggest thing to avoid is bringing
in all the extra user_profile data.

We have pretty good coverage on query counts here,
so I think this fix is pretty low risk.
2020-10-14 11:03:07 -07:00
Steve Howell
116a441bc5 refactor: Introduce SubInfo class.
This class removes a lot of the annoying tuples
we were passing around.

Also, by including the user everywhere, which
is easily available to us when we make instances
of SubInfo, it sets the stage to remove
select_related('user_profile').
2020-10-14 10:53:10 -07:00
Steve Howell
febef45e38 minor: Add comments to do_get_streams. 2020-10-14 10:53:10 -07:00
Steve Howell
a9356508ca events: Stop sending occupy/vacate events.
We used to send occupy/vacate events when
either the first person entered a stream
or the last person exited.

It appears that our two main apps have never
looked at these events.  Instead, it's
generally the case that clients handle
events related to stream creation/deactivation
and subscribe/unsubscribe.

Note that we removed the apply_events code
related to these events.  This doesn't affect
the webapp, because the webapp doesn't care
about the "streams" field in do_events_register.

There is a theoretical situation where a
third party client could be the victim of
a race where the "streams" data includes
a stream where the last subscriber has left.
I suspect in most of those situations it
will be harmless, or possibly even helpful
to the extent that they'll learn about
streams that are in a "quasi" state where
they're activated but not occupied.

We could try to patch apply_event to
detect when subscriptions get added
or removed. Or we could just make the
"streams" piece of do_events_register
not care about occupy/vacate semantics.
I favor the latter, since it might
actually be what users what, and it will
also simplify the code and improve
performance.
2020-10-14 10:53:10 -07:00
Steve Howell
598601e8fc stream events: Prevent spurious events.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.

This change makes that happen more automatically.
2020-10-13 11:28:17 -07:00
Steve Howell
18771099e4 performance: Introduce new_stream_user_ids.
Let
    U = number of users to subscribe
    S = number of streams to subscribe

We were technically doing N^3 amount of work
when we sent certain events, or to be more
precise, U * S * S amount of work.  For each
stream, we were looping through a list of tuples
of size U * S to find the users for the stream.

In practice either U or S is usually 1, so the
performance gains here are probably negligible,
especially since the constant factors here
were just slinging around Python data.

But the code is actually more readable now, so
it's a double win.
2020-10-13 11:28:17 -07:00
Steve Howell
ebb605319b refactor: Rename stream_map to recipient_id_to_stream.
I want to make a new dict called stream_id_to_stream,
and stream_map would be confusing.
2020-10-13 11:28:17 -07:00
Steve Howell
b502957184 refactor: Extract new_recipient_ids local.
We rename needs_new_sub (which sounds like
a boolean!) to new_recipient_ids, and we
calculate it explicitly within the loop, so
that we don't need to worry as much about
subsequent passes through the loop mutating it.

This allows us to also remove recipient_ids,
which in turn lets us remove recipients_map,
albeit with a small tweak for stream_map.

I also introduce the my_subs local, which
I use to more directly populate used_colors,
as well as using it as the loop var.
2020-10-13 11:28:17 -07:00
Steve Howell
766892d8aa import: Reuse get_last_message_id() helper. 2020-10-13 11:28:17 -07:00
Steve Howell
9df9934ed6 refactor: Pass realm to bulk_add_subscriptions.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.

This may be overkill--I would also be happy if we
just included the assertions from this commit.
2020-10-13 11:28:17 -07:00
Steve Howell
efc931a671 minor: Extract realm local. 2020-10-13 11:28:17 -07:00
Steve Howell
b2d0a2efb9 refactor: Extract send_subscription_add_events.
This function now does all the work that we used
to do with notify_subscriptions_added happening
inside a loop.

There's a small fine-tuning here, where we only
get recent traffic on streams that we're actually
sending events for.
2020-10-13 11:28:17 -07:00
Steve Howell
223ce83a0a refactor: Clean up call to notify_subscriptions_added.
We now just pass in all_subscribers_by_stream, rather
than a callback.

We also move sub_tuples_by_user closer to the
loop where we call notify_subscriptions_added.
2020-10-13 11:28:17 -07:00
Steve Howell
811426b345 Extract send_stream_creation_events_for_private_streams.
We can probably avoid passing in users here.
2020-10-12 16:40:37 -07:00
Steve Howell
1cfaef0d1a refactor: Simplify pick_color logic.
This removes the need to jankily mutate
the active flag in the caller, and we don't
need to mutate our subs_by_user either.
2020-10-12 16:40:37 -07:00
Steve Howell
13569ff97a refactor: Eliminate new_subs.
We now just process new subs for a user immediately
within the loop.
2020-10-12 16:40:37 -07:00
Steve Howell
8c70fbde78 refactor: Use subs_to_add in return value.
The subs_to_add is directly related to a var
called new_subs, which I hope to eliminate
soon.
2020-10-12 16:40:37 -07:00
Steve Howell
1afca3d430 minor: Extract local for stream. 2020-10-12 16:40:37 -07:00
Steve Howell
84aa1389d8 Extract bulk_add_subs_to_db_with_logging.
This is a trivial code extraction.
2020-10-12 16:40:37 -07:00
Steve Howell
3ff9ce78ea refactor: Extract send_peer_add_events. 2020-10-12 16:40:37 -07:00
Anders Kaseorg
aabef3d9be python: Catch specific exceptions from orjson.
Followup to #16120.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
2020-10-11 16:11:41 -07:00
sahil839
6c473ed75f message: Call build_message_send_dict from check_message.
We call build_message_send_dict from check_message instead of
do_send_messages.

This is a prep commit for adding a new setting for handling
wildcard mentions in large streams.
2020-09-29 17:18:04 -07:00
sahil839
f1a5fbaeb0 message: Extract build_message_send_dict function.
We extract the loop for building message dict in
do_send_messages in a separate function named
build_message_send_dict.

This is a prep commit for moving the code for building
of message dict in check_message.
2020-09-29 16:50:47 -07:00
sahil839
0514ba7ecb message: Add 'links_for_embed' to message_dict.
There is a bug where we send event for even
those messages which do not have embedded links
as we are using single set 'links_for_embed' to
check whether we have to send event for
embedded links or not.

This commit fixes the bug by adding 'links_for_embed'
in message dict itself and send the event only
if that message has embedded links.
2020-09-29 16:50:47 -07:00
Steve Howell
2c496d9afd mypy: Fix do_send_user_group_update_event. 2020-09-29 16:49:10 -07:00
Vishnu KS
367c792968 actions: Downgrade realm before scrubbing. 2020-09-28 15:37:49 -07:00
Vishnu KS
0d30f59c97 billing: downgrade_now -> downgrade_now_without_creating_additional_invoice. 2020-09-28 15:37:49 -07:00
Tim Abbott
99396b25a6 MessageDict: Add a bit of docstring documentation. 2020-09-28 11:50:02 -07:00
Tim Abbott
90ff62aabc actions: Rename message local variable to message_dict.
This is a preparatory refactor to make it easy to see the changes
using `git show` in the next commit.
2020-09-28 11:14:59 -07:00
sahil839
ae74f8aafb actions: Remove unnecessary comment in do_send_messages function.
This commit removes the unnecessary comment which was added in
9454683108, when we were using message.get() for keys which
were also passed as args in do_send_messages, but there are no
such keys in the current code.
2020-09-28 10:58:35 -07:00