During the new user creation code path, there can be no existing
active clients for the user being created, so we can skip the code to
send events to that user's clients.
The tests here reflect that we need to send fewer events, and do fewer
queries that would have been spent computing data for these..
Fixes#16503, combined with the long series of recent changes by Steve
Howell to fix super-linear behavior in this code path.
We no bulk up peer_add/peer_remove events by user if the
same user has subscribed to multiple streams (and just
that single user).
This mostly optimizes the new-user codepath, but the
algorithm is a bit more general in nature.
See commit 8b002040e0 and #86. The
development environment bug that necessitated this handler has long
been irrelevant.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
We now can send an implied matrix of user/stream tuples
for peer_add and peer_remove events.
The client code basically does this:
for stream_id in event['stream_ids']:
for user_id in event['user_ids']:
update_sub(stream_id, user_id)
We used to send individual events, which gets real
expensive when you are creating new streams. For
the case of copy-to-stream case, we should see
events go from U to 1, where U is the number of users
added.
Note that we don't yet fully optimize the potential
of this schema. For adding a new user with lots
of default streams, we still send S peer_add events.
And if you subscribe a bunch of users to a bunch of
private streams, we only go from U * S to S; we can't
optimize it down to one event easily.
All the fields of a stream's recipient object can
be inferred from the Stream, so we just make a local
object. Django will create a Message object without
checking that the child Recipient object has been
saved. If that behavior changes in some upgrade,
we should see some pretty obvious symptom, including
query counts changing.
Tweaked by tabbott to add a longer explanatory comment, and delete a
useless old comment.
This saves us a query for edge cases like when
you try to unsubscribe from a public stream
that you have already unsubscribed from.
But this is mostly to prep for upcoming
optimizations.
This doesn't change anything yet, but the goal is
to eventually optimize events for the case where
one user (typically a new user) gets subscribed
to multiple public streams.
Initially markdown titles were overridden by Youtube and Vimeo preview titles.
But now it will check if any markdown title is present to replace Youtube or
Vimeo preview titles, if preview of linked websites is enabled.
Fixes#16100
Upstream has slightly changed the whitespace around stashes. Take
this opportunity to clean up the extra blank lines we were outputting.
Signed-off-by: Anders Kaseorg <anders@zulip.com>
There was no need to put "stream_id" on the sub
dictionary here. It's kinda annoying to introduce
the little helper here, but I feel
that's better than crufting up the sub data
structure.
The is_web_public flag is already in Stream.API_FIELDS,
so there is no reason for all this complicated logic.
There's no reason to hack it on to the subscription
object.
We replace all_streams_id with a map.
We also use it to populate never_subscribed_streams.
And all_streams_map is a superset of stream_hash,
which we will soon kill off as well.
Apparently I put these parens in the code as
part of 73c30774cb
during 2017.
It looks like I extracted is_public during
the middle of my change and forgot to remove
the unnecessary parens. (The code was correct,
but it makes it look like a tuple if you're
skimming it too quickly.)
That class is an artifact of when Stream
didn't have recipient_id. Now it's simpler
to deal with stream subscriptions.
We also save a query during page load (and
other places where we get subscriber
info).
Let the callers access stream.recipient as needed.
It costs the same, and some of the callers can
actually stop caring about the actual Recipient
object.
We already trust ids that are put on our queue
for deferred work. For example, see the code for
"mark_stream_messages_as_read_for_everyone"
We now pass stream_recipient_id when we queue
up work for do_mark_stream_messages_as_read.
This generally saves about 3 queries per
user when we unsubscribe them from a stream.
We get two speedups:
* The query to get existing subscribers only
gets the two fields we need. We no longer
need all the overhead of user_profile
and recipient data being returned in the
query.
* We avoid Django making extra hops to the
database to get user info.
Previously, the transaction.atomic() was not properly scoped to ensure
that RealmAuditLog entries were created in the same transaction,
making it possible for state changes to not be properly recorded in
RealmAuditLog.
When apps like mobile register for "streams", we
will now just use active streams as our baseline,
rather than "occupied" streams.
This means we will send a stream that is active,
even if it happens to have zero occupants. It's
actually pretty rare that a stream has zero occupants,
and it's not exactly clear that we want to exclude
a non-occupied but otherwise active stream from
our list of streams.
It also happens to be fairly expensive to compute
whether a stream is occupied.
This change only affects API clients (including
possibly our mobile app). The main webapp never
used the data from this codepath.
We replace get_peer_user_ids_for_stream_change
with two bulk functions to get peers and/or
subscribers.
Note that we have three codepaths that care about
peers:
subscribing existing users:
we need to tell peers about new subscribers
we need to tell subscribed user about old subscribers
unsubscribing existing users:
we only need to tell peers who unsubscribed
subscribing new user:
we only need to tell peers about the new user
(right now we generate send_event
calls to tell the new user about existing
subscribers, but this is a waste
of effort that we will fix soon)
The two bulk functions are this:
bulk_get_subscriber_peer_info
bulk_get_peers
They have some overlap in the implementation,
but there are some nuanced differences that are
described in the comments.
Looking up peers/subscribers in bulk leads to some
nice optimizations.
We will save some memchached traffic if you are
subscribing to multiple public streams.
We will save a query in the remove-subscriber
case if you are only dealing with private streams.