We replace get_peer_user_ids_for_stream_change
with two bulk functions to get peers and/or
subscribers.
Note that we have three codepaths that care about
peers:
subscribing existing users:
we need to tell peers about new subscribers
we need to tell subscribed user about old subscribers
unsubscribing existing users:
we only need to tell peers who unsubscribed
subscribing new user:
we only need to tell peers about the new user
(right now we generate send_event
calls to tell the new user about existing
subscribers, but this is a waste
of effort that we will fix soon)
The two bulk functions are this:
bulk_get_subscriber_peer_info
bulk_get_peers
They have some overlap in the implementation,
but there are some nuanced differences that are
described in the comments.
Looking up peers/subscribers in bulk leads to some
nice optimizations.
We will save some memchached traffic if you are
subscribing to multiple public streams.
We will save a query in the remove-subscriber
case if you are only dealing with private streams.
This will ensure that we always fully execute the database part of
modifying subscription objects. In particular, this should prevent
invariant failures like #16347 where Subscription objects were created
without corresponding RealmAuditLog entries.
Fixes#16347.
We don't need the select_related('user_profile')
optimization any more, because we just keep
track of user info in our own data structures.
In this codepath we are never actually modifying
users; we just occasionally need their ids or
emails.
This can be a pretty substantive improvement if
you are adding a bunch of users to a stream
who each have a bunch of their own subscriptions.
We could also limit the number of full rows in this
query by adding an extra hop to the DB just to
get colors (using values_list), and then only get
full sub info for the streams that we're adding, rather
than getting every single subscription, in full, for each user.
Apart from finding what colors the user has already
used, the only other reason we need all the columns
in Subscription here is to handle streams that
need to be reactivated. Otherwise we could do
only("id", "active", "recipient_id", "user_profile_id")
or similar. Fortunately, Subscription isn't
an overly wide table; it's mostly bool fields.
But by far the biggest thing to avoid is bringing
in all the extra user_profile data.
We have pretty good coverage on query counts here,
so I think this fix is pretty low risk.
This class removes a lot of the annoying tuples
we were passing around.
Also, by including the user everywhere, which
is easily available to us when we make instances
of SubInfo, it sets the stage to remove
select_related('user_profile').
We used to send occupy/vacate events when
either the first person entered a stream
or the last person exited.
It appears that our two main apps have never
looked at these events. Instead, it's
generally the case that clients handle
events related to stream creation/deactivation
and subscribe/unsubscribe.
Note that we removed the apply_events code
related to these events. This doesn't affect
the webapp, because the webapp doesn't care
about the "streams" field in do_events_register.
There is a theoretical situation where a
third party client could be the victim of
a race where the "streams" data includes
a stream where the last subscriber has left.
I suspect in most of those situations it
will be harmless, or possibly even helpful
to the extent that they'll learn about
streams that are in a "quasi" state where
they're activated but not occupied.
We could try to patch apply_event to
detect when subscriptions get added
or removed. Or we could just make the
"streams" piece of do_events_register
not care about occupy/vacate semantics.
I favor the latter, since it might
actually be what users what, and it will
also simplify the code and improve
performance.
If a user asks to be subscribed to a stream
that they are already subscribed to, then
that stream won't be in new_stream_user_ids,
and we won't need to send an event for it.
This change makes that happen more automatically.
Let
U = number of users to subscribe
S = number of streams to subscribe
We were technically doing N^3 amount of work
when we sent certain events, or to be more
precise, U * S * S amount of work. For each
stream, we were looping through a list of tuples
of size U * S to find the users for the stream.
In practice either U or S is usually 1, so the
performance gains here are probably negligible,
especially since the constant factors here
were just slinging around Python data.
But the code is actually more readable now, so
it's a double win.
We rename needs_new_sub (which sounds like
a boolean!) to new_recipient_ids, and we
calculate it explicitly within the loop, so
that we don't need to worry as much about
subsequent passes through the loop mutating it.
This allows us to also remove recipient_ids,
which in turn lets us remove recipients_map,
albeit with a small tweak for stream_map.
I also introduce the my_subs local, which
I use to more directly populate used_colors,
as well as using it as the loop var.
I think it's important that the callers understand
that bulk_add_subscriptions assumes all streams
are being created within a single realm, so I make
it an explicit parameter.
This may be overkill--I would also be happy if we
just included the assertions from this commit.
This function now does all the work that we used
to do with notify_subscriptions_added happening
inside a loop.
There's a small fine-tuning here, where we only
get recent traffic on streams that we're actually
sending events for.
We now just pass in all_subscribers_by_stream, rather
than a callback.
We also move sub_tuples_by_user closer to the
loop where we call notify_subscriptions_added.
We call build_message_send_dict from check_message instead of
do_send_messages.
This is a prep commit for adding a new setting for handling
wildcard mentions in large streams.
We extract the loop for building message dict in
do_send_messages in a separate function named
build_message_send_dict.
This is a prep commit for moving the code for building
of message dict in check_message.
There is a bug where we send event for even
those messages which do not have embedded links
as we are using single set 'links_for_embed' to
check whether we have to send event for
embedded links or not.
This commit fixes the bug by adding 'links_for_embed'
in message dict itself and send the event only
if that message has embedded links.
This commit removes the unnecessary comment which was added in
9454683108, when we were using message.get() for keys which
were also passed as args in do_send_messages, but there are no
such keys in the current code.
This commit removes the unnecessary line of code to get
rendered_content from message dict sent by check_message
when it actually does not inlcude 'rendered_content' key.
This line was added in 9454683108, but now we do not send
rendered_content in the message dict as we render the message
in do_send_messages itself.
Fixes#16284.
Most of the work for this was done when we implemented correct
behavior for guest users, since they treat public streams like private
streams anyway.
The general method involves moving the messages to the new stream with
special care of UserMessage.
We delete UserMessages for subs who are losing access to the message.
For private streams with protected history, we also create UserMessage
elements for users who are not present in the old stream, since that's
important for those users to access the moved messages.
The query to finds and marks all unread UserMessages in the stream as read
can be quite expensive, so we'll move that work to the deferred_work
queue and split it into batches.
Fixes#15770.
`update_message_flags` events used `operation` instead of `op`, the
latter being the standard field used in other events. So add `op`
field to `update_message_flags` and mark `operation` as deprecated,
so that it can be removed later.
This commit adds the is_web_public field in the AbstractAttachment
class. This is useful when validating user access to the attachment,
as otherwise we would have to make a query in the db to check if
that attachment was sent in a message in a web-public stream or not.