mirror of
https://github.com/zulip/zulip.git
synced 2025-10-31 20:13:46 +00:00
Limiting only by client_name and query leads to a very poorly-indexed
lookup on `query` which throws out nearly all of its rows:
```
Nested Loop (cost=50885.64..60522.96 rows=821 width=8)
-> Index Scan using zerver_client_name_key on zerver_client (cost=0.28..2.49 rows=1 width=4)
Index Cond: ((name)::text = 'zephyr_mirror'::text)
-> Bitmap Heap Scan on zerver_useractivity (cost=50885.37..60429.95 rows=9052 width=12)
Recheck Cond: ((client_id = zerver_client.id) AND ((query)::text = ANY ('{get_events,/api/v1/events}'::text[])))
-> BitmapAnd (cost=50885.37..50885.37 rows=9052 width=0)
-> Bitmap Index Scan on zerver_useractivity_2bfe9d72 (cost=0.00..16631.82 rows=..large.. width=0)
Index Cond: (client_id = zerver_client.id)
-> Bitmap Index Scan on zerver_useractivity_1b1cc7f0 (cost=0.00..34103.95 rows=..large.. width=0)
Index Cond: ((query)::text = ANY ('{get_events,/api/v1/events}'::text[]))
```
A partial index on the client and query list is extremely effective
here in reducing PostgreSQL's workload; however, we cannot easily
write it as a migration, since it depends on the value of the ID of
the `zephyr_mirror` client.
Since this is only relevant for Zulip Cloud, we manually create the
index:
```sql
CREATE INDEX CONCURRENTLY zerver_useractivity_zehpyr_liveness
ON zerver_useractivity(last_visit)
WHERE client_id = 1005
AND query IN ('get_events', '/api/v1/events');
```
We rewrite the query to do the time limit, distinct, and count in SQL,
instead of Python, and make use of this index. This turns a 20-second
query into two 10ms queries.