docs: Document Tornado sharding configurations.

Alex Vandiver
2025-09-09 15:59:55 +00:00
committed by Tim Abbott
parent fa154b675d
commit 530685a597
3 changed files with 52 additions and 9 deletions

@@ -230,9 +230,10 @@ installing Zulip with a dedicated database server.
backend][s3-uploads].
- **Sharding:** For servers with several thousand daily active users,
-Zulip supports sharding its real-time-push Tornado service, both by
-realm/organization (for hosting many organizations) and by user ID
-(for hosting single very large organizations)
+Zulip supports [sharding its real-time-push Tornado
+service][tornado-sharding], both by realm/organization (for hosting many
+organizations) and by user ID (for hosting single very large
+organizations).
Care must be taken when dividing traffic for a single Zulip realm
between multiple Zulip application servers, which is why we
@@ -250,3 +251,4 @@ document](../subsystems/performance.md) may also be of interest.
[s3-uploads]: upload-backends.md#s3-backend-configuration
[streaming-replication]: postgresql.md#postgresql-warm-standby
[contact-support]: https://zulip.com/help/contact-support
[tornado-sharding]: system-configuration.md#tornado_sharding

@@ -361,6 +361,48 @@ Set to a true value to enable object size reporting in memcached. This incurs a
small overhead for every store or delete operation, but allows a
memcached_exporter to report precise item size distribution.
### `[tornado_sharding]`
Keys in this section are used to configure how many Tornado instances
are started, and which users are mapped to which of those instances.
Each Tornado instance listens on a separate port, starting at 9800 and
proceeding upwards from there. A single Tornado instance can usually
handle 1000-1500 concurrent active users, depending on message sending
volume.
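As a rough sizing aid, the per-instance capacity quoted above can be turned into a port list. This is only a sketch: `estimate_tornado_ports` is a hypothetical helper, not part of Zulip, and it assumes the conservative end of the quoted range (1000 concurrent users per instance).

```python
import math

BASE_PORT = 9800  # first Tornado port, per the text above


def estimate_tornado_ports(concurrent_users: int, users_per_instance: int = 1000) -> list:
    """Hypothetical helper: list the ports needed for a given peak load."""
    if concurrent_users < 0 or users_per_instance <= 0:
        raise ValueError("expected a non-negative load and a positive capacity")
    # Always run at least one instance, even for an idle server.
    count = max(1, math.ceil(concurrent_users / users_per_instance))
    return [BASE_PORT + offset for offset in range(count)]
```

For example, `estimate_tornado_ports(2500)` yields three ports starting at 9800. Actual capacity depends on message sending volume, so treat the result as a starting point rather than a guarantee.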
Individual organizations may be assigned to ports, either via their
subdomain names, or their fully-qualified hostname (for [organizations
using `REALM_HOSTS`](multiple-organizations.md#other-hostnames)):
```ini
[tornado_sharding]
9800 = realm-a realm-b
9801 = realm-c
9802 = realm-host.example.net
```
Organizations can also be assigned to ports via regex over their
fully-qualified hostname:
```ini
[tornado_sharding]
9800_regex = ^realm-(a|b)\.example\.com$
9801_regex = ^other(-option)?\.example\.com$
```
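Before deploying regex-based rules, it can help to check which hostnames they match. The sketch below is an assumption-laden illustration, not Zulip's own matching code: `port_for_host` and `PORT_PATTERNS` are hypothetical names, and returning `None` for unmatched hosts is a choice made for this example only.

```python
import re
from typing import Optional

# Patterns modelled on the example above, keyed by port,
# with every literal dot escaped.
PORT_PATTERNS = {
    9800: r"^realm-(a|b)\.example\.com$",
    9801: r"^other(-option)?\.example\.com$",
}


def port_for_host(host: str) -> Optional[int]:
    """Hypothetical helper: return the first port whose regex matches host."""
    for port, pattern in PORT_PATTERNS.items():
        if re.search(pattern, host):
            return port
    return None  # assumption: unmatched hosts are left to the caller
```

An unescaped `.` in a pattern matches any character, so a rule written as `example.com` would also accept a hostname such as `exampleXcom`; escaping each dot avoids that.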
Extremely large organizations can be distributed across multiple
Tornado shards by joining the ports in the key with `_`:
```ini
[tornado_sharding]
9800 = small-realm
9801_9802 = very-large-realm
```
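To make the key syntax concrete, the following sketch parses a `[tornado_sharding]` section and picks a port for a user in a multi-port realm. It is illustrative only: `realm_ports` and `port_for_user` are hypothetical helpers, and the modulo rule for spreading users by user ID is an assumption, so Zulip's actual assignment logic may differ.

```python
import configparser

SAMPLE = """
[tornado_sharding]
9800 = small-realm
9801_9802 = very-large-realm
"""


def realm_ports(config_text: str) -> dict:
    """Hypothetical helper: map each realm name to its list of ports."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    mapping = {}
    for key, value in parser["tornado_sharding"].items():
        if key.endswith("_regex"):
            continue  # regex keys are out of scope for this sketch
        ports = [int(part) for part in key.split("_")]
        for realm in value.split():
            mapping[realm] = ports
    return mapping


def port_for_user(ports: list, user_id: int) -> int:
    # Assumption: spread users by user ID modulo the number of ports.
    return ports[user_id % len(ports)]
```

With the sample above, `realm_ports(SAMPLE)["very-large-realm"]` is `[9801, 9802]`, and under the assumed modulo rule even user IDs land on 9801 while odd ones land on 9802.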
After running `scripts/zulip-puppet-apply`, a separate step to run
`scripts/refresh-sharding-and-restart` is required for any sharding
changes to take effect.
### `[loadbalancer]`
#### `ips`

@@ -137,12 +137,11 @@ idle for a minute. It's likely that with some strategy for detecting
such situations, we could reduce their volume (and thus overall
Tornado load) dramatically.
-Currently, Tornado is sharded by realm, which is sufficient for
-arbitrary scaling of the number of organizations on a multi-tenant
-system like zulip.com. With a somewhat straightforward set of work,
-one could change this to sharding by `user_id` instead, which will
-eventually be important for individual large organizations with many
-thousands of concurrent users.
+Currently, Tornado is sharded by realm, and optionally by user-id
+within each realm. Sharding by realm is sufficient for arbitrary
+scaling of the number of organizations on a multi-tenant system like
+zulip.com. Sharding by user-id is necessary for very large
+organizations with multiple thousands of active users at once.
### Presence