diff --git a/docs/production/requirements.md b/docs/production/requirements.md
index 335f1ea450..6d9093e863 100644
--- a/docs/production/requirements.md
+++ b/docs/production/requirements.md
@@ -230,9 +230,10 @@ installing Zulip with a dedicated database server.
   backend][s3-uploads].
 
 - **Sharding:** For servers with several thousand daily active users,
-  Zulip supports sharding its real-time-push Tornado service, both by
-  realm/organization (for hosting many organizations) and by user ID
-  (for hosting single very large organizations)
+  Zulip supports [sharding its real-time-push Tornado
+  service][tornado-sharding], both by realm/organization (for hosting many
+  organizations) and by user ID (for hosting single very large
+  organizations).
 
   Care must be taken when dividing traffic for a single Zulip realm
   between multiple Zulip application servers, which is why we
@@ -250,3 +251,4 @@ document](../subsystems/performance.md) may also be of interest.
 [s3-uploads]: upload-backends.md#s3-backend-configuration
 [streaming-replication]: postgresql.md#postgresql-warm-standby
 [contact-support]: https://zulip.com/help/contact-support
+[tornado-sharding]: system-configuration.md#tornado_sharding
diff --git a/docs/production/system-configuration.md b/docs/production/system-configuration.md
index bd16e30727..0d0b51da40 100644
--- a/docs/production/system-configuration.md
+++ b/docs/production/system-configuration.md
@@ -361,6 +361,48 @@ Set to a true value to enable object size reporting in memcached. This
 incurs a small overhead for every store or delete operation, but
 allows a memcached_exporter to report precise item size distribution.
 
+### `[tornado_sharding]`
+
+Keys in this section are used to configure how many Tornado instances
+are started, and which users are mapped to which of those instances.
+Each Tornado instance listens on a separate port, starting at 9800 and
+proceeding upwards from there. A single Tornado instance can usually
+handle 1000-1500 concurrent active users, depending on message sending
+volume.
+
+Individual organizations may be assigned to ports, either via their
+subdomain names, or their fully-qualified hostname (for [organizations
+using `REALM_HOSTS`](multiple-organizations.md#other-hostnames)):
+
+```ini
+[tornado_sharding]
+9800 = realm-a realm-b
+9801 = realm-c
+9802 = realm-host.example.net
+```
+
+Organizations can also be assigned to ports via a regex over their
+fully-qualified hostname:
+
+```ini
+[tornado_sharding]
+9800_regex = ^realm-(a|b)\.example\.com$
+9801_regex = ^other(-option)?\.example\.com$
+```
+
+Extremely large organizations can be distributed across multiple
+Tornado shards by joining the ports in the key with `_`:
+
+```ini
+[tornado_sharding]
+9800 = small-realm
+9801_9802 = very-large-realm
+```
+
+After running `scripts/zulip-puppet-apply`, a separate step to run
+`scripts/refresh-sharding-and-restart` is required for any sharding
+changes to take effect.
+
 ### `[loadbalancer]`
 
 #### `ips`
diff --git a/docs/subsystems/performance.md b/docs/subsystems/performance.md
index 016e532d41..72e9d333eb 100644
--- a/docs/subsystems/performance.md
+++ b/docs/subsystems/performance.md
@@ -137,12 +137,11 @@ idle for a minute. It's likely that with some strategy for detecting
 such situations, we could reduce their volume (and thus overall
 Tornado load) dramatically.
 
-Currently, Tornado is sharded by realm, which is sufficient for
-arbitrary scaling of the number of organizations on a multi-tenant
-system like zulip.com. With a somewhat straightforward set of work,
-one could change this to sharding by `user_id` instead, which will
-eventually be important for individual large organizations with many
-thousands of concurrent users.
+Currently, Tornado is sharded by realm, and optionally by user ID
+within each realm. Sharding by realm is sufficient for arbitrary
+scaling of the number of organizations on a multi-tenant system like
+zulip.com. Sharding by user ID is necessary for very large
+organizations with many thousands of users active at once.
 
 ### Presence
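
To make the workflow concrete, here is a sketch of applying a sharding change on a production host. Only the two script names come from the documentation above; the deployment path and the assumption of a root shell are illustrative of a typical install, not taken from the patch:

```bash
# Assumed: a standard production install under /home/zulip/deployments/current,
# run as root on the Zulip server.
cd /home/zulip/deployments/current

# Apply the updated system configuration after editing the
# [tornado_sharding] section of /etc/zulip/zulip.conf.
./scripts/zulip-puppet-apply

# Sharding changes require this additional step to take effect, per the
# documentation added in this patch.
./scripts/refresh-sharding-and-restart
```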