zulip/docs/production/requirements.md
Alex Vandiver d95fb34ba7 puppet: Admit we leave epmd port 4369 open on all interfaces.
The Erlang `epmd` daemon listens on port 4369, and provides
information (without authentication) about which Erlang processes are
listening on what ports.  This information is not itself a
vulnerability, but may provide information for remote attackers about
what local Erlang services (such as `rabbitmq-server`) are running,
and where.

`epmd` supports an `ERL_EPMD_ADDRESS` environment variable to limit
which interfaces it binds on.  While this environment variable is set
in `/etc/default/rabbitmq-server`, Zulip unfortunately attempts to
start `epmd` using an explicit `exec` block, which ignores those
settings.

Regardless, the `ERL_EPMD_ADDRESS` variable would only control
`epmd`'s startup upon first installation.  Upon reboot, there are two
ways in which `epmd` might be started, neither of which respects
`ERL_EPMD_ADDRESS`:

 - On Focal, an `epmd` service exists and is activated, which uses
   systemd's configuration to choose which interfaces to bind on, and
   thus `ERL_EPMD_ADDRESS` is irrelevant.

 - On Bionic (and Focal, due to a broken dependency from
   `rabbitmq-server` to `epmd@` instead of `epmd`, which may lead to
   the explicit `epmd` service losing a race), `epmd` is started by
   `rabbitmq-server` when it does not detect a running instance.
   Unfortunately, only `/etc/init.d/rabbitmq-server` respects
   `/etc/default/rabbitmq-server` -- and it defers the actual startup
   to using systemd, which does not pass the environment variable
   down.  Thus, `ERL_EPMD_ADDRESS` is also irrelevant here.

We unfortunately cannot limit `epmd` to only listening on localhost,
due to a number of overlapping bugs and limitations:

 - Manually starting `epmd` with `-address 127.0.0.1` silently fails
   to start on hosts with IPv6 disabled, due to an Erlang bug ([1],
   [2]).

 - The dependencies of the systemd `rabbitmq-server` service can be
   fixed to include the `epmd` service, and systemd can be made to
   bind to `127.0.0.1:4369` and pass that socket to `epmd`, bypassing
   the above bug.  However, the startup of this service is not
   guaranteed, because it races with other sources of `epmd` (see
   below).

 - Any process that runs `rabbitmqctl` results in `epmd` being started
   if one is not currently running; these instances do not respect any
   environment variables as to which addresses to bind on.  This is
   also triggered by `service rabbitmq-server status`, as well as
   various Zulip cron jobs which inspect the rabbitmq queues.  As
   such, it is difficult-to-impossible to ensure that some other
   `epmd` process will not win the race and open the port on all
   interfaces.

Since the only known exposure from leaving port 4369 open is
information that rabbitmq is running on the host, and the complexity
of adjusting this to only bind on localhost is high, we remove the
setting which does not address the problem, and document that the port
is left open, and should be protected via system-level or
network-level firewalls.

[1]: https://bugs.launchpad.net/ubuntu/+source/erlang/+bug/1374109
[2]: https://github.com/erlang/otp/issues/4820
2022-01-24 23:09:02 +00:00


Requirements and scalability

To run a Zulip server, you will need:

  • A dedicated machine or VM
  • A supported OS:
    • Ubuntu 20.04 Focal
    • Ubuntu 18.04 Bionic
    • Debian 11 Bullseye
    • Debian 10 Buster
  • At least 2GB RAM, and 10GB disk space
    • If you expect 100+ users: 4GB RAM, and 2 CPUs
    • If you intend to upgrade from Git: 3GB RAM, or 2GB RAM and at least 1GB of swap configured.
  • A hostname in DNS
  • Credentials for sending email

For details on each of these requirements, see below.

Server

General

The installer expects Zulip to be the only thing running on the system; it will install system packages with apt (like Nginx, PostgreSQL, and Redis) and configure them for its own use. We strongly recommend using either a fresh machine instance in a cloud provider, a fresh VM, or a dedicated machine. If you decide to disregard our advice and use a server that hosts other services, we can't support you, but we do have some notes on issues you'll encounter.

Operating system

Ubuntu 20.04 Focal, 18.04 Bionic, Debian 11 Bullseye, and Debian 10 Buster are supported for running Zulip in production. 64-bit is recommended. We recommend installing on the newest supported OS release you're comfortable with, to save a bit of future work upgrading the operating system.

If you're using Ubuntu, the Ubuntu universe repository must be enabled, which is usually just:

sudo add-apt-repository universe
sudo apt update
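
As a rough pre-install check, the supported releases listed above can be compared against the codename the system reports. This is a minimal sketch, not part of the Zulip installer: the function names are hypothetical, and it assumes `/etc/os-release` provides `VERSION_CODENAME`, as current Ubuntu and Debian releases do.

```shell
#!/bin/sh
# Return 0 if the given release codename is one of the supported
# releases listed above (Focal, Bionic, Bullseye, Buster).
is_supported_codename() {
    case "$1" in
        focal|bionic|bullseye|buster) return 0 ;;
        *) return 1 ;;
    esac
}

# Print VERSION_CODENAME from an os-release style file.
# Assumption: the file defaults to /etc/os-release.
current_codename() {
    file="${1:-/etc/os-release}"
    [ -r "$file" ] || return 1
    # Source the file in a subshell so its variables do not leak.
    ( . "$file" && printf '%s\n' "${VERSION_CODENAME:-}" )
}

codename="$(current_codename)" || codename=""
if is_supported_codename "$codename"; then
    echo "supported release: $codename"
else
    echo "not in the supported list: ${codename:-unknown}"
fi
```

The check only looks at the codename; it does not verify that the universe repository is enabled or that the system is 64-bit.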

Hardware specifications

  • CPU and memory: For installations with 100+ users you'll need a minimum of 2 CPUs and 4GB RAM. For installations with fewer users, 1 CPU and 2GB RAM are sufficient. We strongly recommend against installing with less than 2GB of RAM, as you will likely experience out-of-memory issues installing dependencies. We recommend against using highly CPU-limited servers like the AWS t2 style instances for organizations with hundreds of users (active or not).

  • Disk space: You'll need at least 10GB of free disk space for a server with dozens of users. We recommend using an SSD and avoiding cloud storage backends that limit IOPS, since the disk is primarily used for the Zulip database.

See our documentation on scalability below for advice on hardware requirements for larger organizations.
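
The minimums above can be expressed as a small check. The sketch below is illustrative only: `check_zulip_hardware` is a hypothetical helper, not a Zulip tool, and it takes RAM and free disk as whole gigabytes supplied by the caller.

```shell
#!/bin/sh
# check_zulip_hardware RAM_GB CPUS FREE_DISK_GB EXPECTED_USERS
# Prints one line per shortfall and returns 0 only if the minimums
# stated above are met (2GB/1 CPU, or 4GB/2 CPUs for 100+ users; 10GB disk).
check_zulip_hardware() {
    ram_gb="$1"; cpus="$2"; disk_gb="$3"; users="$4"
    if [ "$users" -ge 100 ]; then
        need_ram=4; need_cpus=2
    else
        need_ram=2; need_cpus=1
    fi
    need_disk=10
    status=0
    if [ "$ram_gb" -lt "$need_ram" ]; then
        echo "RAM: have ${ram_gb}GB, need ${need_ram}GB"
        status=1
    fi
    if [ "$cpus" -lt "$need_cpus" ]; then
        echo "CPU: have ${cpus}, need ${need_cpus}"
        status=1
    fi
    if [ "$disk_gb" -lt "$need_disk" ]; then
        echo "Disk: have ${disk_gb}GB free, need ${need_disk}GB"
        status=1
    fi
    return "$status"
}

# Example with explicit values: a 2GB, 1-CPU VM sized for 150 users.
check_zulip_hardware 2 1 20 150 || echo "does not meet the minimums for 150 users"

# On a Linux host with GNU coreutils, the inputs could be gathered with:
#   ram_gb=$(awk '/MemTotal/ {printf "%d", ($2 + 524288) / 1048576}' /proc/meminfo)
#   cpus=$(nproc)
#   disk_gb=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')
```

The check covers only the baseline figures in this section; the larger tiers in the scalability section are not included.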

Network and security specifications

  • Incoming HTTPS access (usually port 443, though this is configurable) from the networks where your users are (usually, the public Internet).
  • Incoming port 80 access (optional). Zulip only serves content over HTTPS, and will redirect HTTP requests to HTTPS.
  • Incoming port 25 if you plan to enable Zulip's incoming email integration.
  • Incoming port 4369 should be protected by a firewall to prevent exposing epmd, an Erlang service which does not support binding only to localhost. Leaving this exposed will allow unauthenticated remote users to determine that the server is running RabbitMQ, and on which port, though no further information is leaked.
  • Outgoing HTTP(S) access (ports 80 and 443) to the public Internet so that Zulip can properly manage image and website previews and mobile push notifications. Outgoing Internet access is not required if you disable those features.
  • Outgoing SMTP access (usually port 587) to your SMTP server so that Zulip can send emails.
  • A domain name (e.g. zulip.example.com) that your users will use to access the Zulip server. In order to generate valid SSL certificates with Certbot, and to enable other services such as Google authentication, a public DNS name is simpler, but Zulip can be configured to use a non-public domain or even an IP address as its external hostname (though we don't recommend that configuration).
  • Zulip supports running behind a reverse proxy.
  • Zulip configures Smokescreen, an outgoing HTTP proxy, to protect against SSRF attacks, which prevents users from making the Zulip server make requests to private resources. If your network has its own outgoing HTTP proxy, Zulip supports using that instead.
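
One way to apply the incoming-port guidance above is with a host firewall. The sketch below assumes `ufw`, which this document does not name; any system-level or network-level firewall would serve the same purpose. The function name is hypothetical, and by default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# zulip_firewall_rules [RUNNER]
# With no argument, prints the ufw commands (dry run).
# With "sudo" as the argument, runs them through sudo.
zulip_firewall_rules() {
    runner="${1:-echo}"
    "$runner" ufw allow 443/tcp   # incoming HTTPS (assumes the default port)
    "$runner" ufw allow 80/tcp    # optional: HTTP, redirected to HTTPS
    "$runner" ufw allow 25/tcp    # only for the incoming email integration
    "$runner" ufw deny 4369/tcp   # keep epmd unreachable from other hosts
}

# Dry run: print the rules without changing anything.
zulip_firewall_rules
```

This sketch does not enable `ufw` and does not add a rule for SSH; allow your own administrative access before enabling any firewall on a remote host.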

Credentials needed

SSL certificate

Your Zulip server will need an SSL certificate for the domain name it uses. For most Zulip servers, the recommended (and simplest) way to get this is to just use the --certbot option in the Zulip installer, which will automatically get a certificate for you and keep it renewed.

For test installations, an even simpler alternative is always available: the --self-signed-cert option in the installer.

If you'd rather acquire an SSL certificate another way, see our SSL certificate documentation.

Outgoing email

  • Outgoing email (SMTP) credentials that Zulip can use to send outgoing emails to users (e.g. email address confirmation emails during the signup process, message notification emails, password reset, etc.). If you don't have an existing outgoing SMTP solution, read about free outgoing SMTP options and options for prototyping.

Once you have met these requirements, see full instructions for installing Zulip in production.

Scalability

This section details some basic guidelines for running a Zulip server for larger organizations (especially >1000 users or 500+ daily active users). Zulip's resource needs depend mainly on 3 parameters:

  • daily active users (e.g. number of employees if everyone's an employee)
  • total user accounts (can be much larger)
  • message volume.

In the following, we discuss a configuration with at most two types of servers: application servers (running Django, Tornado, RabbitMQ, Redis, Memcached, etc.) and database servers. Of the application server services, Django dominates the resource requirements. One can run every service on its own system (as docker-zulip does) but for most use cases, there's little scalability benefit to doing so. See deployment options for details on installing Zulip with a dedicated database server.

  • Dedicated database. For installations with hundreds of daily active users, we recommend using a remote PostgreSQL database, but it's not required.

  • RAM: We recommend more RAM for larger installations:

    • With 25+ daily active users, 4GB of RAM.
    • With 100+ daily active users, 8GB of RAM.
    • With 400+ daily active users, 16GB of RAM for the Zulip application server, plus 16GB for the database.
    • With 2000+ daily active users, 32GB of RAM, plus 32GB for the database.
    • Roughly linear scaling beyond that.
  • CPU: The Zulip application server's CPU usage has been heavily optimized, as a result of extensive work on request performance for latency reasons. Because most servers with sufficient RAM have sufficient CPU resources, CPU requirements are rarely an issue. For larger installations with a dedicated database, we recommend high-CPU instances for the application server and a database-optimized (usually low CPU, high memory) instance for the database.

  • Disk for application server: We recommend using the S3 file uploads backend to store uploaded files at scale. With the S3 backend configuration, we recommend 50GB of disk for the OS, Zulip software, logs and scratch/free space. Disk needs when storing uploads locally

  • Disk for database: SSD disk is highly recommended. For installations where most messages have <100 recipients, 10GB per 1M messages of history plus 1GB per 1000 users is sufficient. If most messages are to public streams with 10K+ users subscribed (like on chat.zulip.org), add 20GB per (1000 user accounts) per (1M messages to public streams).

  • Example: When the chat.zulip.org community server had 12K user accounts (~300 daily actives) and 800K messages of history (400K to public streams), it was a default configuration single-server installation with 16GB of RAM, 4 cores (essentially always idle), and its database was using about 100GB of disk.

  • Disaster recovery: One can easily run a hot spare application server and a hot spare database (using PostgreSQL streaming replication). Make sure the hot spare application server has copies of /etc/zulip and you're either syncing LOCAL_UPLOADS_DIR or using the S3 file uploads backend.

  • Sharding: Zulip releases do not fully support dividing Tornado traffic for a single Zulip realm/organization between multiple application servers, which is why we recommend a hot spare over load-balancing. We don't have an easily deployed configuration for load-balancing Tornado within a single organization, and as a result can't currently offer this model outside of enterprise support contracts.

    • Zulip 2.0 and later supports running multiple Tornado servers sharded by realm/organization, which is how we scale Zulip Cloud. Contact us for help implementing the sharding policy.
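
The RAM tiers and database disk rules of thumb above can be turned into a quick estimate. This is a sketch under stated assumptions rather than an official sizing tool: the function names are hypothetical, "roughly linear scaling" is read as 32GB per 2000 daily active users, and installations below 25 daily actives fall back to the 2GB base minimum.

```shell
#!/bin/sh
# recommended_app_ram_gb DAILY_ACTIVE_USERS
# Prints application-server RAM in GB for the tiers listed above.
# Database RAM (16GB at 400+, 32GB at 2000+) is separate and not included.
recommended_app_ram_gb() {
    dau="$1"
    if [ "$dau" -ge 2000 ]; then
        # Assumption: linear scaling at 32GB per 2000 users, rounded up.
        echo $(( (dau * 32 + 1999) / 2000 ))
    elif [ "$dau" -ge 400 ]; then
        echo 16
    elif [ "$dau" -ge 100 ]; then
        echo 8
    elif [ "$dau" -ge 25 ]; then
        echo 4
    else
        echo 2
    fi
}

# db_disk_gb TOTAL_MESSAGES USER_ACCOUNTS [PUBLIC_STREAM_MESSAGES]
# 10GB per 1M messages plus 1GB per 1000 users; if the third argument
# is given, adds 20GB per 1000 accounts per 1M public-stream messages.
db_disk_gb() {
    messages="$1"; users="$2"; public="${3:-0}"
    # Work in billionths of a GB so integer arithmetic stays exact.
    total=$(( 10000 * messages + 1000000 * users + 20 * users * public ))
    # Round up to the next whole GB.
    echo $(( (total + 999999999) / 1000000000 ))
}

echo "App server RAM for 500 daily actives: $(recommended_app_ram_gb 500)GB"
echo "DB disk for 800K messages, 12K accounts, 400K public: $(db_disk_gb 800000 12000 400000)GB"
```

For the chat.zulip.org figures quoted above, the disk estimate comes to 116GB, in the same range as the roughly 100GB that server was using. Treat the output as a starting point for planning, not a guarantee.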

Scalability is an area of active development, so if you're unsure whether Zulip is a fit for your organization or need further advice, contact Zulip support.

For readers interested in technical details around what features impact Zulip's scalability, this performance and scalability design document may also be of interest.