Files
zulip/zerver/lib/logging_util.py
Mateusz Mandera a7ec1a089c errors: Fix infinite recursion during memcached outages.
We have a very useful piece of code, _RateLimitFilter, which is
designed to avoid sending us a billion error emails in the event that
a Zulip production server is down in a way that throws the same
exception a lot. The code uses memcached to ensure we send each
traceback roughly once per Zulip server per 10 minutes (or if
memcached is unavailable, at most 1/process/10 minutes, since we use
memcached to coordinate between processes)

However, if memcached is down, there is a logging.error call internal
to the Django/memcached setup that happens inside the cache.set() call,
and those aren't caught by the `except Exception` block around it.

This ends up resulting in infinite recursion, eventually leading to
Fatal Python error: Cannot recover from stack overflow., since this
handler is configured to run for logging.error in addition to
logging.exception.

We fix this using a thread-local variable to detect whether we are
being called recursively.

This change should prevent some nasty failure modes we've had in the
past where memcached being down resulted in infinite recursion
(resulting in extra resources being consumed by our error
notifications code, and most importantly, the error notifications not
being sent).

Fixes #12595.
2019-09-03 11:44:39 -07:00

9.6 KiB