mirror of
https://github.com/zulip/zulip.git
synced 2025-11-07 23:43:43 +00:00
We have a very useful piece of code, _RateLimitFilter, which is designed to avoid sending us a billion error emails when a Zulip production server is down in a way that throws the same exception many times. The code uses memcached to ensure we send each traceback roughly once per Zulip server per 10 minutes (or, if memcached is unavailable, at most once per process per 10 minutes, since memcached is what we use to coordinate between processes).

However, if memcached is down, the Django/memcached setup makes an internal logging.error call inside the cache.set() call. That is a log call rather than a raised exception, so the `except Exception` block around cache.set() does not intercept it. Since this handler is configured to run for logging.error in addition to logging.exception, the handler re-enters itself, resulting in infinite recursion and eventually "Fatal Python error: Cannot recover from stack overflow."

We fix this by using a thread-local variable to detect whether we are being called recursively.

This change should prevent some nasty failure modes we've had in the past, where memcached being down resulted in infinite recursion (causing our error notifications code to consume extra resources and, most importantly, the error notifications not being sent).

Fixes #12595.
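
The recursion guard described above can be illustrated with a minimal, self-contained sketch. This is not the actual _RateLimitFilter code: the class name GuardedRateLimitFilter, the cache_add callable (a stand-in for the shared-cache call), the ListHandler helper, and the demo function are all hypothetical names chosen for illustration, and the fallback policy (use the per-process limit when re-entered) is an assumption about how such a guard might behave.

```python
import logging
import threading
import time


class GuardedRateLimitFilter(logging.Filter):
    """Illustrative sketch only; not Zulip's _RateLimitFilter."""

    def __init__(self, cache_add, window_seconds=600):
        super().__init__()
        # Hypothetical callable standing in for the shared-cache call;
        # assumed to return True when the key was newly stored.
        self.cache_add = cache_add
        self.window_seconds = window_seconds
        self.last_sent = {}  # per-process fallback state
        self._local = threading.local()  # per-thread re-entrancy flag

    def _allow_locally(self, key):
        # Per-process limit: allow each key at most once per window.
        now = time.monotonic()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self.last_sent[key] = now
        return True

    def filter(self, record):
        key = record.getMessage()
        if not getattr(self._local, "busy", False):
            self._local.busy = True
            try:
                # The cache client may log an error internally, which
                # re-enters this filter on the same thread.
                return bool(self.cache_add(key, self.window_seconds))
            except Exception:
                pass  # cache unavailable: fall through to the local limit
            finally:
                self._local.busy = False
        # Either we were re-entered or the cache failed: never touch the
        # cache again from here, so the recursion cannot continue.
        return self._allow_locally(key)


class ListHandler(logging.Handler):
    """Collects emitted messages so the behavior can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo():
    logger = logging.getLogger("ratelimit_demo")
    logger.handlers.clear()
    logger.filters.clear()
    logger.propagate = False
    logger.setLevel(logging.ERROR)

    def broken_cache_add(key, timeout):
        # Simulates a cache client that logs an error and then fails.
        logger.error("memcached unavailable")
        raise ConnectionError("memcached unavailable")

    handler = ListHandler()
    logger.addHandler(handler)
    logger.addFilter(GuardedRateLimitFilter(broken_cache_add))

    logger.error("boom")
    logger.error("boom")  # suppressed by the per-process limit
    return handler.messages
```

Calling demo() returns normally instead of overflowing the stack: the nested "memcached unavailable" log call sees the thread-local flag set and skips the cache, and the repeated "boom" is suppressed by the per-process limit. Without the flag, each call into the failing cache would log again and re-enter the filter indefinitely.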