mirror of
https://github.com/zulip/zulip.git
synced 2025-10-23 04:52:12 +00:00
nagios: Change the cron jobs to exit 0 for all ok/warning/critical.
The cron jobs are potentially wrapped by Sentry, which logs "cron failures" and sends emails. We would like those failures to only be when the cron job itself failed to run successfully -- not when the underlying metric is outside of its normal range. We would like to differentiate a failure of the monitoring infrastructure from a failure of what it is monitoring.

Swap to return 0 on everything except "unknown" results.
committed by Tim Abbott
parent 34f0e3b621
commit a9337e7641
@@ -755,7 +755,15 @@ def atomic_nagios_write(
     with open(path + ".tmp", "w") as fh:
         fh.write("|".join([str(event_time), str(status_int), status, message]) + "\n")
     os.rename(path + ".tmp", path)
-    return status_int
+
+    # Return code should be if the cron job ran to completion
+    # successfully, not if the result of the check was outside of
+    # bounds ("ok" / "critical"); this prevents the Sentry cron
+    # wrapper from spamming with a "failure" email if the nagios check
+    # requires multiple failures in a row.
+    if status == "unknown":
+        return 1
+    return 0
 
 
 if __name__ == "__main__":
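The behavior after this change can be tried in isolation with the minimal sketch below. It is not the code from the repository: the real signature of `atomic_nagios_write` is cut off in the hunk header, so the function name `write_state_and_exit_code`, its parameters, and the `NAGIOS_STATES` mapping (taken from the usual Nagios plugin state codes) are all assumptions made for illustration. Only the write-then-rename steps and the final return logic mirror the diff.

```python
import os
import tempfile
import time

# Usual Nagios plugin state codes. Assumed here; the mapping is not shown in the diff.
NAGIOS_STATES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}


def write_state_and_exit_code(path: str, status: str, message: str) -> int:
    """Hypothetical stand-in for atomic_nagios_write (real signature not shown)."""
    status_int = NAGIOS_STATES[status]
    event_time = int(time.time())

    # Write to a temporary file and rename it into place, so a reader
    # never observes a partially written state line.
    with open(path + ".tmp", "w") as fh:
        fh.write("|".join([str(event_time), str(status_int), status, message]) + "\n")
    os.rename(path + ".tmp", path)

    # The Nagios state is recorded in the file; the process exit code only
    # reports whether the check itself was able to produce a result.
    if status == "unknown":
        return 1
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        state_path = os.path.join(state_dir, "check_example")
        for status in ("ok", "warning", "critical", "unknown"):
            exit_code = write_state_and_exit_code(state_path, status, "example message")
            print(status, "-> exit code", exit_code)
```

In this sketch the state file still carries the full Nagios status (including "critical"), so the monitoring side loses no information; only a wrapper that watches the process exit code sees "critical" as a successful run.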