nagios: Change the cron jobs to exit 0 for all ok/warning/critical.

The cron jobs are potentially wrapped by Sentry, which logs "cron
failures" and sends emails.  Those failures should be reported only
when the cron job itself failed to run successfully, not when the
underlying metric is outside of its normal range.  This lets us
differentiate a failure of the monitoring infrastructure from a
failure of what it is monitoring.

Swap to return 0 on everything except "unknown" results.
This commit is contained in:
Alex Vandiver
2025-03-05 17:07:10 +00:00
committed by Tim Abbott
parent 34f0e3b621
commit a9337e7641

@@ -755,7 +755,15 @@ def atomic_nagios_write(
     with open(path + ".tmp", "w") as fh:
         fh.write("|".join([str(event_time), str(status_int), status, message]) + "\n")
     os.rename(path + ".tmp", path)
-    return status_int
+    # Return code should be if the cron job ran to completion
+    # successfully, not if the result of the check was outside of
+    # bounds ("ok" / "critical"); this prevents the Sentry cron
+    # wrapper from spamming with a "failure" email if the nagios check
+    # requires multiple failures in a row.
+    if status == "unknown":
+        return 1
+    return 0
 if __name__ == "__main__":
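
The effect of this change on the process exit code can be sketched in isolation. This is a minimal illustration, not the code in this repository: the helper name `cron_exit_code` and the `NAGIOS_STATES` table are hypothetical, and the integer values assume the usual Nagios plugin convention (0 ok, 1 warning, 2 critical, 3 unknown).

```python
# Hypothetical table; assumes the usual Nagios plugin state numbering.
NAGIOS_STATES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}


def cron_exit_code(status: str) -> int:
    """Return the exit code the cron job should report to its wrapper.

    The nagios state is still written to the state file elsewhere; the
    exit code only says whether the check itself produced a result.
    """
    if status not in NAGIOS_STATES:
        raise ValueError(f"not a nagios state: {status!r}")
    # Only "unknown" means the check could not run to a conclusion.
    if status == "unknown":
        return 1
    return 0
```

With this mapping, a "critical" result is still recorded for nagios to alert on, but a cron wrapper such as Sentry's sees a successful run; only an "unknown" result surfaces as a cron failure.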