queue: Fix a race condition in monitoring after queue stops being idle.

This fixes the race condition described in the comment block removed by
this commit. Another race condition remains. It should be virtually
impossible to hit, but it still seems worthwhile to document it in the
code, so we add a new comment describing it.

As a final note, this remaining race condition is not new; it was
hypothetically possible with the old code as well.
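The remaining race described in the new comment in the diff can be illustrated with a small simulation. This is a minimal sketch, not the actual monitoring code. It assumes the check flags a queue as stuck when its stats file is older than 180 seconds (the figure quoted in the comment) while the backlog is non-empty. The names `QueueSnapshot`, `looks_stuck` and `MAX_STALE_SECONDS` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: 180 seconds is the figure quoted in the diff's
# comment; how the real check applies it is an assumption here.
MAX_STALE_SECONDS = 180


@dataclass
class QueueSnapshot:
    """Hypothetical view of what the monitoring check can observe."""
    last_stats_update: float  # time (s) the worker last wrote its stats file
    queue_size: int           # number of events waiting in the backlog


def looks_stuck(snapshot: QueueSnapshot, now: float) -> bool:
    """Sketch of the heuristic: stale stats file AND non-empty backlog."""
    stale = now - snapshot.last_stats_update > MAX_STALE_SECONDS
    return stale and snapshot.queue_size > 0


# Timeline of the remaining race (times in seconds, all hypothetical):
#   t=0      worker drains the queue, writes its stats, then sits idle
#   t=200    a burst of 5 events arrives
#   t=200.01 worker wakes up and refreshes its stats file first
idle_since = 0.0
burst_at = 200.0
worker_wakes_at = 200.01

# Check runs in the tiny window between the burst and the worker waking:
# the stats file still dates from t=0, so this is a false positive.
in_window = looks_stuck(
    QueueSnapshot(last_stats_update=idle_since, queue_size=5),
    now=burst_at + 0.005,
)

# Check runs just after the worker refreshed its stats: no alert.
after_wake = looks_stuck(
    QueueSnapshot(last_stats_update=worker_wakes_at, queue_size=5),
    now=worker_wakes_at + 0.01,
)

print(in_window, after_wake)  # True False
```

The false positive only occurs if the check lands in the gap between the events arriving and the worker refreshing its stats. That is why the comment calls the chance negligible: the worker is expected to refresh its stats before it starts processing the first event.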
Mateusz Mandera
2020-09-06 18:26:27 +02:00
committed by Tim Abbott
parent a72e9476ee
commit 2365a53496
3 changed files with 27 additions and 10 deletions


@@ -69,11 +69,16 @@ def analyze_queue_stats(queue_name: str, stats: Dict[str, Any],
 # Queue isn't updating the stats file and has some events in
 # the backlog, it's likely stuck.
 #
-# TODO: There's an unfortunate race where if the queue has
-# been empty for the last hour (because there haven't been 50
-# new events in the last hour), and then gets a burst, this
-# condition will be true for the first (event_handling_time *
-# 50).
+# TODO: There's an unlikely race condition here - if the queue
+# was fully emptied and was idle due to no new events coming
+# for over 180 seconds, suddenly gets a burst of events and
+# this code runs exactly in the very small time window between
+# those events popping up and the queue beginning to process
+# the first one (which will refresh the stats file at the very
+# start), we'll incorrectly return the CRITICAL status. The
+# chance of that happening should be negligible because the queue
+# worker should wake up immediately and log statistics before
+# starting to process the first event.
 return dict(status=CRITICAL,
 name=queue_name,
 message='queue appears to be stuck, last update {}, queue size {}'.format(