From ca4aa7c815c5ec9880c0b141cbe8e02f51406b7b Mon Sep 17 00:00:00 2001
From: Jason Ackley
Date: Wed, 2 May 2018 08:38:56 -0500
Subject: Quick-fix: repair some logic for @jobs_done.

This small adjustment increments @jobs_done only on a successful pull of
a node, or when the retries are exceeded and the node is abandoned for
that cycle.

Previously @jobs_done was incremented as soon as process() was called.
The problem is that this incremented @jobs_done before knowing whether
the node completed OK or failed (and required a retry). On a retry the
node is requeued for processing, so @jobs_done was incremented multiple
times per node (up to the retries count per node for a downed node).

This causes @jobs_done to become out of sync with reality. One of the
main impacts is on when the :nodes_done hook gets called. The hook could
fire mid-cycle and then not fire at the 'real' end of the interval,
which is the intent of :nodes_done. The next time it fires would be when
@jobs_done catches back up (in the NEXT cycle) to @nodes.count.
---
 lib/oxidized/worker.rb | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/oxidized/worker.rb b/lib/oxidized/worker.rb
index 692b060..5d5fc01 100644
--- a/lib/oxidized/worker.rb
+++ b/lib/oxidized/worker.rb
@@ -42,9 +42,9 @@ module Oxidized
       node.stats.add job
       @jobs.duration job.time
       node.running = false
-      @jobs_done += 1 # needed for worker_done event
 
       if job.status == :success
+        @jobs_done += 1 # needed for :nodes_done hook
         Oxidized.Hooks.handle :node_success, :node => node,
                                              :job => job
         msg = "update #{node.name}"
@@ -66,6 +66,11 @@ module Oxidized
           msg += ", retry attempt #{node.retry}"
           @nodes.next node.name
         else
+          # Only increment the @jobs_done when we give up retries for a node (or success).
+          # As it would otherwise cause @jobs_done to be incremented with generic retries.
+          # This would cause :nodes_done hook to desync from running at the end of the nodelist and
+          # be fired when the @jobs_done > @nodes.count (could be mid-cycle on the next cycle).
+          @jobs_done += 1
           msg += ", retries exhausted, giving up"
           node.retry = 0
           Oxidized.Hooks.handle :node_fail, :node => node,
-- 
cgit v1.2.1
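
The counting problem described in the commit message can be illustrated with a small standalone sketch. This is not Oxidized's Worker code: `count_jobs_done`, the `outcomes` hash, and the `max_retries` and `count_every_attempt` parameters are hypothetical names made up for the illustration, and the retry model (one initial attempt plus up to `max_retries` requeues) is an assumption.

```ruby
# Hypothetical stand-in for one polling cycle; NOT Oxidized's actual Worker.
# outcomes maps a node name to the job statuses it produces, in order.
def count_jobs_done(outcomes, max_retries:, count_every_attempt:)
  jobs_done = 0
  outcomes.each_value do |statuses|
    retries = 0
    statuses.each do |status|
      # Old behaviour: count as soon as the job is processed.
      jobs_done += 1 if count_every_attempt
      if status == :success
        # New behaviour: count on success.
        jobs_done += 1 unless count_every_attempt
        break
      elsif retries < max_retries
        # Node is requeued; under the new behaviour nothing is counted.
        retries += 1
      else
        # New behaviour: count once when retries are exhausted.
        jobs_done += 1 unless count_every_attempt
        break
      end
    end
  end
  jobs_done
end

# Assumed scenario: one healthy node and one node that is down all cycle.
outcomes = {
  "up-node"   => [:success],
  "down-node" => [:fail, :fail, :fail, :fail] # initial attempt + 3 retries
}

old_count = count_jobs_done(outcomes, max_retries: 3, count_every_attempt: true)
new_count = count_jobs_done(outcomes, max_retries: 3, count_every_attempt: false)

puts "nodes: #{outcomes.size}, old @jobs_done: #{old_count}, new @jobs_done: #{new_count}"
```

Under these assumptions the old counting overshoots the node count while the new counting matches it, which is why a hook keyed to the counter reaching the node count would fire at the wrong time with the old logic.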