author | Jason Ackley <jason@ackley.net> | 2018-05-02 08:38:56 -0500 |
---|---|---|
committer | Jason Ackley <jason@ackley.net> | 2018-05-02 08:38:56 -0500 |
commit | ca4aa7c815c5ec9880c0b141cbe8e02f51406b7b (patch) | |
tree | bdf4c6ef4b6e4fd42032ad84bca9e50a17d15958 /lib | |
parent | dba1f023ce6b53e4e353ca0c9ccc88facdad796f (diff) |
Quick-fix: repair some logic for @jobs_done.
This small adjustment increments @jobs_done only after a successful pull of a node, or when the retries are exceeded and the node is abandoned for that cycle.
Previously @jobs_done was incremented as soon as process() was called.
The problem is that this incremented @jobs_done before knowing whether the node completed OK or failed (and requires a retry).
During a retry, the node is requeued for processing, which would increment @jobs_done multiple times per node (up to the retry count for a downed node).
This causes @jobs_done to become out of sync with reality. One of the main impacts is on when the :nodes_done hook gets called. The hook could fire mid-cycle and then not fire at the 'real' end of the interval, which is the intent of :nodes_done. The next time it fires would be when @jobs_done catches back up (in the NEXT cycle) to @nodes.count.
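The desync described above can be reproduced with a small standalone model. This is only a sketch and not Oxidized's actual worker: `hook_firings`, `MAX_RETRIES`, and the rule that the hook fires once the counter reaches the node count (and then resets) are assumptions made for illustration.

```ruby
# Simplified model of the @jobs_done bookkeeping, NOT the real worker.rb.
# Assumption: the :nodes_done hook fires once jobs_done reaches the node
# count, after which the counter is reset to zero.
MAX_RETRIES = 2 # hypothetical retry limit

# outcomes: array of [node_name, status] pairs in the order jobs finish.
# Returns the 0-based job indexes after which the hook would fire.
def hook_firings(outcomes, node_count, count_every_job:)
  jobs_done = 0
  retries = Hash.new(0)
  fired_at = []

  outcomes.each_with_index do |(name, status), i|
    if count_every_job
      jobs_done += 1            # old behaviour: count every processed job
    elsif status == :success
      jobs_done += 1            # new behaviour: count a successful pull
    elsif retries[name] >= MAX_RETRIES
      jobs_done += 1            # new behaviour: count when giving up
      retries[name] = 0
    else
      retries[name] += 1        # node is requeued, nothing is counted
    end

    if jobs_done >= node_count
      fired_at << i
      jobs_done = 0
    end
  end

  fired_at
end

# Three nodes; "b" is down and is retried until the retries are exhausted.
cycle = [["a", :success], ["b", :fail], ["b", :fail], ["b", :fail], ["c", :success]]

p hook_firings(cycle, 3, count_every_job: true)  # => [2]
p hook_firings(cycle, 3, count_every_job: false) # => [4]
```

With the old counting, the hook fires after the third job, while "b" is still being retried and "c" has not run, and it does not fire at the end of the cycle. With the new counting it fires once, after the last node finishes.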
Diffstat (limited to 'lib')
-rw-r--r-- | lib/oxidized/worker.rb | 7 |
1 files changed, 6 insertions, 1 deletions
diff --git a/lib/oxidized/worker.rb b/lib/oxidized/worker.rb
index 692b060..5d5fc01 100644
--- a/lib/oxidized/worker.rb
+++ b/lib/oxidized/worker.rb
@@ -42,9 +42,9 @@ module Oxidized
       node.stats.add job
       @jobs.duration job.time
       node.running = false
-      @jobs_done += 1 # needed for worker_done event
       if job.status == :success
+        @jobs_done += 1 # needed for :nodes_done hook
         Oxidized.Hooks.handle :node_success, :node => node,
                                              :job => job
         msg = "update #{node.name}"
@@ -66,6 +66,11 @@ module Oxidized
           msg += ", retry attempt #{node.retry}"
           @nodes.next node.name
         else
+          # Only increment the @jobs_done when we give up retries for a node (or success).
+          # As it would otherwise cause @jobs_done to be incremented with generic retries.
+          # This would cause :nodes_done hook to desync from running at the end of the nodelist and
+          # be fired when the @jobs_done > @nodes.count (could be mid-cycle on the next cycle).
+          @jobs_done += 1
           msg += ", retries exhausted, giving up"
           node.retry = 0
           Oxidized.Hooks.handle :node_fail, :node => node,