Age | Commit message | Author |
|
Fixes performance regression
|
|
This small adjustment increments @jobs_done only when a node is pulled successfully, or when its retries are exceeded and the node is abandoned for that cycle.
Previously @jobs_done was incremented as soon as process() was called.
The problem is that @jobs_done was incremented before it was known whether the node completed OK or failed (and required a retry).
On a retry the node is requeued for processing, so @jobs_done would be incremented multiple times per node (up to the retry count for a downed node).
This causes @jobs_done to drift out of sync with reality. One of the main impacts is on when the :nodes_done hook gets called: the hook could fire mid-cycle and then not fire at the 'real' end of the interval, which is the intent of :nodes_done. The next time it fired would be when @jobs_done caught back up (in the NEXT cycle) to @nodes.count.
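The counting rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's worker code: the class name CycleCounter, its methods, and the choice to reset the counter at the end of a cycle are all assumptions made for the example, and cycles_completed merely stands in for firing :nodes_done.

```ruby
# Hypothetical sketch: count a node as "done" only on success or abandonment.
class CycleCounter
  attr_reader :jobs_done, :cycles_completed

  def initialize(node_count, max_retries)
    @node_count = node_count
    @max_retries = max_retries
    @jobs_done = 0
    @cycles_completed = 0       # stand-in for firing the :nodes_done hook
    @retries = Hash.new(0)      # retry attempts used so far, per node
  end

  # Record the outcome of one attempt for a node.
  # Returns :done, :requeue or :abandoned.
  def process(node, success)
    if success
      finish(node)
      :done
    elsif @retries[node] < @max_retries
      @retries[node] += 1       # a retry does NOT touch @jobs_done
      :requeue
    else
      finish(node)              # retries exhausted, give up for this cycle
      :abandoned
    end
  end

  private

  def finish(node)
    @retries.delete(node)
    @jobs_done += 1
    return unless @jobs_done == @node_count

    @cycles_completed += 1
    @jobs_done = 0              # assumption: start counting the next cycle from zero
  end
end
```

With two nodes and three retries, one healthy node plus one downed node yields exactly one completed cycle, and only after the downed node has been abandoned.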
|
|
API to worker. This allows for Gitlab (or another Git UI) to properly
display the commit author.
|
|
Trigger an event when a full cycle has completed
|
|
The current implementation is modular and allows users to define hooks
in several ways:
* Use one of the built-in hook types (currently only 'exec')
* Define their own Hook classes inside ~/.config/oxidized/hook
The exec hook type runs a user-defined command with or without a shell. It
populates a set of environment variables with metadata. The command
can be run either synchronously or asynchronously. The default is
synchronous.
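As a rough illustration of the exec behaviour described above, here is a minimal sketch built on Ruby's Process.spawn. The method name run_exec_hook, the HOOK_ environment prefix and the async: option are hypothetical and are not taken from the project's hook code.

```ruby
# Hypothetical sketch of an exec-style hook runner.
# cmd:      a String (Ruby may hand it to the shell) or an Array (run without a shell)
# metadata: Hash exported to the child as HOOK_* environment variables (assumed prefix)
# async:    false waits for the command, true returns immediately
def run_exec_hook(cmd, metadata, async: false)
  env = metadata.transform_keys { |key| "HOOK_#{key.to_s.upcase}" }
                .transform_values(&:to_s)
  pid = Process.spawn(env, *Array(cmd))

  if async
    Process.detach(pid)         # reap the child in the background
    nil
  else
    _, status = Process.wait2(pid)
    status.exitstatus           # synchronous callers get the exit code
  end
end
```

A call such as run_exec_hook(["sh", "-c", 'echo "$HOOK_EVENT"'], { event: :node_success }) would run without an intermediate shell parse by Ruby and block until the command exits.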
|
|
Still not sure we want this. But previous one might have caused infinite
loop in #work.
Consider that we have just 1 node altogether and our rotation interval is
more than our MAX_INTER_JOB_GAP; then we'd raise @want to 2 instead of 1.
Now we want more threads than we have nodes, and 'while @jobs.size <
@jobs.want' will never stop being true
|
|
Closes #68 (hopefully at least)
Further, our TODO to refactor/redesign the code to move state from
memory to disk should help.
|
|
MAX_INTER_JOB_GAP is now 300s; if the latest job was started 300s ago, we
add a new job.
The rationale is that if we want n jobs, and all these jobs are taking very
very long, or perhaps hanging, then we are blocking everything else too.
Consider that you use one job, because it's enough to meet your rotation
interval quota. Then if some one box is somehow taking tens of minutes or
hours, we won't figure out the new number of workers until it finishes, so
we're blocking all other jobs from spawning.
I'm not super happy about this solution, and not really sure what the
right way to tackle it is.
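The gap rule can be sketched as a small pure function. This is an illustration only: the helper name wanted_jobs and its parameters are hypothetical, the strict "greater than" comparison is an assumption, and so is the cap at the node count, which is added here because of the single-node case discussed in the message further up the log.

```ruby
MAX_INTER_JOB_GAP = 300 # seconds, as stated in the commit message

# Hypothetical helper: how many worker jobs do we want right now?
# base_want:           jobs needed to meet the rotation interval
# last_job_started_at: Time the most recent job was started
# node_count:          total number of nodes
def wanted_jobs(base_want, last_job_started_at, node_count, now: Time.now)
  want = base_want
  # If nothing has started for too long, running jobs may be slow or hung,
  # so ask for one extra job instead of waiting for them to finish.
  want += 1 if now - last_job_started_at > MAX_INTER_JOB_GAP
  # Assumption: never want more jobs than there are nodes.
  [want, node_count].min
end
```

Passing the clock in through now: keeps the function easy to exercise with fixed timestamps.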
|
|
Looks like this in syslog:
Jul 11 21:05:53 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 22"
Jul 11 21:05:53 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 23"
Jul 11 21:05:54 ytti oxidized[9820]: 10.10.10.10 status no_connection, retry attempt 1
Jul 11 21:05:54 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 22"
Jul 11 21:05:54 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 23"
Jul 11 21:05:55 ytti oxidized[9820]: 10.10.10.10 status no_connection, retry attempt 2
Jul 11 21:05:55 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 22"
Jul 11 21:05:55 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 23"
Jul 11 21:05:56 ytti oxidized[9820]: 10.10.10.10 status no_connection, retry attempt 3
Jul 11 21:05:56 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 22"
Jul 11 21:05:56 ytti oxidized[9820]: 10.10.10.10 raised Errno::ENETUNREACH with msg "Network is unreachable - connect(2) for "10.10.10.10" port 23"
Jul 11 21:05:57 ytti oxidized[9820]: 10.10.10.10 status no_connection, retries exhausted, giving up
|
|
Bump up gemspec
|
|
As I can't do IO#select on sinatra/puma to run it when I have time, I
have to run it on a separate thread.
This means the Nodes container needs to be thread safe. It now has ghetto
mutex locking, but I probably need to be more focused on which
external methods can be called and wrap those in @mutex.synchronize.
Also provide an HTML UI, not just JSON, as a ghetto UI for people who don't want to
integrate
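Wrapping only the externally callable methods in @mutex.synchronize might look roughly like the sketch below. The class SafeNodes and its three methods are hypothetical and much smaller than a real Nodes container; the point is only that every public entry point takes the lock and that readers get a copy.

```ruby
# Hypothetical sketch of a thread-safe node list.
class SafeNodes
  def initialize(nodes = [])
    @nodes = nodes.dup
    @mutex = Mutex.new
  end

  # Add a node; safe to call from the web thread and the worker thread.
  def push(node)
    @mutex.synchronize { @nodes.push(node) }
    self
  end

  # Take the node at the head and move it to the back of the queue.
  # Returns the node, or nil when the list is empty.
  def rotate!
    @mutex.synchronize do
      node = @nodes.shift
      @nodes.push(node) unless node.nil?
      node
    end
  end

  # Return a snapshot so callers never iterate the live array.
  def to_a
    @mutex.synchronize { @nodes.dup }
  end
end
```

Handing out a duplicate from to_a means a slow consumer, such as a UI rendering the list, cannot observe the array while another thread mutates it.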
|
|
'store' is more logical, as we cannot know if output method guarantees
any version history which 'update' implies.
|
|
'syslog.rb' listens on a UDP port (or reads a file). When an IOS or JunOS style
config change/commit message is seen, it triggers an immediate update of the
config.
It transports the commit message (junos), the remote host from which the change was
made (ios) and who made the change (junos+ios). This is carried over to
the 'output' methods, that is, 'git blame' will show the IOS/JunOS user-name
who made the change.
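To make the idea concrete, here is a minimal sketch of pulling the user and remote host out of an IOS-style configuration message. The pattern and the helper parse_ios_change are assumptions for illustration and are not copied from 'syslog.rb'; it covers only the common "%SYS-5-CONFIG_I: Configured from ... by USER on LINE (HOST)" shape and ignores JunOS entirely.

```ruby
# Hypothetical pattern for an IOS config-change syslog line.
IOS_CONFIG_CHANGE =
  /%SYS-5-CONFIG_I: Configured from \S+ by (?<user>\S+) on \S+ \((?<from>[^)]+)\)/

# Returns { user:, from: } when the line reports a config change, else nil.
def parse_ios_change(line)
  match = IOS_CONFIG_CHANGE.match(line)
  match && { user: match[:user], from: match[:from] }
end
```

A caller could use a non-nil result as the trigger for an immediate fetch and pass the user name along to the output stage.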
|
|
Silly for shit-and-giggles attempt at rancid
|