Age | Commit message (Collapse) | Author |
|
This updates the parser, globally, to allow:
.... with subject 'xxx'
|
|
The intention of this series of changes is to allow subjects
to be replaced for specific tests. Replacement supersedes the
earlier idea of a custom-prefix, so I've removed that code
before proceeding.
|
|
This is required for the metrics to be submitted correctly.
|
|
We'll want to handle timeouts more cleanly now, and use TCP.
|
|
|
|
Also removed a redundant `begin`.
|
|
|
|
When an IPv4 address lookup fails we confirm the failure;
similarly, when/if an IPv6 lookup fails we confirm that before
raising the alert.
|
|
That is then tested when resolve-errors are handled.
|
|
We've had a problem for the past few weeks (?) where we see
false DNS errors when making http/https requests with `curb`/`libcurl`.
To resolve these issues properly we're going to have to rewrite
the code to avoid the current gem. However, that is considerable work
because of the corner we've backed ourselves into - wanting to test both
IPv4 and IPv6 "properly". We'll have to duplicate that work if
we use `net/http`, or even more so if we use `open3` and exec
`curl -4|-6 ..`
For the moment this commit changes how things are handled to deal
with the issue we see - which doesn't solve the problem but will
mask it.
When custodian runs a test it will return a status-code:
* Custodian::TestResult::TEST_FAILED
* The test failed, such that an alert should be raised.
* Custodian::TestResult::TEST_PASSED
* The test succeeded, such that any previous alert should be cleared.
* Custodian::TestResult::TEST_SKIPPED
* Nothing should be done.
As the failure we see is very specific - an exception is thrown
of the type `Curl::Err::HostResolutionError` - we can catch that
and return `TEST_SKIPPED`. That means that there will be no
(urgent) alert.
Obviously the potential risk of swallowing all DNS-failures is that
a domain might expire and we'd never know. So we'll do a little
better than merely skipping the test if there are DNS failures:
* If we see a DNS failure.
* Then we try to look up the host as an A & AAAA record.
* If that succeeds we decide the issue was bogus.
* If that fails then the host legitimately doesn't resolve so we raise an alert.
To recap:
* If a host fails normally - bogus status-code, or missing text - we behave as we did in the past.
* Only in the case of a DNS-error from curb/curl do we go down this horrid path.
* Where we try to confirm the error, and swallow it if false.
This closes #13.
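The confirmation step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `TestResult` module is a stand-in for `Custodian::TestResult`, the helper names `dns_record?` and `confirm_dns_failure` are hypothetical, and it assumes that finding either an A or an AAAA record is enough to treat the curb/curl error as bogus.

```ruby
require 'resolv'

# Stand-in for the Custodian::TestResult constants named above; the real
# values are not shown in this log, so symbols are an assumption.
module TestResult
  TEST_FAILED  = :test_failed
  TEST_SKIPPED = :test_skipped
end

# Hypothetical helper: true if `host` has at least one record of `type`.
def dns_record?(host, type)
  Resolv::DNS.open { |dns| !dns.getresources(host, type).empty? }
rescue Resolv::ResolvError
  false
end

# Intended to be called only after the HTTP client reported a resolution
# error (for curb, a rescued Curl::Err::HostResolutionError).
def confirm_dns_failure(host, lookup: method(:dns_record?))
  types = [Resolv::DNS::Resource::IN::A, Resolv::DNS::Resource::IN::AAAA]
  if types.any? { |type| lookup.call(host, type) }
    TestResult::TEST_SKIPPED   # the host resolves, so the error was bogus
  else
    TestResult::TEST_FAILED    # the host really does not resolve: alert
  end
end
```

The `lookup:` parameter exists only so the decision logic can be exercised without touching the network.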
|
|
|
|
This is part of #13.
|
|
|
|
Rather than:
with auth 'username:password'
We use:
http://user:pass@example.com/
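As an illustration of why the URL form is convenient, Ruby's standard `URI` class already exposes embedded credentials. The helper name `credentials_from` is hypothetical and this is only a sketch, not necessarily how the test itself reads them.

```ruby
require 'uri'

# Hypothetical helper: pull optional credentials out of a target URL.
def credentials_from(url)
  uri = URI.parse(url)
  { user: uri.user, password: uri.password, host: uri.host }
end

with_auth    = credentials_from('http://user:pass@example.com/')
without_auth = credentials_from('http://example.com/')
```

For a URL without credentials, `user` and `password` are simply `nil`.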
|
|
Supply this like so:
http://example.com/ must run http with auth 'username:passw0rd' with status 200 otherwise 'failure'
|
|
|
|
|
|
This will allow classification (by human eyes) of raised-alerts.
|
|
This is a failure case which is not 100% clear.
This closes #4.
|
|
If ruby-based SSL negotiation fails then we fall back to invoking
(horridly!) openssl directly. Until now this didn't send the SNI
hostname to connect to, so it could only test the first/default SSL site
that was listening upon a given IP address.
This commit updates things such that we send the correct hostname,
from the URL under-test.
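A rough sketch of the idea, assuming the fallback shells out to `openssl s_client` (the exact command is not shown in this log): the `-servername` option is what carries the SNI hostname. The helper name `openssl_sni_command` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: build the argument list for an `openssl s_client`
# call that sends the hostname from the URL under test as the SNI name.
def openssl_sni_command(url)
  uri = URI.parse(url)
  ['openssl', 's_client',
   '-connect', "#{uri.host}:#{uri.port}",
   '-servername', uri.host]
end

cmd = openssl_sni_command('https://example.com/')
```

Building an argument array, rather than a single shell string, means the hostname never passes through shell quoting when the command is later executed.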
|
|
Since the ruby version available to wheezy doesn't support TLS 1.2,
fetching the certificate from a remote HTTPS server will fail if
TLS 1.2 is all that the server offers.
If we hit that condition, and only that one, we'll fall back to
invoking `openssl` natively. This will allow us to monitor
expiration-time for remote SSL certificates, but the downside is
that we no longer receive the bundle that the remote server might
send - so we cannot validate the signature chain.
This closes #2.
|
|
|
|
This prevents an endless loop.
|
|
This involved silencing a few issues that were judged to be minor,
and changing various whitespaces and function-calls. The most
obvious example was changing this:
assert(ret.kind_of? Array)
To this:
assert(ret.kind_of?(Array))
|
|
These are again mostly based around whitespace-changes.
|
|
|
|
Again these were whitespace-related.
|
|
These warnings were largely whitespace-based.
|
|
It was required after all.
|
|
|
|
Made minor formatting cleanups
|
|
|
|
|
|
|
|
We cannot allow HTTP/HTTPS to be limited by protocol,
such as IPv4-only or IPv6-only. Raise an error in the
parser if this is attempted.
Added test-case to confirm, and this closes #12488.
|
|
It should match the next occurrence of the opening quote type, not the
last.
|
|
It now matches "can't match" and 'he said "ha!"'.
Added tests.
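One way to get this behaviour is a back-reference combined with a non-greedy match. The pattern below is an assumed illustration, not the project's actual regexp, and `quoted_content` is a hypothetical helper name.

```ruby
# Hypothetical helper: extract the quoted argument of `with content`.
# (["']) captures the opening quote, (.*?) matches as little as possible,
# and \1 requires the same quote character to close the string.
def quoted_content(line)
  match = /with\s+content\s+(["'])(.*?)\1/.match(line)
  match && match[2]
end

double = quoted_content(%q{http://example.com/ must run http with content "can't match".})
single = quoted_content(%q{http://example.com/ must run http with content 'he said "ha!"'.})
first  = quoted_content(%q{http://example.com/ must run http with content 'a' otherwise 'b'})
```

The non-greedy `.*?` is what stops at the next matching quote rather than the last one on the line.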
|
|
We now use a zset to store our pending tests. This means that
jobs are only in the queue once - no duplicates are allowed.
This closes #12428.
|
|
This changes the parser from only allowing this:
http://example.com/ must run http with content 'reserved'.
To allowing both of these:
http://example.com/ must run http with content "reserved".
http://example.com/ must run http with content 'reserved'.
|
|
|
|
|
|
This is more reliable, albeit potentially racy and with the failure
case that a job might be re-added twice.
|
|
This prevents starvation, by ensuring that we pull tests out in
a FIFO fashion - by virtue of the timestamp.
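The ordering can be pictured with a small in-memory stand-in for a sorted set. This sketch does not use Redis at all; the class name `PendingTests` is hypothetical, and it assumes a test that is already queued keeps its original timestamp when it is added again.

```ruby
# In-memory stand-in for a sorted set: each member appears once and is
# scored by the time it was first added.
class PendingTests
  def initialize
    @scores = {} # test => timestamp
  end

  # Adding a test that is already queued keeps its original timestamp.
  def add(test, now = Time.now.to_f)
    @scores[test] ||= now
  end

  # Remove and return the test with the oldest timestamp, or nil if empty.
  def pop_oldest
    test, _score = @scores.min_by { |_t, score| score }
    @scores.delete(test) unless test.nil?
    test
  end

  def size
    @scores.size
  end
end
```

Because the oldest timestamp always comes out first, a test that has been waiting cannot be starved by newer additions.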
|
|
We used the beanstalkd queue in the past, and later
added support for Redis via a simple abstraction layer. But
we haven't tested or used beanstalkd for over a year, and
the client-libraries are no longer available as native Debian
packages.
With that in mind we've excised the code, although we've left the
abstraction-class in place.
|
|
|
|
|
|
This ensures that all tests always run, and we have an ordering.
|
|
This means that tests will only ever be enqueued once, regardless
of how many times they are parsed.
In the past we could have a configuration file that read:
test1 ..
test2 ..
test3 ..
Parsing/adding this file would result in a queue looking like so:
test1 ..
test2 ..
test3 ..
test1 ..
test2 ..
test3 ..
test1 ..
test2 ..
test3 ..
Now the queue will *ALWAYS* look like this:
test1 ..
test2 ..
test3 ..
In the normal course of events this won't matter, as the processing
loop will look like so:
* Add new jobs every minute.
* Worker runs the jobs.
In the case of a failing job, though, the test might take 2.5 minutes
and that will cause the queue to back up. (2.5 minutes because a test
is repeated 5 times before a fail is announced, and the timeout is
30 seconds. These values can and should be tweaked.)
With the new method, even if the queue is draining slowly, it
will never grow to contain hundreds of events; it will just be "topped
up", not "overflowing".
Thanks to James Hannah for the suggestion, and James Lawrie for
the patience.
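The arithmetic and the difference between the two queue behaviours can be illustrated with a toy simulation. Nothing here is taken from the real implementation: `enqueue_rounds` is a hypothetical helper that models the queue as a plain array that no worker drains.

```ruby
# Figures from the text: 5 attempts with a 30-second timeout each.
attempts        = 5
timeout_seconds = 30
worst_case_seconds = attempts * timeout_seconds # 150 seconds = 2.5 minutes

tests = ['test1 ..', 'test2 ..', 'test3 ..']

# Re-parse the configuration `rounds` times while nothing drains the queue.
# With unique: true, a test that is already queued is not added again.
def enqueue_rounds(tests, rounds, unique:)
  queue = []
  rounds.times do
    tests.each do |test|
      next if unique && queue.include?(test)

      queue << test
    end
  end
  queue
end

old_queue = enqueue_rounds(tests, 3, unique: false) # grows every round
new_queue = enqueue_rounds(tests, 3, unique: true)  # stays at three entries
```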
|
|
|
|
|