Age | Commit message | Author |
|
If a subject is specified in a test, it will now be used for the
raise/clear message. This allows testing of the new parser changes.
|
|
|
|
|
|
|
|
This updates the parser, globally, to allow:
.... with subject 'xxx'
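As a rough illustration of the syntax above, a subject clause could be pulled out of a test line with a regular expression. This is only a sketch: the method name `extract_subject` and the sample test lines are hypothetical and are not taken from the custodian parser itself.

```ruby
# Hypothetical helper (not the real custodian parser): returns the text
# given in a "with subject '...'" clause, or nil when no clause is present.
def extract_subject(line)
  match = line.match(/\bwith\s+subject\s+'([^']*)'/)
  match ? match[1] : nil
end

# Sample lines are made up for illustration.
with_subject = "http://example.com/ must run http with status 200 with subject 'web is down' otherwise 'failure'"
without      = "http://example.com/ must run http with status 200 otherwise 'failure'"

p extract_subject(with_subject)
p extract_subject(without)
```

A test without the clause yields `nil`, so a caller could fall back to whatever default subject it already uses.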
|
|
The intention of this series of changes is to allow subjects
to be replaced for specific tests. Replacement supersedes the
earlier idea of a custom prefix, so I've removed that code
before proceeding.
|
|
New release
See merge request !14
|
|
|
|
This is required for the metrics to be submitted correctly.
|
|
Resolve "Can custodian send a user agent string please"
Closes #19
See merge request !13
|
|
|
|
We'll want to handle timeouts more cleanly now, and use TCP.
|
|
|
|
Resolve "Custodian temporary DNS failure"
Closes #13
See merge request !11
|
|
Also removed a redundant `begin`.
|
|
|
|
When a lookup of IPv4 addresses fails we confirm the failure;
similarly, if IPv6 lookups fail we confirm that too before
raising the alert.
|
|
That is then tested when resolve-errors are handled.
|
|
|
|
We've had a problem for the past few weeks (?) where we see
false DNS errors when making http/https requests with `curb`/`libcurl`.
To resolve these issues properly we're going to have to rewrite
the code to avoid the current gem. However that is considerable work
because of the corner we've backed ourselves into - wanting to test both
IPv4 and IPv6 "properly". We'll have to duplicate that work if
we use `net/http`, or even more so if we use `open3` and exec
`curl -4|-6 ..`
For the moment this commit changes how things are handled to deal
with the issue we see - which doesn't solve the problem but will
mask it.
When custodian runs a test it will return a status-code:
* Custodian::TestResult::TEST_FAILED
* The test failed, such that an alert should be raised.
* Custodian::TestResult::TEST_PASSED
* The test succeeded, such that any previous alert should be cleared.
* Custodian::TestResult::TEST_SKIPPED
* Nothing should be done.
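The mapping from status-code to action might be sketched as follows. The `TestResult` module below is a stand-in defined only for this example: the constant names come from the list above, but their values and the `action_for` helper are assumptions.

```ruby
# Stand-in for Custodian::TestResult; the symbol values are assumptions
# made for this sketch, only the constant names come from the text above.
module TestResult
  TEST_FAILED  = :failed
  TEST_PASSED  = :passed
  TEST_SKIPPED = :skipped
end

# Hypothetical dispatcher: maps a result code to the action described above.
def action_for(result)
  case result
  when TestResult::TEST_FAILED  then :raise_alert
  when TestResult::TEST_PASSED  then :clear_alert
  when TestResult::TEST_SKIPPED then :do_nothing
  else raise ArgumentError, "unknown result: #{result.inspect}"
  end
end

p action_for(TestResult::TEST_SKIPPED)
```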
As the failure we see is very specific - an exception is thrown
of the type `Curl::Err::HostResolutionError` - we can catch that
and return `TEST_SKIPPED`. That means that there will be no
(urgent) alert.
Obviously the potential risk of swallowing all DNS-failures is that
a domain might expire and we'd never know. So we'll do a little
better than merely skipping the test if there are DNS failures:
* If we see a DNS failure, we try to look up the host as an A & AAAA record.
* If that succeeds we decide the issue was bogus.
* If that fails then the host legitimately doesn't resolve so we raise an alert.
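The confirmation step above could look roughly like this, using Ruby's standard `resolv` library. The helper names are hypothetical, and treating a hit on either record type as "the host resolves" is an assumption; the actual commit may confirm IPv4 and IPv6 separately.

```ruby
require 'resolv'

# Hypothetical helper: true when the host has at least one A or AAAA record.
# The resolver is injectable so the logic can be exercised without a network.
def host_resolves?(host, resolver: Resolv::DNS.new)
  a    = resolver.getresources(host, Resolv::DNS::Resource::IN::A)
  aaaa = resolver.getresources(host, Resolv::DNS::Resource::IN::AAAA)
  !(a.empty? && aaaa.empty?)
end

# Called after the HTTP client reported a DNS failure: decide whether the
# failure was bogus (skip the test) or genuine (raise an alert).
def confirm_dns_failure(host, resolver: Resolv::DNS.new)
  host_resolves?(host, resolver: resolver) ? :skip_test : :raise_alert
end
```

In the real code the `:skip_test` branch would correspond to returning `TEST_SKIPPED`, and `:raise_alert` to `TEST_FAILED`.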
To recap:
* If a host fails normally - bogus status-code, or missing text - we behave as we did in the past.
* Only in the case of a DNS-error from curb/curl do we go down this horrid path.
* Where we try to confirm the error, and swallow it if false.
This closes #13.
|
|
Alert in more detail on DNS failures.
See merge request !10
|
|
|
|
Updated to log the exact DNS error.
See merge request !9
|
|
This is part of #13.
|
|
Resolve "The redis view of "known_tests" is often out-of-date"
Closes #12
See merge request !8
|
|
|
|
This will prune old tests from the `redis`-alerter - if that alerter
isn't used this will be harmless.
|
|
|
|
Resolve "We should support HTTP-basic auth for HTTP-based status-checks."
Closes #10
See merge request !7
|
|
Rather than:
with auth 'username:password'
We use:
http://user:pass@example.com/
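For reference, credentials embedded in a URL this way can be read back with Ruby's standard `uri` library. The `credentials_from` helper is hypothetical and only illustrates the idea; it is not the code custodian uses.

```ruby
require 'uri'

# Hypothetical helper: returns [user, password] when the URL carries
# credentials, or nil when it does not.
def credentials_from(url)
  uri = URI.parse(url)
  uri.user ? [uri.user, uri.password] : nil
end

p credentials_from('http://user:pass@example.com/')  # => ["user", "pass"]
p credentials_from('http://example.com/')            # => nil
```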
|
|
|
|
Supply this like so:
http://example.com/ must run http with auth 'username:passw0rd' with status 200 otherwise 'failure'
|
|
Allow tests to specify the number of days before an expiring SSL certificate will generate a warning
See merge request !5
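A minimal sketch of such a threshold check, assuming the certificate's expiry time is already known. The helper name `expiry_warning?` and its parameters are hypothetical and do not reflect custodian's actual implementation or test syntax.

```ruby
SECONDS_PER_DAY = 86_400

# Hypothetical check: true when the certificate expires in fewer than
# `warn_days` days, measured from `now`.
def expiry_warning?(not_after, warn_days, now: Time.now)
  days_left = (not_after - now) / SECONDS_PER_DAY
  days_left < warn_days
end

now       = Time.utc(2020, 1, 1)
not_after = now + 10 * SECONDS_PER_DAY   # certificate expires in 10 days

p expiry_warning?(not_after, 14, now: now)  # => true
p expiry_warning?(not_after, 7, now: now)   # => false
```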
|
|
|
|
Resolve "gitlab-ci should run the test-cases."
Closes #9
See merge request !6
|
|
|
|
They fail.
|
|
|
|
|
|
Resolve "Allow subject-lines to be prefixed with a custom string."
See merge request !4
|
|
|
|
|
|
This will allow classification (by human eyes) of raised-alerts.
|
|
Move to gitlab-CI.
Closes #6
See merge request !3
|
|
|
|
|
|
This closes #6.
|
|
|
|
Show host/port when TCP timeout occurs.
This is a failure case which is not 100% clear.
This closes #4.
See merge request !2
|
|
This is a failure case which is not 100% clear.
This closes #4.
|