path: root/lib/custodian
Age Commit message Author
2017-08-08Updated to move ignore-dns-failure code into routine.Steve Kemp
That is then tested when resolve-errors are handled.
2017-08-08Ignore bogus DNS results.Steve Kemp
We've had a problem for the past few weeks (?) where we see false DNS errors when making http/https requests with `curb`/`libcurl`.

To resolve these issues properly we're going to have to rewrite the code to avoid the current gem. However that is considerable work because of the hole we've backed ourselves into - wanting to test both IPv4 and IPv6 "properly". We'll have to duplicate that work if we use `net/http`, or even more so if we use `open3` and exec `curl -4|-6 ..`

For the moment this commit changes how things are handled to deal with the issue we see - which doesn't solve the problem but will mask it.

When custodian runs a test it will return a status-code:

* Custodian::TestResult::TEST_FAILED
  * The test failed, such that an alert should be raised.
* Custodian::TestResult::TEST_PASSED
  * The test succeeded, such that any previous alert should be cleared.
* Custodian::TestResult::TEST_SKIPPED
  * Nothing should be done.

As the failure we see is very specific - an exception is thrown of the type `Curl::Err::HostResolutionError` - we can catch that and return `TEST_SKIPPED`. That means that there will be no (urgent) alert.

Obviously the potential risk of swallowing all DNS-failures is that a domain might expire and we'd never know. So we'll do a little better than merely skipping the test if there are DNS failures:

* If we see a DNS failure.
  * Then we try to lookup the host as an A & AAAA record.
  * If that succeeds we decide the issue was bogus.
  * If that fails then the host legitimately doesn't resolve so we raise an alert.

To recap:

* If a host fails normally - bogus status-code, or missing text - we behave as we did in the past.
* Only in the case of a DNS-error from curb/curl do we go down this horrid path.
  * Where we try to confirm the error, and swallow it if false.

This closes #13.
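The confirm-then-swallow logic described above could be sketched roughly as follows. This is an illustration only, not the code from the commit: the `TestResult` module is a hypothetical stand-in for `Custodian::TestResult` (the real constant values are not shown in this log), and the method names `host_resolves?` and `result_for_dns_error` are invented here. The resolver is injectable so the sketch can be exercised without network access.

```ruby
require 'resolv'

# Hypothetical stand-in for Custodian::TestResult; the real values are
# not shown in the log, so symbols are used purely for illustration.
module TestResult
  TEST_FAILED  = :failed
  TEST_PASSED  = :passed
  TEST_SKIPPED = :skipped
end

# The two record types the commit message says are looked up.
RECORD_TYPES = [Resolv::DNS::Resource::IN::A,
                Resolv::DNS::Resource::IN::AAAA].freeze

# True if the host has at least one A or AAAA record.
def host_resolves?(host, resolver)
  RECORD_TYPES.any? { |type| !resolver.getresources(host, type).empty? }
end

# Intended to be called only after the HTTP client reported a
# host-resolution error. If our own lookup succeeds the error is treated
# as bogus (skip); otherwise the host really does not resolve (fail).
def result_for_dns_error(host, resolver = Resolv::DNS.new)
  if host_resolves?(host, resolver)
    TestResult::TEST_SKIPPED
  else
    TestResult::TEST_FAILED
  end
end
```

In use, a caller would rescue `Curl::Err::HostResolutionError` around the HTTP request and return `result_for_dns_error(host)` from the rescue block; every other failure would follow the existing path.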
2017-07-13Alert in more detail on DNS failures.Steve Kemp
2017-07-11Updated to log the exact DNS error.13-log-dns-errorsSteve Kemp
This is part of #13.
2017-04-10Remove username/password prior to testing URL with curb.Steve Kemp
2017-04-10Use standard URL username/password holders.10-support-http-basic-authSteve Kemp
Rather than: with auth 'username:password' We use: http://user:pass@example.com/
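A minimal sketch of pulling the credentials out of such a URL and producing a bare URL to hand to the HTTP client. The helper name `split_credentials` is hypothetical, and this is not necessarily how the commit does it; it only uses the standard `uri` library plus a string substitution.

```ruby
require 'uri'

# Returns [url_without_credentials, user, password].
# user and password are nil when the URL carries no userinfo part.
def split_credentials(url)
  uri = URI.parse(url)
  # Drop "user:pass@" that directly follows the "scheme://" prefix.
  bare = url.sub(%r{\A([a-z][a-z0-9+.-]*://)[^/@]*@}i, '\1')
  [bare, uri.user, uri.password]
end
```

For example, `split_credentials('http://user:pass@example.com/')` yields the bare URL `http://example.com/` together with `user` and `pass`.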
2017-03-28Support HTTP BASIC-Authentication.Steve Kemp
Supply this like so: http://example.com/ must run http with auth 'username:passw0rd' with status 200 otherwise 'failure'
2017-03-27First stab at allowing custom SSL expiry daysJames Hannah
2017-03-16Use the subject-prefix if it is present.Steve Kemp
2017-03-16Added helper for reading a custom-prefix.Steve Kemp
This will allow classification (by human eyes) of raised-alerts.
2016-12-19Show host/port when TCP timeout occurs.Steve Kemp
This is a failure case which is not 100% clear. This closes #4.
2016-11-03Send the server-name-indicator (SNI) when falling back to legacy.3-send-sni-when-falling-back-to-opensslSteve Kemp
If ruby-based SSL negotiation fails then we fall back to invoking (horridly!) openssl directly. Until now this didn't send the SNI hostname to connect to, so it could only test the first/default SSL site that was listening upon a given IP address. This commit updates things such that we send the correct hostname, from the URL under-test.
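As a rough illustration of sending SNI in the fallback, the `openssl s_client` invocation can be built from the URL under test. The helper name `openssl_command` is hypothetical and the exact command line used by custodian is not shown in this log; `-connect` and `-servername` are standard `s_client` options.

```ruby
require 'uri'

# Build the argument list for an `openssl s_client` call that sends the
# URL's hostname as the SNI server name.
def openssl_command(url)
  uri = URI.parse(url)
  ['openssl', 's_client',
   '-connect', "#{uri.host}:#{uri.port}",
   '-servername', uri.host]
end
```

The resulting array could then be passed to something like `Open3.capture2(*cmd, stdin_data: '')` to collect the certificate output.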
2016-07-18Fallback to using `openssl` if we can't get certificates.Steve Kemp
Since the ruby version available to wheezy doesn't support TLS 1.2, fetching the certificate from remote HTTPS servers will fail if that is all that is available. If we hit that condition, and only that one, we'll fall back to invoking `openssl` natively. This will allow us to monitor expiration-time for remote SSL certificates, but the downside is that we no longer receive the bundle that the remote server might send - so we cannot validate the signature chain. This closes #2.
2016-07-13Update error message for validation-failuSteve Kemp
2016-07-13Retry SSL checks on negotiation failure.release-0.29Steve Kemp
This prevents an endless loop.
2016-04-22Updated to fix the last remaining rubocop warnings.Steve Kemp
This involved silencing a few issues that were judged to be minor, and changing various whitespaces and function-calls. The most obvious example was changing this: assert(ret.kind_of? Array) To this: assert(ret.kind_of?(Array))
2016-04-22More rubocop fixups.Steve Kemp
These are again mostly based around whitespace-changes.
2016-04-22More rubocop fixes.Steve Kemp
2016-04-22Fixed up more rubocop warnings.Steve Kemp
Again these were whitespace-related.
2016-04-22More updates to silence rubocop style-guides.Steve Kemp
These warnings were largely whitespace-based.
2016-04-22Readded file.Steve Kemp
It was required after all.
2016-04-22Removed obsolete file.Steve Kemp
2016-04-22Deleted trailing whitespace.Steve Kemp
Made minor formatting cleanups
2016-04-22Simplified the parsing of the TFTP URI.Steve Kemp
2016-04-21added tftp protocol testJames F. Carter
2016-04-21added a simple tftp utilityJames F. Carter
2016-02-10Don't allow limiting protocol on HTTP/HTTPS tests.root
We cannot allow HTTP/HTTPS to be limited by protocol, such as IPv4-only or IPv6-only. Raise an error in the parser if this is attempted. Added test-case to confirm, and this closes #12488.
2016-02-10Adjusted greediness of regex in http with contentPatrick J Cherry
It should match the next occurrence of the opening quote type, not the last.
2016-02-10Adjusted http with content string parsing.Patrick J Cherry
It now matches "can't match" and 'he said "ha!"'. Added tests.
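The quote handling in these two commits can be illustrated with a non-greedy match plus a backreference to the opening quote. The pattern and the helper name `expected_content` are assumptions for illustration, not the parser's actual regex.

```ruby
# (["']) captures the opening quote, (.*?) lazily captures the content,
# and \1 requires the same quote character to close it. Because the match
# is lazy it stops at the NEXT occurrence of that quote, not the last.
CONTENT_PATTERN = /with\s+content\s+(["'])(.*?)\1/

# Returns the expected content string, or nil if the line has none.
def expected_content(line)
  match = CONTENT_PATTERN.match(line)
  match && match[2]
end
```

With a greedy `(.*)` the capture would instead run to the last matching quote on the line, swallowing a trailing `otherwise '...'` clause that uses the same quote type.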
2016-01-18Updated the queue-handling.Steve Kemp
We now use a zset to store our pending tests. This means that jobs are only in the queue once - no duplicates are allowed. This closes #12428.
2016-01-11Allow expected-text to be double-quoted.Steve Kemp
This changes the parser from only allowing this: http://example.com/ must run http with content 'reserved'. To allowing both of these: http://example.com/ must run http with content "reserved". http://example.com/ must run http with content 'reserved'.
2015-12-18Updated to use the right form of counting for the set.Steve Kemp
2015-12-18Fixed the name of the string.Steve Kemp
2015-12-18Updated to revert to a set with no ordering.Steve Kemp
This is more reliable, albeit potentially racy and with the failure case that a job might be readded twice.
2015-12-18Return values using a reverse-score-range.Steve Kemp
This prevents starvation, by ensuring that we pull tests out in a FIFO fashion - by virtue of the timestamp.
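The effect of scoring jobs by timestamp can be modelled in memory. This is a plain-Ruby model of the idea, not Redis code and not custodian's queue class: the class name `SortedQueue` is hypothetical, and keeping the original score when a job is re-added is an assumption made for this sketch.

```ruby
# In-memory model of a sorted set keyed by job, scored by timestamp.
class SortedQueue
  def initialize
    @scores = {} # job => timestamp score
  end

  # Adding a job that is already queued does nothing (assumption: the
  # original timestamp is kept, so the job does not lose its place).
  def add(job, timestamp)
    @scores[job] = timestamp unless @scores.key?(job)
    self
  end

  def size
    @scores.size
  end

  # Remove and return the job with the oldest timestamp, or nil if empty.
  def fetch
    job, _score = @scores.min_by { |_job, score| score }
    @scores.delete(job) unless job.nil?
    job
  end
end
```

Because the oldest timestamp always leaves first, a job that keeps being re-parsed cannot starve the jobs queued before it.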
2015-12-18Removed references and support for beanstalkd.Steve Kemp
The beanstalkd queue was used in the past, and we later added support for Redis via a simple abstraction layer. But we've not tested or used beanstalkd for over a year, and the client-libraries are no longer available as native Debian packages. With that in mind we've excised the code, although left the abstraction-class in-place.
2015-12-18Removed debugging print.Steve Kemp
2015-12-18Removed the diagnostic output of the test-scoresSteve Kemp
2015-12-18Use a sorted set for tests in our queue.Steve Kemp
This ensures that all tests always run, and we have an ordering.
2015-12-17Treat our Redis queue as a set.Steve Kemp
This means that tests will only ever be enqueued once, regardless of how many times they are parsed.

In the past we could have a configuration file that read:

  test1 .. test2 .. test3 ..

Parsing/adding this file would result in a queue looking like so:

  test1 .. test2 .. test3 .. test1 .. test2 .. test3 .. test1 .. test2 .. test3 ..

Now the queue will *ALWAYS* look like this:

  test1 .. test2 .. test3 ..

In the normal course of events this won't matter, as the processing loop will look like so:

* Add new jobs every minute.
* Worker runs the jobs.

In the case of a failing job though the test might take 2.5 minutes and that will cause the queue to back up. (2.5 minutes because a test is repeated 5 times before a fail is announced, and the timeout is 30 seconds. These values can and should be tweaked.)

With the new method, even if the queue is slowly draining, it will never grow to contain hundreds of events; it will just be "topped up", not "overflowing".

Thanks to James Hannah for the suggestion, and James Lawrie for the patience.
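The difference between the two queue behaviours, and the 2.5 minute figure, can be checked with a small in-memory sketch. It uses a Ruby `Array` and `Set` as stand-ins for a Redis list and set; the helper name `enqueue_all` and the constant names are hypothetical.

```ruby
require 'set'

# Add every test to the queue once per parsing pass.
def enqueue_all(queue, tests, passes)
  passes.times { tests.each { |test| queue << test } }
  queue
end

TESTS = ['test1 ..', 'test2 ..', 'test3 ..'].freeze

list_queue = enqueue_all([], TESTS, 3)      # old behaviour: duplicates pile up
set_queue  = enqueue_all(Set.new, TESTS, 3) # new behaviour: each test held once

# Worst-case duration of one failing test with the values quoted above.
REPEATS         = 5
TIMEOUT_SECONDS = 30
WORST_CASE_MINUTES = REPEATS * TIMEOUT_SECONDS / 60.0
```

After three parsing passes the list holds nine entries while the set still holds three, and the worst case works out to 150 seconds, i.e. 2.5 minutes.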
2015-12-02Added swordfish range - 5.28.56.0/21Steve Kemp
2015-11-30Don't do SHA1 signature testing by default.Steve Kemp
2015-11-16Keep 8k history-transitions.Steve Kemp
2015-11-16Updated the redis-alerter to store more useful-state.Steve Kemp
This will make visualization more simple.
2015-11-16Ensure we strip leading/trailing space from alerts.Steve Kemp
This allows our configuration file `/etc/custodian/custodian.cfg` to contain something like this, without errors; alerter = file , redis
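A minimal sketch of tolerating that spacing when reading the value. The helper name `parse_alerters` is hypothetical and only the value part of the `alerter = ...` line is handled here.

```ruby
# Split a comma-separated alerter list, ignoring surrounding whitespace
# and any empty entries.
def parse_alerters(value)
  value.split(',').map(&:strip).reject(&:empty?)
end
```

So `parse_alerters('file , redis')` gives the two names `file` and `redis` with no stray spaces.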
2015-10-29Allow testing for weak certificate signing algorithms.Steve Kemp
This is a good thing to do, as Chrome will apparently be refusing to show sites with SHA-1 in use over SHA-256. This closes #12358.
2015-08-26Catch "RecvErr" exceptions from curb.Steve Kemp
This avoids showing a slightly ugly backtrace in place of a genuinely useful report.
2015-08-25Explicitly open our configuration file in UTF-8 mode.Steve Kemp
This avoids any errors of the form: invalid byte sequence in US-ASCII
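In Ruby the external encoding can be given as part of the open mode, so the read no longer depends on the process locale. A minimal sketch, with a hypothetical helper name:

```ruby
# Read a file with an explicit UTF-8 external encoding rather than
# whatever the environment's default (e.g. US-ASCII) happens to be.
def read_config(path)
  File.open(path, 'r:utf-8') { |file| file.read }
end
```

The returned string is tagged as UTF-8, so later regex matching over non-ASCII bytes does not trip over the locale default.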
2015-08-07Ensure that we correctly parse bogus macro-definitions.Steve Kemp
We've always had an implicit rule in macro-definitions, that they end with a period. This meant that the first line is valid:

  FOO is bar.vm.bytemark.co.uk.

However we'd expect this to fail:

  FOO is bar.vm.bytemark.co.uk

A similar issue would arise if a macro-definition involved more than one host, only the first would be valid.

We've fixed this now, such that the trailing period is optional.
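An optional trailing period can be expressed with a lazy capture followed by `\.?`. The pattern below is an assumption for illustration (including the upper-case macro-name rule) and handles only the single-host form shown above; it is not the parser's actual regex.

```ruby
# Lazy (.+?) leaves a final period, if present, to the optional \.?
MACRO_PATTERN = /\A([A-Z][A-Z0-9_]*)\s+is\s+(.+?)\.?\s*\z/

# Returns [name, value], or nil if the line is not a macro definition.
def parse_macro(line)
  match = MACRO_PATTERN.match(line.strip)
  match && [match[1], match[2]]
end
```

Both spellings of the `FOO` definition above then parse to the same name and hostname.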
2015-08-04Override the alert-test-type for the SSL-expiry check.Steve Kemp
This allows better alerting.