| Field | Value | Date |
|---|---|---|
| author | Steve Kemp <steve@steve.org.uk> | 2012-11-24 10:57:54 +0000 |
| committer | Steve Kemp <steve@steve.org.uk> | 2012-11-24 10:57:54 +0000 |
| commit | 5f683d55e5071c893654b8f0c26324b409557fd2 | |
| tree | 83af4bc1167136cc7a74589167eab62468975f65 /README | |
| parent | 86ebbec1dbcdca88b3ccf69168be9dd95b38dcc9 | |
Work-in-progress update of the README
Diffstat (limited to 'README')
-rw-r--r-- | README | 113 |
1 files changed, 21 insertions, 92 deletions
@@ -1,37 +1,36 @@
+Source:
+ https://projects.bytemark.co.uk/projects/custodian
-About
------
+Copyright:
+ Copyright (c) 2012 Bytemark Computer Consulting Ltd
- We have a existing monitoring solution which suffers several problems:
+Licence:
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
- * It is hard to scale, because all tests are executed upon one machine.
- * It is over-engineered, hard to modify, and suffers from threading-related
- stability issues.
- * It is heavy-weight. Each time an alert is raised/cleared this is done by
- executing a "mauvesend" command.
+About Custodian
+---------------
-Proposal
---------
+Custodian is a simple, scalable, and reliable protocol-tester that allows
+a number of services to be tested across a network.
- Steve proposes we throw this away and replace with something that is
- both simpler in implementation, and easier to modify. We'll keep in mind the
- aim of allowing multiple monitoring stations - although we note that we will
- need to update firewalls to allow probes from more hosts than our single current
- one.
-
- The core design is based upon a work queue. There are two parts to the system:
+The core design is based upon a work queue, which is manipulated by
+two main scripts:
+ custodian-enqueue
 * A parser that reads a list of hosts and tests to apply. These
- tests are broken down into individual jobs, serialized to JSON,
- and stored in a queue.
-
- * An arbitrary number of monitoring hosts, which pull jobs from the
- work queue and execute them.
+ tests are broken down into individual jobs, serialized and stored
+ in a central queue.
+ custodian-dequeue
+ * A tool that pulls jobs from the queue, executing them in turn, and
+ raises/clears alerts based upon the result of the test.
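The work-queue split described in the new README text (a parser that enqueues serialized jobs, and a worker that pulls them, runs each test, and raises or clears an alert) can be sketched as follows. This is a minimal illustration under stated assumptions, not the real custodian-enqueue/custodian-dequeue code: an in-process Python `queue.Queue` stands in for the central queue, jobs are serialized to JSON (as the removed text describes), and `enqueue`, `dequeue_all`, the fake test runner and the 192.0.2.x addresses are all hypothetical.

```python
import json
import queue

# Hypothetical in-process stand-in for the central work queue; the real
# tools share a queue between hosts, which queue.Queue does not do.
work_queue = queue.Queue()


def enqueue(jobs):
    """Parser side (in the spirit of custodian-enqueue): serialize each job
    to JSON and put it on the queue."""
    for job in jobs:
        work_queue.put(json.dumps(job))


def dequeue_all(run_test):
    """Worker side (in the spirit of custodian-dequeue): pull jobs in turn,
    run each one, and decide whether its alert is raised or cleared."""
    events = []
    while True:
        try:
            payload = work_queue.get_nowait()
        except queue.Empty:
            break  # queue drained
        job = json.loads(payload)
        passed = run_test(job)
        events.append(("clear" if passed else "raise", job["test_alert"]))
    return events


# Usage: two hypothetical jobs, and a fake runner where only SSH passes.
enqueue([
    {"target_host": "192.0.2.1", "test_type": "ssh",
     "test_port": "22", "test_alert": "ssh failure"},
    {"target_host": "192.0.2.2", "test_type": "http",
     "test_port": "80", "test_alert": "HTTP failure"},
])
events = dequeue_all(lambda job: job["test_type"] == "ssh")
print(events)  # [('clear', 'ssh failure'), ('raise', 'HTTP failure')]
```

The sketch stops once the queue is empty so that it terminates; a long-running worker would more likely block on the queue and keep consuming jobs as the parser adds them.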
@@ -39,25 +38,6 @@ Proposal
 Implementation
 --------------
- Because we have an existing tool deployed, sentinel, which has a
- reasonably well-defined configuration file I propose that the new
- solution will be 100% compatible with it.
-
- This means we must accept lines of the following form:
-
---
-
-LINN_HOSTS is 89.16.185.172 and 46.43.50.217 and 89.16.185.171 and 89.16.185.173 and 89.16.185.174 and 46.43.50.216 and 46.43.50.212 and 46.43.50.217 and 89.16.185.171.
-
-LINN_HOSTS must run ssh on 22 otherwise '*Managed client*: "[Goto Redmine]":https://managed.bytemark.co.uk/projects/linn ssh failure'.
-
-http://acerecords.co.uk/ must run http with status 200 otherwise '*Managed client*: "[Goto Redmine]":https://managed.bytemark.co.uk/projects/acerecords/wiki/Wiki HTTP failure'.
-
-http://acerecords.co.uk/ must run http with content 'Ace Records' otherwise '*Managed client*: "[Goto Redmine]":https://managed.bytemark.co.uk/projects/acerecords/wiki/Wiki HTTP failure'.
---
-
-
-
 In brief we accept four distinct kinds of line:
@@ -122,54 +102,3 @@ http://acerecords.co.uk/ must run http with content 'Ace Records' otherwise '*Ma
-
-Behaviour
-----------
-
-There are two parts to our system:
-
- a. Parser: ./bin/custodian-enqueue
- b. Worker: ./bin/custodian-dequeue
-
-The parser will read the named configuration file, parse it, and submit the JSON-encoded tests
-to the queue.
-
-The worker will pull down these tests, and execute them.
-
-Sample JSON looks like this:
-
- {"target_host":"46.43.37.199","test_type":"ssh","test_port":"22","test_alert":"*Managed client*: \"[Goto Redmine]\":https://managed.bytemark.co.uk/projects/wellinformed/wiki/Wiki ssh failure"}
-
-
-You'll see that the JSON-encoded data is merely a hash, with the following keys:
-
- target_host: The host that will be probed.
- test_port: The port number that will be queried. i.e "22", or "222" for SSH probes.
- test_type: The type of test we're runnign "ssh", "http", "ftp", "imap", etc.
- test_alert: The text of the alert we'll raise, on failure.
-
-There are some test-specific extra fields which we might also expect to see:
-
-dns
----
- resolve_name: A name to lookup, via DNS.
- resolve_type: The type of record to lookup [A|AAAA|MX|NS]
- resolve_expected: A semicolon-deliminated list of results whihc *must* be detected.
-
-http/https
------------
- http_text: Expected HTTP/HTTPS contents.
- http_status: Expected HTTP/HTTPS response code.
-
-tcp
----
- banner Regular expression tested against the response from the remote TCP server.
-
-
-
-Bugs
-----
-
-Poke Steve
-
-
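The removed Implementation text quotes four sentinel-style configuration lines: a host macro (`LINN_HOSTS is A and B ...`), an SSH test, an HTTP status test and an HTTP content test. A minimal sketch of turning lines like these into the JSON job hashes described in the removed Behaviour text (`target_host`, `test_type`, `test_port`, `test_alert`, plus `http_status`/`http_text`) might look like this. It is not the real custodian parser: the regular expressions cover only those four sample forms, `parse` and `DEFAULT_PORTS` are hypothetical, the default ports are an assumption, and the alert strings are shortened.

```python
import json
import re

# Hypothetical grammar covering only the four sample line forms quoted in
# the removed README text; the real parser may accept more (or differ).
MACRO_RE = re.compile(r"^([A-Z_]+) is (.+)\.$")
TEST_RE = re.compile(
    r"^(?P<target>\S+) must run (?P<type>\w+)"
    r"(?: on (?P<port>\d+))?"
    r"(?: with status (?P<status>\d+))?"
    r"(?: with content '(?P<content>[^']*)')?"
    r" otherwise '(?P<alert>.*)'\.$"
)
# Assumed default ports for lines that do not say "on N".
DEFAULT_PORTS = {"ssh": "22", "http": "80", "https": "443"}


def parse(lines):
    """Turn sentinel-style lines into JSON-encoded job hashes."""
    macros = {}
    jobs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        m = MACRO_RE.match(line)
        if m:
            # "NAME is A and B and C." defines a host list; drop duplicate
            # hosts while keeping their first-seen order.
            hosts = m.group(2).split(" and ")
            macros[m.group(1)] = list(dict.fromkeys(hosts))
            continue
        m = TEST_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognised line: {line!r}")
        # A macro name expands to every host it lists; anything else
        # (a hostname or URL) is used as the single target.
        for host in macros.get(m["target"], [m["target"]]):
            job = {
                "target_host": host,
                "test_type": m["type"],
                "test_port": m["port"] or DEFAULT_PORTS.get(m["type"], ""),
                "test_alert": m["alert"],
            }
            if m["status"] is not None:
                job["http_status"] = m["status"]
            if m["content"] is not None:
                job["http_text"] = m["content"]
            jobs.append(json.dumps(job))
    return jobs


# Usage, with the alert text shortened from the original examples.
lines = [
    "LINN_HOSTS is 89.16.185.172 and 46.43.50.217 and 89.16.185.172.",
    "LINN_HOSTS must run ssh on 22 otherwise 'ssh failure'.",
    "http://acerecords.co.uk/ must run http with status 200 otherwise 'HTTP failure'.",
    "http://acerecords.co.uk/ must run http with content 'Ace Records' otherwise 'HTTP failure'.",
]
jobs = parse(lines)
print(len(jobs))  # 4
print(jobs[0])
```

The macro line deliberately repeats a host, as the original `LINN_HOSTS` list does; de-duplicating it is a design choice here so that the same host is not probed twice per run.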
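On the worker side, the test-specific fields listed in the removed text (`resolve_expected`, `http_status`, `http_text`, `banner`) have to be compared against what a probe actually observed. The sketch below does only that comparison, with no networking. The helper names are hypothetical, and it assumes that every `resolve_expected` entry must appear among the DNS answers, that `http_text` is a plain substring match on the body, and that `banner` is searched anywhere in the response with Python's `re.search`.

```python
import re


def dns_ok(job, answers):
    """resolve_expected is a semicolon-delimited list; assume every entry
    must be present among the observed DNS answers."""
    expected = {e.strip() for e in job["resolve_expected"].split(";") if e.strip()}
    return expected.issubset(answers)


def http_ok(job, status, body):
    """Check http_status and/or http_text, whichever the job carries.
    Assumes http_text is a plain substring of the response body."""
    if "http_status" in job and str(status) != job["http_status"]:
        return False
    if "http_text" in job and job["http_text"] not in body:
        return False
    return True


def tcp_ok(job, response):
    """Assume banner is a regular expression searched anywhere in the
    server's response (re.search, not re.match)."""
    return re.search(job["banner"], response) is not None


# Usage with hypothetical jobs and observed results.
dns_job = {"resolve_name": "example.com", "resolve_type": "A",
           "resolve_expected": "192.0.2.1;192.0.2.2"}
print(dns_ok(dns_job, ["192.0.2.2", "192.0.2.1"]))  # True
print(dns_ok(dns_job, ["192.0.2.1"]))               # False

http_job = {"http_status": "200", "http_text": "Ace Records"}
print(http_ok(http_job, 200, "<title>Ace Records</title>"))  # True
print(http_ok(http_job, 404, "<title>Ace Records</title>"))  # False

tcp_job = {"banner": r"^SSH-2\.0-"}
print(tcp_ok(tcp_job, "SSH-2.0-OpenSSH_5.5p1 Debian"))  # True
```

Keeping the checks separate from the probes makes them easy to test in isolation; a worker would call them after performing the actual DNS lookup, HTTP request or TCP connection.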