Source:
    https://projects.bytemark.co.uk/projects/custodian

Copyright:
    Copyright (c) 2012 Bytemark Computer Consulting Ltd

Licence:
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.




About Custodian
---------------

Custodian is a simple, scalable, and reliable protocol-tester that allows
a number of services to be tested across a network.

The core design is based upon a work queue, which is manipulated by
two main scripts:

  custodian-enqueue
    * A parser that reads a list of hosts and tests to apply.  These
      tests are broken down into individual jobs, serialized and stored
      in a central queue.

  custodian-dequeue
    * A tool that pulls jobs from the queue, executing them in turn, and
      raises/clears alerts based upon the result of the test.
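
The split between the two scripts can be sketched as follows.  This is a
minimal illustration only: the in-memory deque stands in for the central
queue, and the function names and callbacks are hypothetical rather than
taken from Custodian itself.

```python
import json
from collections import deque

# Hypothetical in-memory stand-in for the central queue.
queue = deque()


def enqueue(jobs):
    """Serialize each job to JSON and store it in the queue."""
    for job in jobs:
        queue.append(json.dumps(job, sort_keys=True))


def dequeue(run_test, raise_alert, clear_alert):
    """Pull jobs in turn, run each test, then raise or clear its alert.

    run_test, raise_alert and clear_alert are caller-supplied callables
    (assumptions made for this sketch); run_test returns True on success.
    """
    while queue:
        job = json.loads(queue.popleft())
        if run_test(job):
            clear_alert(job)
        else:
            raise_alert(job)
```

Because the jobs are serialized before they are stored, the producer and
the consumer share nothing except the queue.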




Implementation
--------------

  In brief, the parser (custodian-enqueue) accepts four distinct kinds of input line:


  1. Comments
  ------------
  Comments are lines that are blank or which begin with the comment-character ("#").
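
  The comment rule could be checked as in the sketch below.  The function name
  is hypothetical, and ignoring leading whitespace before the "#" is an
  assumption rather than documented behaviour.

```python
def is_comment(line):
    """Return True for blank lines and lines starting with '#'.

    Leading whitespace is ignored, which is an assumption of this sketch.
    """
    stripped = line.strip()
    return stripped == "" or stripped.startswith("#")
```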


  2. Macro Definitions
  ---------------------
  There are three types of macros:

     FOO_HOSTS is 1.2.3.4 and 2.3.4.5 and 4.5.6.6.
     FOO_HOSTS are 1.2.3.4 and 2.3.4.5 and 4.5.6.6.
     FOO_HOSTS are fetched from https://admin.bytemark.co.uk/network/monitor_ips/routers.

  We accept each of these, with the caveat that macro-names must match
  the regular expression ^[0-9A-Z_]+$.
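
  A macro definition could be recognised roughly as follows.  This is a sketch
  under assumptions: the function name and the returned dictionary keys are
  hypothetical, the URL is only recorded and not fetched, and the name pattern
  accepts one or more characters from the set given above.

```python
import re

# Macro name, then "is" or "are", then the rest of the line.
MACRO_LINE = re.compile(r"^([0-9A-Z_]+)\s+(?:is|are)\s+(.*)$")


def parse_macro(line):
    """Parse a macro definition, or return None if the line is not one."""
    text = line.strip()
    if text.endswith("."):
        text = text[:-1]  # drop the terminating full stop
    match = MACRO_LINE.match(text)
    if match is None:
        return None
    name, value = match.group(1), match.group(2)
    prefix = "fetched from "
    if value.startswith(prefix):
        # Remote form: record the URL the host list would come from.
        return {"name": name, "url": value[len(prefix):].strip()}
    # Literal form: hosts are separated by the word "and".
    return {"name": name, "hosts": [h.strip() for h in value.split(" and ")]}
```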


  3. Service Tests
  ----------------
  Service tests are best explained by several examples:

     SWITCHES must run ssh otherwise 'Bytemark networking infrastructure: switch'.
     mirror.bytemark.co.uk must run ftp on 21 otherwise 'Bytemark Mirror: FTP failure'.

  The general case is:

     hostname|macro must run XXX [on NN] otherwise 'alert'.

  If we restrict ourselves to saying that every test must be named after the service
  under test, then the handling can be generalized: the service name selects the
  handler.  This means we'll invoke the ftp-handler for the line:

     foo.vm must run ftp otherwise 'alert text'.

  And the bar-handler for the line:

     example.vm.bytemark.co.uk must run bar otherwise 'alert text'.

  The JSON which we serialize will also have "test_type:ftp" and "test_type:bar", respectively.
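
  The general case can be turned into a job along these lines.  Only the
  "test_type" key comes from the description above; the other key names and
  the function name are hypothetical, and macro expansion is left out of the
  sketch.

```python
import json
import re

# hostname|macro must run XXX [on NN] otherwise 'alert'.
SERVICE_TEST = re.compile(
    r"^(\S+)\s+must\s+run\s+(\S+)(?:\s+on\s+(\d+))?\s+otherwise\s+'(.*)'\.?\s*$"
)


def parse_service_test(line):
    """Return a job dictionary for a service-test line, or None."""
    match = SERVICE_TEST.match(line.strip())
    if match is None:
        return None
    target, service, port, alert = match.groups()
    # "target", "test_port" and "test_alert" are illustrative key names.
    job = {"target": target, "test_type": service, "test_alert": alert}
    if port is not None:
        job["test_port"] = int(port)
    return job


def serialize(job):
    """Serialize a job to JSON, ready to be stored in the queue."""
    return json.dumps(job, sort_keys=True)
```

  A dequeuing process could then use the "test_type" value to choose which
  handler to invoke.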


  4. Ping Tests
  -------------
  Ping tests are of the form:

     FOO must ping otherwise 'alert text'.
     example.vm.bytemark.co.uk must ping otherwise 'alert text'.

  These are a simplification of the service tests: the only real difference
  is that we write "must ping" rather than "must run ping".  To that end we
  silently rewrite any line which reads:

    (.*) must ping (.*)

  This becomes:

    $1 must run ping $2

  This allows the line to be parsed by the previous service-test rules.
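
  The rewrite amounts to a single substitution, sketched below with a
  hypothetical function name.  The pattern is the one shown above, anchored
  to the whole line.

```python
import re


def rewrite_ping(line):
    """Rewrite "X must ping Y" as "X must run ping Y".

    Lines that do not match the pattern are returned unchanged.
    """
    return re.sub(r"^(.*) must ping (.*)$", r"\1 must run ping \2", line)
```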