Monitor2
Revision as of 10:43, 27 April 2023
The second monitoring system/framework in the New World, monitor2, is implemented on the server https://nw-syd-monitor2.cse.unsw.edu.au/. Unlike monitor1 it does not use Nagios and does not generate any alarms or warnings.
Instead, monitor2 collects and graphs data sampled from various servers. It is designed to implement a simple but flexible way of collecting data from these hosts, storing that data and graphing it, regardless of the data sources. It uses SNMP to collect data samples from the monitored hosts. Where the standard SNMP MIBs don't define the desired data, SNMP is extended with external scripts (currently written in bash for ease of portability and maintenance) to provide the required samples.
As of the date of writing, it supports:
- Disk activity (bytes read and written per local disk and partition)
- Network interface traffic (bytes read and written per network interface)
- CPU usage (percentage) and load average
- Memory usage (RAM)
- Device temperature(s)
Monitored hosts only require:
- That the abovementioned extension scripts be copied into place (see file locations below),
- That a custom `snmpd.conf` be copied into place, and
- That `snmpd` be installed and started/run via `systemd`.
File locations
monitor2
monitor2 files and directories | Description |
---|---|
`/etc/monitor2.conf` | Site-specific configuration: top-level directory location, etc. |
`/home/cephmonitor/samples` | Top-level directory under which sampled/graphed hosts are grouped into plot pool directories |
`/home/cephmonitor/cgi-bin` | Directory containing Bash CGI scripts for each graphed data type. They use a common library to display pages with similar-looking controls at the top and graphs underneath |
`/home/cephmonitor/plot-bin` | ? |
`/home/cephmonitor/html` | Static HTML pages, including `index.html`, which contains links to the CGI scripts |
`/home/cephmonitor/bin` | ? |
`/home/cephmonitor/datacollectpoll` | Directory containing the `datacollectpoll` script and its configuration file, `datacollectpoll.conf` |
`/etc/systemd/system/datacollectpoll.service` | systemd service file used to control our `datacollectpoll` service |
Monitored hosts
Files and directories on the monitored hosts | Description |
---|---|
`/etc/snmp/snmpd.conf` | Configuration file for the SNMP daemon running on the host. It contains the community name plus it lists the scripts used to extend the range of data the daemon can provide |
`/usr/local/snmpd_extend` | Directory containing the extension scripts referred to by `snmpd.conf` |
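To illustrate what an extension script in `/usr/local/snmpd_extend` might look like, here is a minimal sketch of a temperature reader. The script name, the function name `read_temperature`, and the sysfs path are assumptions made for illustration; the real scripts on the monitored hosts may work quite differently. The `extend` line in the comment uses net-snmp's standard `extend NAME PROG` directive.

```shell
#!/usr/bin/env bash
# Hypothetical extension script, e.g. /usr/local/snmpd_extend/temperature.
# (The name and the sysfs path below are assumptions, not the site's real files.)
#
# It could be registered in snmpd.conf with net-snmp's "extend" directive:
#   extend temperature /usr/local/snmpd_extend/temperature

# Print a temperature in whole degrees Celsius, read from a Linux sysfs
# thermal zone file that reports millidegrees Celsius.
read_temperature() {
    local zone_file="${1:-/sys/class/thermal/thermal_zone0/temp}"
    local millideg=""
    [[ -r "$zone_file" ]] || return 1
    # read fails on a missing trailing newline but still fills the variable
    read -r millideg < "$zone_file" || [[ -n "$millideg" ]] || return 1
    # Reject anything that is not a plain integer
    [[ "$millideg" =~ ^-?[0-9]+$ ]] || return 1
    echo $(( millideg / 1000 ))
}
```

A real extension script would end with a call such as `read_temperature`, so that `snmpd` returns the printed value when the corresponding OID is queried.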
monitor2.conf
SAMPLERETRIES=3
SAMPLETIMEOUT=2
SNMPCOMMUNITY="reader"
DEFAULTPLOTPOOL="ceph"
DATADIR="/home/cephmonitor/samples"
DATACOLLECTPOLLCONF="/home/cephmonitor/datacollectpoll/datacollectpoll.conf"
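Because `monitor2.conf` consists of plain `KEY=value` assignments, a Bash script can source it directly. The sketch below shows how the retry, timeout and community settings could be turned into an `snmpget` command line. The helper name `build_snmpget_cmd` and the choice of SNMP version 2c are assumptions; the actual collection scripts may call SNMP differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of monitor2 itself): build an snmpget command
# line from the settings in monitor2.conf and store it in the SNMPGET_CMD array.
build_snmpget_cmd() {
    local conf_file="$1" host="$2" oid="$3"
    # The file is plain KEY=value assignments, so bash can source it.
    source "$conf_file" || return 1
    # -r = retries, -t = timeout in seconds, -Oqv = print the value only.
    # SNMP version 2c is an assumption.
    SNMPGET_CMD=(snmpget -v2c -c "$SNMPCOMMUNITY"
                 -r "$SAMPLERETRIES" -t "$SAMPLETIMEOUT"
                 -Oqv "$host" "$oid")
}
```

For example, `build_snmpget_cmd /etc/monitor2.conf node1 .1.3.6.1.4.1.2021.10.1.3.1 && "${SNMPGET_CMD[@]}"` would query the one-minute load average (UCD-SNMP-MIB `laLoad.1`) on `node1`. Keeping the command in an array avoids quoting problems when it is executed.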
datacollectpoll.service
[Unit]
Description=Data Collect Poll daemon for fleet monitoring
After=network.target

[Service]
User=cephmonitor
Group=cephmonitor
EnvironmentFile=/etc/monitor2.conf
ExecStart=/home/cephmonitor/datacollectpoll/datacollectpoll

[Install]
WantedBy=multi-user.target
datacollectpoll
Immortal script (i.e., it never intentionally exits) started by systemd. It runs the commands listed in datacollectpoll.conf at their configured intervals to collect data samples, and stores them in plain-text files in the samples directory.
samples
Directory and subdirectories with plain-text files containing daily collections of data samples named by type and host.
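As a rough illustration of daily plain-text sample files, the following sketch appends one timestamped value to a per-day file. The directory layout (`<datadir>/<pool>/<type>.<host>.<date>`), the line format (epoch seconds, then the value), the use of UTC dates, and the function name `store_sample` are all assumptions for illustration; the real naming scheme used under the samples directory is not documented here. It relies on GNU `date`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: append one sample to a daily plain-text file.
# Assumed layout: <datadir>/<pool>/<type>.<host>.<YYYY-MM-DD>
# Assumed line format: "<epoch seconds> <value>"
store_sample() {
    local datadir="$1" pool="$2" type="$3" host="$4" value="$5"
    local now="${6:-$(date +%s)}"   # optional timestamp, defaults to now
    local day
    # GNU date: convert the epoch timestamp to a UTC calendar day
    day=$(date -u -d "@$now" +%Y-%m-%d) || return 1
    mkdir -p "$datadir/$pool" || return 1
    printf '%s %s\n' "$now" "$value" >> "$datadir/$pool/$type.$host.$day"
}
```

Because each sample is a single appended line, a new file starts automatically when the date changes, and the files remain easy to inspect with standard text tools.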
datacollectpoll.conf
Format:
- Blank lines are ignored.
- '#' to end of line is a comment.
- The first space-separated field on the line is the command to run. This command must exist at the time the file is read.
- Subsequent space-separated fields are optional arguments passed to that command. Although optional format-wise, the existing scripts take two arguments:
  - The poll group
  - The host to sample
- An optional ';' followed by an interval in seconds changes the time between samples from XX to the specified interval.
/home/cephmonitor/bin/get_disk_stats vmhost node3 ; 30
/home/cephmonitor/bin/get_disk_stats vmhost node4 ; 30
/home/cephmonitor/bin/get_disk_stats vmhost node6 ; 30
/home/cephmonitor/bin/get_disk_stats vmhost node9 ; 30
/home/cephmonitor/bin/get_temperature_stats ceph node1 ; 120
/home/cephmonitor/bin/get_temperature_stats ceph node2 ; 120
/home/cephmonitor/bin/get_temperature_stats ceph node5 ; 120
/home/cephmonitor/bin/get_temperature_stats ceph node7 ; 120
/home/cephmonitor/bin/get_temperature_stats ceph odroidn2 ; 120
/home/cephmonitor/bin/get_temperature_stats ceph rockpro64 ; 120
...
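The format rules above can be sketched as a small Bash parser. This is not the actual `datacollectpoll` code; the function name `parse_poll_line` and its return codes are assumptions, and the default interval is left as a parameter because its real value is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical parser for one line of datacollectpoll.conf.
# On success sets POLL_CMD (array: command plus arguments) and POLL_INTERVAL.
# Returns 1 for blank or comment-only lines, 2 if the command is not
# executable, 3 if the interval is not a whole number of seconds.
parse_poll_line() {
    local line="$1" default_interval="$2"
    line="${line%%#*}"                       # drop '#' to end of line
    POLL_INTERVAL="$default_interval"
    if [[ "$line" == *";"* ]]; then
        POLL_INTERVAL="${line##*;}"          # text after the ';'
        POLL_INTERVAL="${POLL_INTERVAL//[[:space:]]/}"
        line="${line%%;*}"                   # text before the ';'
    fi
    read -r -a POLL_CMD <<< "$line"          # split fields on whitespace
    (( ${#POLL_CMD[@]} > 0 )) || return 1
    [[ "$POLL_INTERVAL" =~ ^[0-9]+$ ]] || return 3
    [[ -x "${POLL_CMD[0]}" ]] || return 2    # command must exist when read
}
```

A polling daemon could call this for every line of the file at startup, then run `"${POLL_CMD[@]}"` for each entry every `POLL_INTERVAL` seconds.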
Plot pools