Performance data export pipeline
All performance data exported from NMIS passes through a pipeline from the NMIS server to a MySQL database.
nmis_performance_export.pl grabs data from different parts of NMIS (the -node files, RRDs, etc.) and creates one file per node per table. One file per 5 minute period (per node per table) is created; if you run the command more than once it will simply replace the existing files. These files are placed in a directory corresponding to the 5 minute period.
- An opExport pull request is made on the MySQL server
- The NMIS server's omkd finds the oldest files waiting to be exported (up to 'opexport_max_performance_datasets_per_load' => 3 of them), loads them and sends them to the omkd on the MySQL server
- omkd on the MySQL server saves the contents to MySQL
- If no errors occurred, omkd on the MySQL server sends a receipt back to omkd on the NMIS server saying all is OK
- omkd on the NMIS server logs the response to opExport.log and removes the successfully inserted files
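A quick way to see whether files are backing up at the first step is to look at the per-period directories on the NMIS server (the /usr/local/omk/var/perf location below is the same one referenced in the next section; adjust it if your install differs):
- ls -1 /usr/local/omk/var/perf/   # one directory per 5 minute period
- find /usr/local/omk/var/perf/ -type f | wc -l   # files still waiting to be loaded
Because files are removed once they have been successfully inserted, a steadily growing count here usually means the later steps are failing.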
Null data in performance tables
Given the definition of the performance data export pipeline, the logical place to start looking when null values appear is at the first step: the files created by nmis_performance_export.pl. Find a file in /usr/local/omk/var/perf/<time_period>/<table>-<time_period>-node.nmis and view it in a text editor (like vi).
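For example, to locate and open the newest file for a given table (interfacePerformance here is only an illustrative table name; substitute the table you are interested in):
- ls -1t /usr/local/omk/var/perf/*/interfacePerformance-* | head -1   # newest interfacePerformance export file
- vi $(ls -1t /usr/local/omk/var/perf/*/interfacePerformance-* | head -1)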
Is the null column name you are looking for defined in the file? (Note: column names in this file can be re-mapped to different column names using the schema, so double check the schema for the name you should be looking for if you can't find it.)
- No: if it is not defined, it is likely that the model the node uses either does not support the data, or the export script hasn't been modified to add that information to the export (likely because it does not make logical sense to track that data for that model, e.g. tracking CPU performance on your UPS)
- Yes: this means the column is defined but the value that was collected for it was undefined. In that case:
- Is thresholding turned on for this node? Check Nodes.nmis, search for the node name, look for 'threshold' => 'true'. To count the number enabled and disabled run these commands:
- grep -c "'threshold' => 'true'" /usr/local/nmis8/conf/Nodes.nmis
- grep -c "'threshold' => 'false'" /usr/local/nmis8/conf/Nodes.nmis
- Is NMIS running thresholding (either 'threshold_poll_cycle' => 'true', or with a separate nmis.pl type=threshold)? See the example commands after this list.
- Can you find the value in the var/<node_name>-node.nmis file for the node?
- Running a pull or push of the data manually from the GUI (not a pull request, an actual pull) may give you information on any errors that are occurring.
- Do the schemas match? You could try pushing the schemas to the server again and then running a push/pull.
- If all of this works successfully, double check the logs on the MySQL server (both the omkd logs and the MySQL daemon logs).
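For the checks above on whether thresholding is actually running and whether the value exists in the node's var file, commands along these lines can help; the /usr/local/nmis8 paths assume a default install, the cron location varies between systems, and avgBusy5 is only an example column name:
- grep "'threshold_poll_cycle'" /usr/local/nmis8/conf/Config.nmis   # 'true' means thresholding runs with the poll cycle
- crontab -l | grep 'type=threshold'   # or look in /etc/cron.d for a separate nmis.pl type=threshold entry
- grep 'avgBusy5' /usr/local/nmis8/var/<node_name>-node.nmis   # is the value present in the node's var file at all?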
Some Data Not Updating
The symptom seen was that some data was not updating, e.g. the data for a particular table (nodeStatus from server1) was not updating. The data is copied from the opExport slaves to the opExport DB server and stored temporarily in /usr/local/omk/var/save_queue; before a file is loaded it is cached in the sub-directory /usr/local/omk/var/save_queue/data.
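To see what is currently queued and cached on the opExport DB server, list both directories; since the data is only stored here temporarily, anything old sitting in them is usually a warning sign (see below):
- ls -l /usr/local/omk/var/save_queue/   # files queued for loading
- ls -l /usr/local/omk/var/save_queue/data/   # files cached while being loaded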
You can check the receipts for the data in /usr/local/omk/var; running ls -l *SERVERIP* in that directory will show something like this:
-rw------- 1 root 505 160533 Dec 19 17:53 receipt-SERVERIP_cbqosPerformance
-rw------- 1 root 505 111295 Dec 19 17:52 receipt-SERVERIP_ciscoConfig
-rw------- 1 root 505 111386 Dec 19 17:51 receipt-SERVERIP_diskIOTable
-rw------- 1 root 505 111292 Dec 19 17:55 receipt-SERVERIP_interface
-rw------- 1 root 505 171639 Dec 19 17:51 receipt-SERVERIP_interfacePerformance
-rw------- 1 root 505 111298 Dec 19 17:51 receipt-SERVERIP_interfaceStatus
-rw------- 1 root 505 300 Dec 19 17:53 receipt-SERVERIP_ipslaPerformance
-rw------- 1 root 505 111294 Dec 19 17:52 receipt-SERVERIP_nmisConfig
-rw------- 1 root 505 111298 Dec 19 17:52 receipt-SERVERIP_nodeProperties
-rw------- 1 root 505 205 Dec 19 17:52 receipt-SERVERIP_nodes
-rw------- 1 root 505 222386 Dec 19 17:54 receipt-SERVERIP_nodeStatus
-rw------- 1 root 505 111383 Dec 19 17:53 receipt-SERVERIP_services
-rw------- 1 root 505 111382 Dec 19 17:50 receipt-SERVERIP_storage
-rw------- 1 root 505 111290 Dec 19 17:52 receipt-SERVERIP_system
-rw------- 1 root 505 494335 Dec 19 17:53 receipt-SERVERIP_systemPerformance
-rw------- 1 root 505 298 Dec 19 17:52 receipt-SERVERIP_upsPerformance
If any of these dates is not the current time (within the last 5 minutes), that table is not being updated.
In this case, files were found in this folder which prevented opExport from streaming new data; moving the file (e.g. stream-nodeStatus-SERVERIP-localhost.data) out of the way means that opExport can get back to business.
It is likely these files are left behind when a process times out or fails before the stream data has been processed.
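A hedged example of finding and moving such stale stream files aside; the save_queue/data location and the one hour age cut-off are assumptions, so adjust them to your install before acting on the output:
- find /usr/local/omk/var/save_queue/data -name 'stream-*' -mmin +60   # stream files older than an hour are probably stale
- mv /usr/local/omk/var/save_queue/data/stream-nodeStatus-SERVERIP-localhost.data /tmp/   # move the offending file out of the way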