The Baseline Tool now ships with the latest versions of opCharts for NMIS9.

Why we need a Dynamic Baseline and Thresholding Tool

Forewarned is forearmed, the proverb goes; a quick Google search tells me it means "prior knowledge of possible dangers or problems gives one a tactical advantage".  The reason we want to baseline and threshold our data is so that we can receive alerts forewarning us of issues in our environment and act to resolve smaller issues before they become bigger.  Being proactive increases our Mean Time Between Failures.

If you are interested in accessing the Dynamic Baseline and Thresholding Tool, it is included with the latest versions of opCharts.

Types of Metrics

When analysing time series data you quickly start to identify a common trend in what you are seeing. Some of the metrics you are monitoring will be "stable": they have very repeatable patterns and change in a similar way over time. Other metrics will be more chaotic, with a discernible pattern difficult to identify.

...

In practice, this spike was brief. Using the 15 minute threshold period (current is the average of the last 15 minutes), the value used for calculating change would be 136 and the resulting change would be 36%, so a Major event. The threshold period dampens the spikes, removing brief changes so that you can see changes which last longer.
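
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the baseline value of 100 is inferred from the figures quoted (136 and 36%), the three sample values are made up so that they average to 136, and the level boundaries are borrowed from the "levels" examples later on this page.

```python
# Illustrative sketch only; not the Baseline Tool's actual code.
from statistics import mean

# Assumed boundaries (percent change), most severe first.
LEVELS = [("Fatal", 50), ("Critical", 40), ("Major", 30), ("Minor", 20), ("Warning", 10)]

def percent_change(current, baseline):
    """Change of the current value relative to the baseline, in percent."""
    return (current - baseline) * 100 / baseline

def level_for(change):
    """Return the most severe level whose boundary the change reaches, else 'Normal'."""
    for name, boundary in LEVELS:
        if change >= boundary:
            return name
    return "Normal"

# Hypothetical samples from the last 15 minutes, containing one brief spike.
samples = [100, 100, 208]
current = mean(samples)                      # averaging dampens the spike
change = percent_change(current, baseline=100)
print(current, change, level_for(change))
```

Judged on its own, the spike sample of 208 would be far above the Fatal boundary; averaged over the threshold period, it lands in the Major band instead.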

Installing the Baseline Tool

Copy the file to the server and do the following. Upgrading follows the same process.

...

Flatline Baseline

Supported from opCharts 3.6.1.

When a metric remains at the same level for an extended period, it is called a flatline. This means the standard deviation is 0.

  • "threshold_period" : "-60 minutes" # Default -15 min
  • "threshold_std_deviation" :  0.001, # Or 0. It checks the standard deviation (stddev)
  • "threshold_exceeds" : 2, # Or ignored. If not set, it will create an event every time it detects a flatline.
  • "threshold_level" : "critical" # Or Major by default
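
As a rough illustration of the check these settings describe, the sketch below tests whether the standard deviation of the values in one threshold period is at or below threshold_std_deviation. It is not the tool's code: the function name is_flatline is hypothetical, the use of the population standard deviation is an assumption, and threshold_exceeds is left out because its exact counting rule is not spelled out here.

```python
# Illustrative sketch only; not the Baseline Tool's actual code.
from statistics import pstdev

def is_flatline(values, threshold_std_deviation=0.001):
    """True when the values collected in the threshold period barely vary.

    Uses <= so that a threshold of exactly 0 still matches a perfectly flat series.
    """
    return pstdev(values) <= threshold_std_deviation

flat = [5, 5, 5, 5]      # hypothetical samples from a 60 minute period
moving = [5, 5, 6, 5]

print(is_flatline(flat))
print(is_flatline(moving))
```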

Flatline example: 

Image Added

The first flatline would be detected just when threshold_std_deviation is 10 in the example.

Flatline example with threshold exceed: 

Image Added

Example:

Code Block
  "ifInErrors" : {
    "baseline" : "flatline",
    "active" : "true",
    "metric" : "ifInErrors",
    "type" : "pkts_hc",
    "nodeModel" : "CiscoRouter|CatalystIOS|CiscoNXOS",
    "use_index" : "interface",
    "event" : "Proactive Output Discards (flatline)",
    "indexed" : "true",
    "threshold_std_deviation" : 0.001,
    "threshold_period" : "-60 minutes",
    "threshold_exceeds" : 20
  },

Simple Baseline

The simple baseline just detects when the average over a selected period reaches a threshold level.

  • threshold_period
  • levels

Example:

Image Added

Example:

Code Block
  "ifInErrors" : {
    "baseline" : "simplethreshold",
    "active" : "true",
    "metric" : "ifInErrors",
    "type" : "pkts_hc",
    "nodeModel" : "CiscoRouter|CatalystIOS|CiscoNXOS",
    "use_index" : "interface",
    "event" : "Proactive Output Discards (simplethreshold)",
    "indexed" : "true",
    "threshold_period" : "-120 minutes",
    "levels" : {
      "Warning" : 10,
      "Minor" : 20,
      "Major" : 30,
      "Critical" : 40,
      "Fatal" : 50
    }
  }, 

In the above graph, that would be a Fatal alert. 
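
A minimal sketch of the simple threshold idea, using the level values from the example configuration above. It is an illustration under assumptions rather than the tool's implementation: the function name simple_threshold is hypothetical, and whether the tool compares with "greater than" or "greater than or equal" is not stated here.

```python
# Illustrative sketch only; not the Baseline Tool's actual code.
from statistics import mean

LEVELS = {"Warning": 10, "Minor": 20, "Major": 30, "Critical": 40, "Fatal": 50}

def simple_threshold(values, levels=LEVELS):
    """Return the most severe level that the period average reaches, or None."""
    average = mean(values)
    reached = [(limit, name) for name, limit in levels.items() if average >= limit]
    return max(reached)[1] if reached else None

# Hypothetical samples from the threshold period.
print(simple_threshold([55, 60, 65]))
print(simple_threshold([12, 14]))
print(simple_threshold([1, 2]))
```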

Installing the Baseline Tool

The baseline tool is installed with recent versions of opCharts.

Working with the Dynamic Baseline and Thresholding Tool

...

Configuration of the baseline tool is done in the file /usr/local/omk/conf/Baseline.json. The default configuration should be installed when the tool is installed.
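
If the configuration file is plain JSON (an assumption based on the examples on this page), a syntax check before running the tool can catch slips such as a trailing comma left over from an older Perl-style configuration. The sketch below uses only Python's standard json module. The helper name check_json and the two sample strings are hypothetical, and the top-level layout of the real file is not shown here.

```python
# Illustrative sketch only; assumes the configuration is plain JSON.
import json

def check_json(text):
    """Return None if text parses as JSON, otherwise a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

good = '{ "RouteNumber" : { "active" : "true", "hours" : 8 } }'
bad = '{ "RouteNumber" : { "active" : "true", "hours" : 8, } }'  # trailing comma

print(check_json(good))
print(check_json(bad))
```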

...

Here is what the configuration file would look like. This example is a Same-Day Baseline:

Code Block
  "RouteNumber" : {
    "active" : "true",
    "metric" : "RouteNumber",
    "type" : "RouteNumber",
    "nodeModel" : "CiscoRouter",
    "event" : "Proactive Route Number Change",
    "indexed" : "false",
    "threshold_exceeds" : null,
    "threshold_period" : "-5 minutes",
    "multiplier" : 1,
    "weeks" : 0,
    "hours" : 8
  },

Multi-Day Dynamic Baseline Configuration Example

Another configuration example uses the BGP prefixes exchanged with BGP peers. This metric comes from systemHealth modelling, and the example is a multi-day baseline:

Code Block
  "cbgpAcceptedPrefix" : {
    "active" : "true",
    "metric" : "cbgpAcceptedPrefix",
    "type" : "bgpPrefix",
    "section" : "bgpPrefix",
    "nodeModel" : "CircuitMonitor|CiscoRouter",
    "event" : "Proactive BGP Peer Prefix Change",
    "indexed" : "true",
    "multiplier" : 1,
    "weeks" : 4,
    "hours" : 1
  },

Delta Baseline Configuration Example

Currently delta baselines do not support multi-day, but the hours value can be very large if required.

Code Block
  "hrSystemProcesses" : {
    "baseline" : "delta",
    "active" : "true",
    "metric" : "hrSystemProcesses",
    "type" : "Host_Health",
    "nodeModel" : "net-snmp",
    "indexed" : "false",
    "hours" : 4,
    "threshold_period" : "-15 minutes",
    "levels" : {
      "Warning" : 10,
      "Minor" : 20,
      "Major" : 30,
      "Critical" : 40,
      "Fatal" : 50
    }
  },

Delta Baseline for Output Packets Discarded Configuration Example

Currently delta baselines do not support multi-day, but the hours value can be very large if required.

Code Block
  "ifOutDiscards" : {
    "baseline" : "delta",
    "active" : "true",
    "metric" : "ifOutDiscards",
    "type" : "pkts_hc",
    "use_index" : "interface",
    "nodeModel" : "CiscoRouter",
    "event" : "Proactive Output Discards (Delta)",
    "indexed" : "true",
    "hours" : 1,
    "threshold_period" : "-15 minutes",
    "levels" : {
      "Warning" : 1,
      "Minor" : 2,
      "Major" : 3,
      "Critical" : 4,
      "Fatal" : 7
    }
  },

Running the Baseline Tool

...

Code Block
/usr/local/omk/bin/baseline.exe act=run

There are some debug options to see a little more detail: debug=true, debug=2 and debug=3 are the current levels of verbosity.

...

Code Block
#
# this cron schedule runs the baseline system every 5 minutes.
#
#
# if you DON'T want any NMIS cron mails to go to root, 
# uncomment and adjust the next line
#MAILTO=prefered@domain.com
#
# m h dom month dow user command
#
# run the baseline every 5 minutes starting at 4 minutes offset from the hour.
4-59/5 * * * * root "/usr/local/omk/bin/baseline.exe" act=run > "/usr/local/omk/log/baseline.log" 2>&1
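
To make the minute field of the schedule above concrete, the short sketch below expands a cron range with a step, such as 4-59/5, into the minutes it matches. The helper name cron_minutes is hypothetical and it only covers this single "start-end/step" form, not full cron syntax.

```python
# Illustrative sketch only; expands one "start-end/step" cron minute field.
def cron_minutes(start, end, step):
    """Minutes matched by a cron field of the form start-end/step."""
    return list(range(start, end + 1, step))

minutes = cron_minutes(4, 59, 5)
print(minutes)
print(len(minutes), "runs per hour")
```

Note that the `>` redirection in the cron entry overwrites baseline.log on every run, so the log only ever holds the output of the most recent run.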

Using Group Regex and Cron for Parallel Processing

...