...
Forewarned is forearmed, the proverb goes; a quick Google search tells me it means "prior knowledge of possible dangers or problems gives one a tactical advantage". The reason we want to baseline and threshold our data is so that we can receive alerts forewarning us of issues in our environment and act to resolve smaller issues before they become bigger. Being proactive increases our Mean Time Between Failure.
...
Overall this is what opTrend does. The sophisticated statistical model it builds is very powerful and helps spot these trends. With the baseline tool, we have extended opTrend with some additional functionality so that you can quickly get alerts from metrics which are important to you.
...
Firstly I want to calculate my current value. I could use the last value collected, but depending on the stability of the metric this might cause false positives. As NMIS has always supported, using a larger threshold period when calculating the current value can give more relevant results.
For very stable metrics using a small threshold period is no problem, but for wilder values a longer period is advised. For response time alerting, a threshold period of 15 minutes or greater would be a good idea. That means there is some sustained issue and not just a one-off internet blip. However, with our route number we might be very happy to use the last value and get warned sooner.
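To make the effect of the threshold period concrete, here is a minimal Python sketch. It is not the tool's own code: the `current_value` function and the list of `(timestamp, value)` samples are hypothetical, and it simply assumes the current value is the plain average of the samples inside the period.

```python
from datetime import datetime, timedelta
from statistics import mean

def current_value(samples, now, threshold_period_minutes):
    """Average the samples collected within the last threshold_period_minutes.

    samples is a list of (timestamp, value) tuples (hypothetical input format).
    """
    cutoff = now - timedelta(minutes=threshold_period_minutes)
    recent = [value for ts, value in samples if cutoff < ts <= now]
    if not recent:
        raise ValueError("no samples inside the threshold period")
    return mean(recent)

# Response times polled every 5 minutes, with a single spike on the last poll.
now = datetime(2024, 1, 1, 12, 0)
samples = [
    (now - timedelta(minutes=10), 20.0),
    (now - timedelta(minutes=5), 20.0),
    (now, 200.0),
]

last_poll = current_value(samples, now, 5)   # only the spike is included
smoothed = current_value(samples, now, 15)   # spike averaged with two normal polls
```

With a 5 minute period the single blip is the whole current value, while the 15 minute period dilutes it; a sustained problem would keep the longer average high as well.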
...
With the average of each of these windows of time calculated, I can now build my baseline and compare my current value against that baseline's value.
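As a rough illustration of that comparison, the sketch below builds a band from the window averages and checks whether the current value falls outside it. The exact statistics the tool uses are not spelled out here, so this assumes a mean plus or minus a number of standard deviations (in the spirit of the multiplier option), and the function names are hypothetical.

```python
from statistics import mean, pstdev

def baseline_band(window_averages, multiplier=1.0):
    """Return (lower, upper) bounds: mean +/- multiplier * standard deviation.

    Assumption: population standard deviation; the real tool may differ.
    """
    centre = mean(window_averages)
    spread = pstdev(window_averages)
    return centre - multiplier * spread, centre + multiplier * spread

def outside_baseline(current, window_averages, multiplier=1.0):
    """True when the current value lies outside the baseline band."""
    lower, upper = baseline_band(window_averages, multiplier)
    return current < lower or current > upper

# Averages of four earlier windows of the same metric (made-up numbers).
window_averages = [100.0, 102.0, 98.0, 100.0]

normal = outside_baseline(101.0, window_averages)    # inside the band
unusual = outside_baseline(110.0, window_averages)   # well above the band
```

A larger multiplier widens the band, so fewer values count as unusual.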
Same-Day Baseline
Depending on the stability of the metric it might be preferable to use the data from that day. For example, if you had a rising and falling value it might be preferable to use just the last 4 to 8 hours of the day for your baseline. Take this interface traffic as an example: the input rate rises and falls, while the output rate is stable, shows a sudden plateau, and is then stable again.
...
Configuration of the baseline tool is done in the file /usr/local/omk/conf/Baseline.nmis. The default configuration should be installed when the tool is installed.
Configuration Option | Description | Example |
---|---|---|
active | Is baselining this metric active or not, values are true or false | true |
metric | Which NMIS datapoint or variable, equates to an RRD DS | RouteNumber |
type | Which NMIS model section or metric | RouteNumber |
section | The section name in the node info. If no section is given the baseline is simply run; otherwise the section must exist. | |
nodeModel | This is a regex which defines which NMIS models should be matched | CiscoRouter |
event | The name of the event to use, will default to Proactive Baseline type metric if none provided. | Proactive Route Number Change |
indexed | Is this variable indexed or not | false |
threshold_exceeds | Ignored if undef otherwise the value must ALSO exceed this threshold to raise an event | undef |
threshold_period | The period over which the value to be baselined is averaged, e.g. -5 minutes is the last poll, -15 minutes is the average of the last 15 minutes, -1 hour is the average of the last 60 minutes. | -5 minutes |
multiplier | How many standard deviations to vary the baseline by. | 1 |
weeks | The number of weeks to look back | 0 |
hours | The number of hours to include in the baseline metrics | 8 |
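The way `multiplier` and `threshold_exceeds` interact can be sketched as follows. This is an illustrative reading of the option descriptions above, not the tool's implementation: the function name and arguments are hypothetical, and Python's `None` stands in for `undef`.

```python
def should_raise_event(current, baseline_mean, baseline_stddev,
                       multiplier=1.0, threshold_exceeds=None):
    """Decide whether a proactive event would be raised (illustrative only).

    The value must deviate from the baseline by more than
    multiplier * standard deviation, and, if threshold_exceeds is set,
    it must ALSO be greater than that fixed threshold.
    """
    deviates = abs(current - baseline_mean) > multiplier * baseline_stddev
    if threshold_exceeds is None:   # option left undefined: baseline check only
        return deviates
    return deviates and current > threshold_exceeds

# Baseline of 100 with a standard deviation of 10 (made-up numbers).
small_change = should_raise_event(105, 100, 10)                         # within one deviation
big_change = should_raise_event(150, 100, 10)                           # outside the baseline
gated = should_raise_event(150, 100, 10, threshold_exceeds=200)         # deviates but below 200
gated_high = should_raise_event(250, 100, 10, threshold_exceeds=200)    # deviates and above 200
```

Setting `threshold_exceeds` is a way to ignore deviations that are statistically unusual but still too small to matter operationally.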
Here is what the configuration file would look like; this example is a Same-Day Baseline:
...