opEvents provides two mechanisms to handle repeated event occurrences in a practical fashion, namely stateful event deduplication and programmable event suppression.
Stateful Deduplication and Flaps
All events that are related to stateful entities (e.g. a node which can be in state up or down, an interface etc.) are automatically checked against the recent history of events and the known previous state of this entity. If the new event reports the same state as the already known one, then the new event is suppressed completely: no event record is created (except for raw logging, if that is enabled).
...
Related to that is the concept of a Flap, which in opEvents is defined as a sequence of transitions from state up to down and back to up within a short time frame. opEvents uses the configuration option state_flap_window to define this window, by default 90 seconds.
In a flap situation, the up event is marked as a flap (by setting the flap property to 1) and as associated with the down event (using the eventids or stateful_eventids property), and its event name is changed. In versions up to 2.2.1, the up event's name is always changed to "<state entity> Flap". Any repeat events that don't convey a new state are suppressed.
Newer versions of opEvents support the config option opevents_flap_name, which lets you specify a template for the flap event's name. The template can contain node.X, event.Y and macro.Z placeholders, e.g. "event.event for event.stateful - Flap".
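To illustrate how such a name template might be filled in, here is a minimal Python sketch. It is not opEvents code: the function name expand_flap_name, the regular expression and the sample node and event values are assumptions made for illustration, and the real placeholder rules may differ.

```python
import re

# Assumed placeholder syntax: node.X, event.Y or macro.Z (X, Y, Z are property names)
PLACEHOLDER = re.compile(r"\b(node|event|macro)\.(\w+)")


def expand_flap_name(template, node, event, macro):
    """Replace node.X / event.Y / macro.Z placeholders with their values."""
    sources = {"node": node, "event": event, "macro": macro}

    def substitute(match):
        scope, key = match.group(1), match.group(2)
        # Leave unknown placeholders untouched rather than guessing a value
        return str(sources[scope].get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical sample data, using the template from the text
flap_name = expand_flap_name(
    "event.event for event.stateful - Flap",
    node={"name": "router1"},
    event={"event": "Node Up", "stateful": "Node"},
    macro={},
)
print(flap_name)
```

With the sample values above, event.event and event.stateful are both taken from the event record, so the resulting name combines the original event name with the stateful entity.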
The interaction between down and up events in a flap situation can be fine-tuned using the configuration option opevents_no_action_on_flap (default: "true"):
- When set to "true", opEvents will automatically acknowledge the related down event and set the down event's action_required to false. This causes any actions defined in policies for the down event to be stopped (including escalation actions). The down event is thus closed and disposed of on receiving the up event.
- On the other hand, if opevents_no_action_on_flap is false, then the down event is not modified in any way and remains open when a flap is detected; it can thus be tracked independently of the up event.
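The flap handling described above can be sketched roughly as follows. This is an illustrative Python model, not the opEvents implementation: the properties flap, stateful_eventids and action_required come from the text, while the id, time and acknowledged fields, the list shape of stateful_eventids, and the rule that the window is measured from the down event to the up event are assumptions.

```python
STATE_FLAP_WINDOW = 90  # seconds; documented default of state_flap_window


def handle_up_event(up_event, down_event, no_action_on_flap=True,
                    flap_window=STATE_FLAP_WINDOW):
    """Mark an up event as a flap if it closely follows its down event.

    Returns True if a flap was detected, False otherwise.
    """
    if down_event is None:
        return False
    # Assumption: the window is measured between the down and the up event
    if up_event["time"] - down_event["time"] > flap_window:
        return False

    up_event["flap"] = 1
    # Assumed shape: a list of the ids of the associated events
    up_event["stateful_eventids"] = [down_event["id"]]

    if no_action_on_flap:
        # Stops further policy actions, including escalations, for the down event
        down_event["acknowledged"] = True  # hypothetical field name
        down_event["action_required"] = False
    return True


# Hypothetical events: the node goes down and comes back up 30 seconds later
down = {"id": "d1", "time": 1000}
up = {"id": "u1", "time": 1030}
is_flap = handle_up_event(up, down)
```

Passing no_action_on_flap=False leaves the down event dictionary unchanged, which mirrors the second case in the list above.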
Involved Event Properties
...
For state tracking, opEvents then combines the node name and the values of stateful and element into a lookup key, and associates that key with the state value.
Any repeat events with the same lookup key and the same state value are ignored.
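As a rough model of this lookup, the following Python sketch keeps the last known state per key and reports repeat events as duplicates. The property names node, stateful, element and state follow the text; the tuple key, the function names and the sample events are assumptions for illustration only.

```python
def lookup_key(event):
    """Combine node name, stateful and element into a lookup key."""
    return (event["node"], event["stateful"], event.get("element"))


def is_duplicate(event, known_states):
    """Return True if the event repeats the already known state.

    known_states maps lookup keys to the last known state and is
    updated whenever an event conveys a new state.
    """
    key = lookup_key(event)
    if known_states.get(key) == event["state"]:
        return True
    known_states[key] = event["state"]
    return False


# Hypothetical sample events for one interface on one node
states = {}
events = [
    {"node": "router1", "stateful": "Interface", "element": "eth0", "state": "down"},
    {"node": "router1", "stateful": "Interface", "element": "eth0", "state": "down"},
    {"node": "router1", "stateful": "Interface", "element": "eth0", "state": "up"},
]
results = [is_duplicate(e, states) for e in events]
```

In this sample the second event repeats the down state and would be ignored, while the first and third events each convey a new state.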
Stateful Deduplication, Forwarded Events and Reorder Protection
If you use an Event Action or Escalation Policy with create_remote_events to forward events to another opEvents server elsewhere, then you might occasionally find that such forwarded events arrive out of order, i.e. an earlier 'down' event might be received after the later 'up' event. This can happen because of network congestion, because action processing on the sending side is asynchronous and subject to process limits, and for similar reasons.
Out of order reception of stateful events can cause state desynchronisation at the receiving server, as the up event would be processed first and thus be deduplicated and discarded, while the down event later on causes a transition to state down which isn't cleared.
opEvents versions 2.4.2 and newer provide a reorder protection mechanism to handle such out of order situations better - which comes at the cost of temporarily delaying the processing of some forwarded stateful events.
To enable reorder protection, two steps need to be taken:
- you need to set the configuration property state_reorder_window to a positive number (e.g. 30) on the receiving server,
- and you must make sure that your forwarded events carry an authority property, to denote the event as originating from a remote authoritative source.
If both of these conditions are met, opEvents on the receiving server will temporarily postpone processing of a forwarded stateful event, if the event would be discarded by stateful deduplication.
This allows earlier but externally delayed related events to enter the processing queue in the correct sequence, if any such do arrive within the configured time window after the out-of-order postponed event.
If a state-changing remote event does arrive within the time window configured by state_reorder_window, then the correct sequencing of transitions is restored and processing of postponed events resumes immediately. Otherwise, processing resumes after the time window elapses.
The state_reorder_window should not be set too large as it causes undesirable event processing delays; a value of 10 to 30 seconds should suffice in most environments.
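The postponement logic can be modelled with a small simulation. The Python sketch below is a simplified illustration, not the actual opEvents mechanism: the class ReorderingReceiver, its methods, the explicit now timestamps and the sample authority value are all assumptions. Only the properties state_reorder_window and authority and the overall behaviour are taken from the description above.

```python
class ReorderingReceiver:
    """Toy model of stateful deduplication with reorder protection."""

    def __init__(self, state_reorder_window=30):
        self.window = state_reorder_window
        self.states = {}      # lookup key -> last known state
        self.postponed = []   # list of (release_time, event)
        self.accepted = []    # events that led to a state transition

    @staticmethod
    def _key(event):
        return (event["node"], event["stateful"], event.get("element"))

    def _process(self, event):
        key = self._key(event)
        if self.states.get(key) == event["state"]:
            return  # stateful deduplication: the event is discarded
        self.states[key] = event["state"]
        self.accepted.append(event)

    def _release(self, should_release):
        ready = [item for item in self.postponed if should_release(item)]
        self.postponed = [item for item in self.postponed
                          if not should_release(item)]
        for _, event in ready:
            self._process(event)

    def tick(self, now):
        """Process postponed events whose window has elapsed."""
        self._release(lambda item: item[0] <= now)

    def receive(self, event, now):
        self.tick(now)
        key = self._key(event)
        duplicate = self.states.get(key) == event["state"]
        if duplicate and self.window > 0 and "authority" in event:
            # Would be deduplicated: postpone instead of discarding at once
            self.postponed.append((now + self.window, event))
            return
        self._process(event)
        if not duplicate:
            # State changed: resume postponed events for this key immediately
            self._release(lambda item: self._key(item[1]) == key)


def remote_event(state):
    # Hypothetical forwarded event; "central1" is a made-up authority value
    return {"node": "router1", "stateful": "Node", "state": state,
            "authority": "central1"}


rx = ReorderingReceiver(state_reorder_window=30)
rx.receive(remote_event("up"), now=0)      # establishes state up
rx.receive(remote_event("up"), now=100)    # later up arrives first: postponed
rx.receive(remote_event("down"), now=105)  # delayed down: state change
final_state = rx.states[("router1", "Node", None)]
```

In this run the delayed down event is processed first and the postponed up event is replayed right after it, so the receiver ends in state up. With state_reorder_window set to 0 the same sequence would leave the node in state down, which is the desynchronisation described earlier.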
Programmable Suppression
...