NMIS is optimised to provide the critical information people need about their network, which means keeping that information up to date. As networks have grown over the last few years and more network processing is done in software, we have noticed a trend towards devices with very high interface counts. Lately Opmantek has been seeing single devices with over 100,000 (one hundred thousand) interfaces.
A key principle of NMIS has always been to filter out the noise and only provide information which is going to be important, e.g. the primary links of a network, not every link to every user, and our customers agree that this is their primary focus. Even on devices with over 100K interfaces, they are often only interested in performance data from 1,000 or fewer interfaces. This article covers some of the considerations and features in NMIS which will help you handle nodes with high interface counts.
NMIS Principles for Interface Collection
NMIS aims to strike a balance: enough information to manage your network, but not so much that you need 1 terabyte per day to store the data or 32 cores and 32 GB of memory to poll all the interfaces. By default NMIS keeps 7 days of granular data, which has been found to be a good balance of important operational information and disk usage; this can be configured to be as much as you like, as discussed in Amount of Performance Data Storage NMIS8 Stores. To achieve this balance the NMIS models have various settings, and there are global configuration options which override the per model settings.
For example, by default, on most devices, NMIS will not collect data from an interface unless it has a description, e.g. if a Cisco switch has 500 interfaces, NMIS would only collect the interfaces which have descriptions. NMIS also does not collect certain interface types by default, because we have found that many interface types do not provide all the data in the MIB, so there is little value in collecting them.
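The two default rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not NMIS code (NMIS itself is written in Perl and takes these rules from its models). The `Interface` class, the `should_collect` function and the contents of `EXCLUDED_TYPES` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical exclusion list for illustration only; the real lists are
# defined in the NMIS models and differ per device type.
EXCLUDED_TYPES = {"softwareLoopback", "other"}


@dataclass
class Interface:
    if_index: int
    if_descr: str          # interface name reported by the device
    if_type: str           # interface type name
    description: str = ""  # operator-set description (assumed here to be empty when unset)


def should_collect(intf, excluded_types=EXCLUDED_TYPES):
    """Collect only interfaces of an allowed type that have a description."""
    if intf.if_type in excluded_types:
        return False
    return bool(intf.description.strip())


interfaces = [
    Interface(1, "Gi0/1", "ethernetCsmacd", "Uplink to core"),
    Interface(2, "Gi0/2", "ethernetCsmacd", ""),
    Interface(3, "Lo0", "softwareLoopback", "Management loopback"),
]

# Only interface 1 passes both rules.
collected = [i.if_index for i in interfaces if should_collect(i)]
print(collected)
```

With rules like these, a 500 port switch with descriptions on its 20 uplinks would produce 20 collected interfaces rather than 500.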
Problems with SNMP Agents
NMIS talks to thousands of different vendor products, and some of those devices have poorly performing SNMP agents. When a slow SNMP agent is combined with a high interface count, the result can be very slow polling. For example, we have found devices with thousands of interfaces where an NMIS update takes less than 30 seconds, while on other devices an update has taken over 60 minutes.
NMIS Configuration Options
interface_max_number
NMIS can handle devices with very high interface counts, but we have noticed that people often don't know that they have these devices or how many interfaces they have. Because managing thousands of interfaces could fill the disk or prevent effective polling, the interface_max_number configuration option was added in NMIS 8.5.8G. The default setting is 5000.
If NMIS finds a device with an interface count over the configured setting, it will NOT collect interface data from that node at all. The NMIS administrator therefore has to consciously enable handling of devices with high interface counts, and can then monitor how well the device and NMIS are handling the data collection.
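The effect of this limit can be sketched as a simple guard. The Python below is an illustration under stated assumptions, not the NMIS implementation: the function name `interface_collection_allowed` is hypothetical, and only the default of 5000 and the "over the configured setting" rule come from the text above.

```python
# Default taken from the text above; in NMIS this value is read from the configuration.
INTERFACE_MAX_NUMBER = 5000


def interface_collection_allowed(interface_count, max_number=INTERFACE_MAX_NUMBER):
    """Return True when the node's interface count does not exceed the limit."""
    return interface_count <= max_number


# A node with 4,800 interfaces is collected with the default limit,
# a node with 120,000 interfaces is skipped until the limit is raised.
print(interface_collection_allowed(4800))
print(interface_collection_allowed(120000))
print(interface_collection_allowed(120000, max_number=150000))
```

Raising the limit is therefore a deliberate, all-or-nothing decision for the whole server, which is why it is worth watching update times after changing it.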
update_use_ifTableLastChange
This configuration option was added in NMIS 8.5.10 and its default value is false. The SNMP variable ifTableLastChange holds the value of sysUpTime when the interface table last changed, that is, when an interface was added or deleted. When the option is set to true, the NMIS update process will check this value for each node and only run a complete update of the interface table when the value has changed. For a node with high interface counts, this can mean a full update is rarely needed, which tends to work well as these devices do not often have new interfaces added.
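The decision described above comes down to comparing the stored ifTableLastChange value with the one just read from the device. The following Python is a minimal sketch of that comparison, not NMIS code. The function name and its parameters are hypothetical, and treating a node with no stored value as needing a full update is an assumption made for the example.

```python
def needs_interface_update(previous_last_change, current_last_change,
                           use_if_table_last_change=False):
    """Decide whether a full interface table update should run.

    previous_last_change: value stored at the last update, or None if unknown.
    current_last_change:  ifTableLastChange value just read from the device.
    """
    if not use_if_table_last_change:
        # Option disabled (the default): always run the full update.
        return True
    if previous_last_change is None:
        # Assumption: with nothing to compare against, run the full update.
        return True
    return current_last_change != previous_last_change


# Option enabled, value unchanged since the last update: skip the full update.
print(needs_interface_update(123456, 123456, use_if_table_last_change=True))
# Option enabled, an interface was added or deleted: run the full update.
print(needs_interface_update(123456, 987654, use_if_table_last_change=True))
```

The comparison costs one SNMP get per node, which is very cheap next to walking an interface table with tens of thousands of rows.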
Model custom - interface - ifAdminStatus
Every poll cycle NMIS checks ifAdminStatus and ifOperStatus to see whether interfaces have changed state. This is not a problem with regular nodes, but when slow SNMP agents combine with high interface counts, this can take extra time. In the model for the node you can disable this processing by adding a top level modelling section as below. NMIS will still check the status of interfaces being collected, but will not look for other interfaces which have changed state.
'custom' => {
  'interface' => {
    'ifAdminStatus' => 'false',
  }
},
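To make the effect of this setting concrete, here is a small Python sketch of the behaviour described above. It is not NMIS code: the dictionary mirrors the model section as a Python structure, the function names are hypothetical, and interpreting the string 'false' as "disabled" (with anything else meaning "enabled") is an assumption made for the example.

```python
# The model section above, mirrored as a Python dictionary for illustration.
model = {"custom": {"interface": {"ifAdminStatus": "false"}}}


def status_scan_enabled(model):
    """Return False only when the model explicitly sets ifAdminStatus to 'false'."""
    value = model.get("custom", {}).get("interface", {}).get("ifAdminStatus", "true")
    return str(value).lower() != "false"


def interfaces_to_check(model, all_indexes, collected_indexes):
    """Pick the interfaces whose status is checked in a poll cycle."""
    if status_scan_enabled(model):
        # Default behaviour: look at every interface for state changes.
        return sorted(all_indexes)
    # Scan disabled: only interfaces already being collected are checked.
    return sorted(collected_indexes)


all_indexes = range(1, 11)   # ten interfaces on the device
collected_indexes = [2, 7]   # only two are being collected
print(interfaces_to_check(model, all_indexes, collected_indexes))
print(interfaces_to_check({}, all_indexes, collected_indexes))
```

On a device with 100K interfaces and 1,000 collected, this reduces the per-poll status work by roughly a factor of 100.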
Polling Locks
This feature was added in NMIS 8.5.10. When an update or collect poll runs on a node, a lock is created which prevents another NMIS process from starting an update or collect on that node. This is done because an update on a node with high interface counts can run for quite a while, and we don't want another update running on the node at the same time. A collect on a node which has never been updated will itself start an update, so without locks the server could end up with too many blocked processes.
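The general idea of a per-node lock can be sketched with an atomically created lock file. The Python below illustrates that technique only and makes no claim about how NMIS implements its locks. The lock directory, file naming and function names are all hypothetical, and stale locks left behind by a crashed process are not handled.

```python
import os
import tempfile

# Hypothetical lock directory for this example only.
LOCK_DIR = tempfile.mkdtemp(prefix="poll-locks-")


def acquire_lock(node, operation):
    """Try to take the lock for a node; return the lock path, or None if it is held."""
    path = os.path.join(LOCK_DIR, f"{node}.lock")
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        # Record which operation and process hold the lock, for troubleshooting.
        handle.write(f"{operation} {os.getpid()}\n")
    return path


def release_lock(path):
    """Remove the lock file so another process can poll the node."""
    os.remove(path)


first = acquire_lock("core-switch", "update")    # update starts and takes the lock
second = acquire_lock("core-switch", "collect")  # collect is refused while update runs
release_lock(first)                              # update finishes
third = acquire_lock("core-switch", "collect")   # collect can now proceed
print(first is not None, second is None, third is not None)
```

Because the refused process returns straight away instead of waiting, a long update on one large node does not leave a queue of blocked pollers behind it.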