With NMIS, and network management in general, you sometimes run into odd devices that don't really conform to the standards and best practices for SNMP. They can be a pain to troubleshoot. Here are some tips based on problems we have found.
SNMP Working, but not finding Interfaces with ifIndex
When running an NMIS update, e.g. nmis.pl type=update node=NODENAME debug=true, the debug output might stop at an "SNMP ERROR" line like the one below:
Code Block |
---|
01:35:35 getIntfInfo, Get Interface Info of node STRANGENODENAME, model TELDAT |
01:36:23 checkResult, SNMP ERROR (STRANGENODENAME) (ifIndex) No response from remote host "STRANGENODENAME" |
01:36:23 getIntfInfo, ERROR (STRANGENODENAME) on get interface index table |
01:36:23 notify, Start of Notify |
01:36:23 eventAdd, event added, node=STRANGENODENAME, event=SNMP Down, level=Critical, element=, details=SNMP error |
This looks odd because SNMP is otherwise working, yet this very important operation fails. The likely cause is the size of the SNMP response packet, which is governed by a setting called max-repetitions: the number of variable bindings (here, ifIndex rows) the agent is asked to pack into a single GetBulk response. This device appears to simply not answer when asked for more than it can handle.
To troubleshoot this, you might run an snmpwalk like this:
Code Block |
---|
NMIS# snmpwalk -v 2c -c COMMUNITYSTRING STRANGENODENAME ifIndex |
IF-MIB::ifIndex.1 = INTEGER: 1 |
IF-MIB::ifIndex.2 = INTEGER: 2 |
IF-MIB::ifIndex.3 = INTEGER: 3 |
IF-MIB::ifIndex.4 = INTEGER: 4 |
IF-MIB::ifIndex.5 = INTEGER: 5 |
IF-MIB::ifIndex.6 = INTEGER: 6 |
IF-MIB::ifIndex.7 = INTEGER: 7 |
IF-MIB::ifIndex.8 = INTEGER: 8 |
IF-MIB::ifIndex.9 = INTEGER: 9 |
IF-MIB::ifIndex.10 = INTEGER: 10 |
IF-MIB::ifIndex.11 = INTEGER: 11 |
IF-MIB::ifIndex.12 = INTEGER: 12 |
You can watch the exchange with tcpdump using the command below. Make sure you capture on the interface the packets are sent out of; if the server has multiple interfaces, check its routing table.
Code Block |
---|
tcpdump -i INTERFACE host 2.3.4.5 |
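If you are unsure which interface to capture on, the kernel can tell you which one it would use to reach the device. The following is a minimal sketch, assuming a Linux server with iproute2 installed; `route_dev` is a hypothetical helper name, not part of NMIS.

```shell
# Print the interface name found after "dev" in `ip route get` output.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on the NMIS server (2.3.4.5 is the device being polled):
#   ip route get 2.3.4.5 | route_dev
#   tcpdump -i "$(ip route get 2.3.4.5 | route_dev)" host 2.3.4.5
```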
The capture would show this:
Code Block |
---|
01:31:37.093751 IP 1.2.3.4.48560 > 2.3.4.5.snmp: C=COMMUNITYSTRING GetBulk(29) N=0 M=10 interfaces.ifTable.ifEntry.ifIndex |
01:31:37.115557 IP 2.3.4.5.snmp > 1.2.3.4.48560: C=COMMUNITYSTRING GetResponse(185) interfaces.ifTable.ifEntry.ifIndex.1=1 interfaces.ifTable.ifEntry.ifIndex.2=2 interfaces.ifTable.ifEntry.ifIndex.3=3 interfaces.ifTable.ifEntry.ifIndex.4=4 interfaces.ifTable.ifEntry.ifIndex.5=5 interfaces.ifTable.ifEntry.ifIndex.6=6 interfaces.ifTable.ifEntry.ifIndex.7=7 interfaces.ifTable.ifEntry.ifIndex.8=8 interfaces.ifTable.ifEntry.ifIndex.9=9 interfaces.ifTable.ifEntry.ifIndex.10=10 |
01:31:37.116194 IP 1.2.3.4.48560 > 2.3.4.5.snmp: C=COMMUNITYSTRING GetBulk(30) N=0 M=10 interfaces.ifTable.ifEntry.ifIndex.10 |
01:31:37.139792 IP 2.3.4.5.snmp > 1.2.3.4.48560: C=COMMUNITYSTRING GetResponse(241) interfaces.ifTable.ifEntry.ifIndex.11=11 interfaces.ifTable.ifEntry.ifIndex.12=12 interfaces.ifTable.ifEntry.ifDescr.1="ethernet0/0" interfaces.ifTable.ifEntry.ifDescr.2="ethernet0/1" interfaces.ifTable.ifEntry.ifDescr.3="serial0/0" interfaces.ifTable.ifEntry.ifDescr.4="bri0/0" interfaces.ifTable.ifEntry.ifDescr.5="x25-node" interfaces.ifTable.ifEntry.ifDescr.6="voip1/0" interfaces.ifTable.ifEntry.ifDescr.7="serial2/0" interfaces.ifTable.ifEntry.ifDescr.8="fr2" |
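The interesting part is GetBulk(29) N=0 M=10 interfaces.ifTable.ifEntry.ifIndex: the request asks for at most 10 variable bindings per response (M is max-repetitions, N is non-repeaters). Net-SNMP on the command line uses 10 as its default for bulk walks (snmpbulkwalk), and a plain snmpwalk issues GetNext requests instead of GetBulk, so neither hits the problem.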
If you have not configured max repetitions in NMIS, you would see this:
Code Block |
---|
01:41:37.093751 IP 1.2.3.4.48560 > 2.3.4.5.snmp: C=COMMUNITYSTRING GetBulk(29) N=0 M=25 interfaces.ifTable.ifEntry.ifIndex |
01:51:37.093751 IP 1.2.3.4.48560 > 2.3.4.5.snmp: C=COMMUNITYSTRING GetBulk(29) N=0 M=25 interfaces.ifTable.ifEntry.ifIndex |
NMIS would then give you the errors above: the request goes out with M=25, no GetResponse comes back, and the same request is sent again. The default of M=25 is set in the Perl NET-SNMP libraries or somewhere even more obscure.
The net result is that you will need to configure your NMIS node with:
'max_repetitions' => '10',
You can find more details about SNMP settings on the SNMP Tuning page.
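To confirm from the command line which max-repetitions value the device tolerates, you can repeat the bulk walk with different values. The sketch below relies on Net-SNMP's snmpbulkwalk and its -Cr option, which sets max-repetitions. The function name `find_max_repetitions` and the `WALK_CMD` override are hypothetical, introduced only for this illustration. It assumes SNMP v2c and treats any non-zero exit (such as a timeout) as "the agent did not cope".

```shell
# Try each candidate max-repetitions value in turn and print the first one
# for which the bulk walk of ifIndex succeeds.
# WALK_CMD is a hypothetical override; it defaults to Net-SNMP's snmpbulkwalk.
find_max_repetitions() {
    host=$1
    community=$2
    shift 2
    # Remaining arguments are the candidate values, largest first.
    for reps in "$@"; do
        # -Cr sets max-repetitions; -r 1 -t 2 keeps failed attempts short.
        if ${WALK_CMD:-snmpbulkwalk} -v 2c -c "$community" -r 1 -t 2 \
               "-Cr$reps" "$host" ifIndex >/dev/null 2>&1; then
            echo "$reps"
            return 0
        fi
    done
    return 1
}

# Example:
#   find_max_repetitions STRANGENODENAME COMMUNITYSTRING 50 25 10 5
```

If the function prints 10 for the device above, that matches the captures: M=10 is answered and M=25 is not. A value that works once on the command line is only a starting point, so it is worth re-running the NMIS update with debug=true after changing the node configuration.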
snmpd returns "invalid(4)" process state (hrSWRunStatus) for process names containing spaces
When snmpd is queried for hrSWRunStatus via SNMP, it should generally return 1 (running) or 2 (runnable) for processes that are running or runnable.
However, if the process name contains a space, snmpd returns 4 (invalid) for the process state.
This appears to be because it reads /proc/$PID/stat, splits the line on spaces and takes the third element,
which would normally be the process state; when the process name contains a space, this is no longer true.
net-snmp version 5.7.2 is known to be affected:
https://bugzilla.redhat.com/show_bug.cgi?id=1782180
A consequence of this issue is that NMIS will report a monitored service as 'down' when it is 'running' if the process name contains a space.
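The splitting problem described above is easy to reproduce with a sample /proc/$PID/stat line. The sketch below is an illustration only and is not net-snmp's actual code. `naive_state` and `robust_state` are hypothetical helper names, and the sample lines are made up. The robust version relies on the process name being wrapped in parentheses in the stat file, so everything up to the last ") " can be discarded before reading the state.

```shell
# Naive approach: split on spaces and take the third field.
naive_state() {
    echo "$1" | cut -d' ' -f3
}

# Robust approach: strip everything up to the LAST ") ", then take the
# first remaining field, which is the process state.
robust_state() {
    rest="${1##*) }"
    echo "${rest%% *}"
}

plain='1234 (mydaemon) S 1 1234 1234 0 -1'
spaced='1234 (my daemon) S 1 1234 1234 0 -1'

naive_state "$plain"     # prints: S
naive_state "$spaced"    # prints: daemon)
robust_state "$spaced"   # prints: S
```

With the naive split, the "state" of a process whose name contains a space is a fragment of the name, which is consistent with snmpd falling back to invalid(4).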