This page describes a device troubleshooting process for the NMIS8 and NMIS9 products. It breaks the process into clear steps that anyone can follow to identify what is wrong with a device's collection, including cases where there are gaps in graphs.

Device Troubleshooting Process

[Image: Flow Diagrams]

Identify the problem. The first step in troubleshooting a device issue is to identify the problem and to determine whether it affects NMIS8 or NMIS9.
1. Add the product version and the servers, devices, and models involved to the support case.
What kind of problem are you observing? A device issue can have the following causes:
1. Network performance: latency in the network, or layer 1, 2, and 3 issues.
2. Device configuration: connectivity, SNMP configuration, and other settings.
3. Server hardware requirements: high resource utilization on the server.
4. Server configuration options: missing configuration items for server tuning.
5. Disk performance: slow read/write times for the device collection.

Gather information. Collect all the graphs, images, and observed behaviors that help explain what the problem is.

Collect the support tool files (see The Opmantek Support Tool).

Execute the collect command for the support tool:

Code Block

# General collection.
/usr/local/nmis8/admin/support.pl action=collect

# If the file is large, add the maxzipsize parameter.
/usr/local/nmis8/admin/support.pl action=collect maxzipsize=900000000

# Collection for a single device.
/usr/local/nmis8/admin/support.pl action=collect node=<node_name> maxzipsize=900000000

If you are using NMIS8, also provide the node files from /usr/local/nmis8/var.

Go to the /usr/local/nmis8/var directory and collect the following files:

Code Block
-rw-rw---- 1 nmis nmis 4292 Apr 5 18:26 <node_name>-node.json
-rw-rw---- 1 nmis nmis 2695 Apr 5 18:26 <node_name>-view.json
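
As an illustration, the two files for a node could be bundled into a single archive before attaching them to the support case. This is a minimal sketch, not part of NMIS: the helper name `collect_node_files` and the output archive path are hypothetical, and it assumes the files are named exactly `<node_name>-node.json` and `<node_name>-view.json` as listed above.

```shell
# collect_node_files VAR_DIR NODE OUT_TAR
# Hypothetical helper (not shipped with NMIS): archives the node and view
# JSON files for one node. Fails if either file is missing.
collect_node_files() {
    var_dir=$1
    node=$2
    out=$3
    for suffix in node view; do
        if [ ! -f "$var_dir/$node-$suffix.json" ]; then
            echo "missing: $var_dir/$node-$suffix.json" >&2
            return 1
        fi
    done
    # -C keeps the paths inside the archive relative to the var directory.
    tar -czf "$out" -C "$var_dir" "$node-node.json" "$node-view.json"
}

# Example call (the node name and output path are placeholders):
#   collect_node_files /usr/local/nmis8/var my_node /tmp/my_node_var_files.tar.gz
```

The function stops at the first missing file so that an incomplete archive is never uploaded by mistake.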

Obtain the update and collect outputs. Upload this information to the support case:

Code Block

/usr/local/nmis8/bin/nmis.pl type=update node=<node_name> model=true debug=9 force=true > /tmp/node_name_update_$(hostname).log
/usr/local/nmis8/bin/nmis.pl type=collect node=<node_name> model=true debug=9 force=true > /tmp/node_name_collect_$(hostname).log
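
When several nodes need the same treatment, the two commands above can be generated per node so that the node name also appears in the log file name. The following is only a sketch under stated assumptions: `print_nmis_debug_cmds` is a hypothetical helper, and it prints the commands instead of running them so they can be reviewed first. The nmis.pl arguments are copied unchanged from the commands above.

```shell
# print_nmis_debug_cmds NODE
# Hypothetical helper: prints (does not run) the update and collect debug
# commands for one node, with the node name substituted into the log path.
print_nmis_debug_cmds() {
    node=$1
    for type in update collect; do
        # $(hostname) is printed literally and expands when the line is run.
        printf '/usr/local/nmis8/bin/nmis.pl type=%s node=%s model=true debug=9 force=true > /tmp/%s_%s_$(hostname).log\n' \
            "$type" "$node" "$node" "$type"
    done
}

# Example (my_node is a placeholder):
#   print_nmis_debug_cmds my_node
```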

Replicate the problem. If possible, define the steps needed to replicate the problem.
Identify symptoms. At this point, you should be able to see the specific problem and what its symptoms are.
Determine whether something has changed. It is important to verify with your team whether anything has changed; a good way to spot a change is to monitor the performance graphs for the devices and the server.

...

If the total runtime/collect time is too high, adjust the collect parameters according to the manager version you are using.

NMIS 8 Processes

The main NMIS 8 process is called from different cron jobs to run different operations: collect, update, summary, clean jobs, etc. As an example:

...

The ps command provides us with information about the processes of a Linux or Unix system.
Sometimes tasks can hang, go into a closed loop, or stop responding. Or they may continue to run but gobble up too much CPU time or RAM, or behave in an equally antisocial manner. Sometimes tasks need to be killed as a mercy to everyone involved. The first step, of course, is to identify the process in question.

Processes in a "D" or uninterruptible sleep state are usually waiting on I/O.

Code Block

[root@nmisslvcc5 log]# ps -auxf | egrep " D| Z"
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      1563  0.1  0.0      0     0 ?        D    Mar17  10:47  \_ [jbd2/dm-2-8]
root      1565  0.0  0.0      0     0 ?        D    Mar17   0:43  \_ [jbd2/dm-3-8]
root      1615  0.3  0.0      0     0 ?        D    Mar17  39:26  \_ [flush-253:2]
root      1853  0.0  0.0  29764   736 ?        D<sl Mar17   0:04 auditd
root     17898  0.0  0.0 103320   872 pts/5    S+   12:20   0:00  |       \_ egrep  D| Z
apache   17856 91.0  0.2 205896 76212 ?        D    12:19   0:01  |   \_ /usr/bin/perl /usr/local/nmis8/
root     13417  0.6  0.8 565512 306812 ?       D    10:38   0:37  \_ opmantek.pl webserver             -
root     17833  9.8  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
root     17838 10.3  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
root     17842 10.6  0.0      0     0 ?        Z    12:19   0:00      \_ [opeventsd.pl] <defunct>
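
The `egrep " D| Z"` pattern above matches anywhere in the line, which is why the egrep process itself shows up in the output. A possible alternative, shown here only as a sketch, is to test the state column directly with awk. The helper name `filter_dz` is hypothetical, and the input is assumed to have the PID in the first column and the process state in the second.

```shell
# filter_dz
# Hypothetical helper: reads "PID STAT COMMAND..." lines on standard input,
# keeps the header line, and keeps processes whose state starts with
# D (uninterruptible sleep) or Z (zombie).
filter_dz() {
    awk 'NR == 1 || $2 ~ /^[DZ]/'
}

# Example on a Linux system:
#   ps -eo pid,stat,args | filter_dz
```

Matching on the state column also picks up combined states such as `D<sl`, because only the first character of the field is tested.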

...
