General Troubleshooting Checklist
RESOURCES FOR TROUBLESHOOTING
ADDITIONAL RESOURCES
- Troubleshooting opFlow
- Troubleshooting Open AudIT (Community/Professional/Enterprise)
- The Opmantek Support Tool
- Simplify Large Scale NMIS/OMK Server Deployments with Domain Wide Standardization Script
Lessons Learned from Support Cases
Does DNS function properly?
If not, any daemon that performs name resolution will be very slow. Verify that the system has an FQDN and that the FQDN resolves to the system's own address. Also check that the system can resolve other hosts.
    ### Check the local system's FQDN
    [root@demo: ~]# hostname -f
    demo.opmantek.com

    ### Can the local system resolve its own hostname?
    [root@demo: ~]# dig +short demo.opmantek.com
    192.168.88.44

    ### Can the system resolve other hosts?
    [root@demo: ~]# dig +short freebsd.org
    8.8.178.110
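The three checks above can be combined into one small script. This is a minimal sketch, not an Opmantek tool: the function names `is_fqdn`, `resolve_name` and `check_dns` are made up for illustration, and it assumes `hostname` and `dig` are installed.

```shell
#!/usr/bin/env bash
# Sketch: run the three DNS checks from this section in one pass.
# All function names here are illustrative, not part of NMIS/OMK.

# Succeeds when the name contains at least one dot and no empty label
# at the start or in the middle.
is_fqdn() {
    case "$1" in
        ""|.*|*..*) return 1 ;;   # empty, leading dot, or empty label
        *.*)        return 0 ;;   # at least one dot
        *)          return 1 ;;   # single label such as "demo"
    esac
}

# Prints the first line of the dig +short answer, or nothing if unresolved.
resolve_name() {
    dig +short "$1" | head -n 1
}

# Checks the local FQDN, then self-resolution, then an external host.
# $1 is the external host to test (defaults to freebsd.org as used above).
check_dns() {
    local fqdn other
    other="${1:-freebsd.org}"
    fqdn="$(hostname -f 2>/dev/null)"

    if ! is_fqdn "$fqdn"; then
        echo "FAIL: hostname -f returned '$fqdn', which is not an FQDN"
        return 1
    fi
    if [ -z "$(resolve_name "$fqdn")" ]; then
        echo "FAIL: $fqdn does not resolve"
        return 1
    fi
    if [ -z "$(resolve_name "$other")" ]; then
        echo "FAIL: cannot resolve external host $other"
        return 1
    fi
    echo "OK: $fqdn and $other both resolve"
}

# Usage (run on the server being checked):
#   check_dns
#   check_dns some.internal.host
```

The script only confirms that a name resolves. Comparing the returned address with the server's real interface address is still a manual step.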
Why DNS is Important
NMIS/OMK applications expect DNS to work. Managing individual /etc/hosts files does not scale. opHA is one module in particular where this is critical. If the customer does not have a local DNS server for internal hosts, consider running BIND on the NMIS Primary server so that the other NMIS/OMK servers can use it as a name server. This is not difficult to do and will save a lot of troubleshooting time in the future.
Does the system have the correct time? Is it synced with a time server?
    [nmis@demo var]$ ntpq -p
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    +cachens2.onqnet 13.64.159.31     3 u  426 1024  377    4.845   -0.126   0.458
    +ec2-13-54-31-22 54.252.165.245   3 u  352 1024  377   18.036    1.540   1.008
    -node01.au.verbn 192.12.19.20     2 u  514 1024  377   18.966  -16.530   1.176
    *ntp3.syrahost.c 218.100.43.70    2 u  422 1024  377   63.642   -1.172   0.852
    [nmis@demo var]$ date -u
    2017. 02. 16. (?) 22:33:31 UTC
Compare the system UTC time with actual UTC time. A site such as https://time.is/UTC will show current UTC time.
If the system time is not correct, it causes a range of problems:
- Time stamps not correct on events
- Graph data not correct
- Transactions with other systems fail
- User logs in, then is kicked back to the login screen; the browser treats the cookie as expired because the server time and the workstation time differ by more than the cookie lifespan.
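The time checks described above can be scripted. The sketch below assumes the reference UTC time is supplied as an epoch timestamp from a source you trust; the function names and the 5 second default tolerance are illustrative choices, not NMIS/OMK settings. `has_sync_peer` relies on `ntpq -p` marking the peer currently selected for synchronization with a leading `*`, as in the sample output above.

```shell
#!/usr/bin/env bash
# Sketch: helpers for checking clock drift and NTP sync state.
# Function names and the default tolerance are illustrative assumptions.

# Prints the absolute difference in seconds between two epoch timestamps.
clock_offset() {
    local diff
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Succeeds when local ($1) and reference ($2) epoch times differ by
# at most $3 seconds (default 5).
clock_in_sync() {
    [ "$(clock_offset "$1" "$2")" -le "${3:-5}" ]
}

# Reads `ntpq -p` output on stdin and succeeds when one peer line starts
# with '*', meaning ntpd has selected a synchronization source.
has_sync_peer() {
    grep -q '^\*'
}

# Usage:
#   ntpq -p | has_sync_peer || echo "no NTP peer selected"
#   clock_in_sync "$(date -u +%s)" "$REFERENCE_EPOCH" 5 || echo "clock drift too large"
# REFERENCE_EPOCH is a placeholder for a timestamp taken from a trusted source.
```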
Perl Modules
If an NMIS or OMK application cannot locate a Perl module, the module may be missing or its file may have the wrong permissions. Also check the permissions on the directories that contain it.
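One way to separate a missing module from a permissions problem is to ask Perl where it loads the module from, then inspect that file and its directory. This is a minimal sketch with illustrative function names. It assumes the `perl` on the PATH is the interpreter the application uses and that the check runs as the same user as the application, which may not hold on every installation.

```shell
#!/usr/bin/env bash
# Sketch: locate a Perl module and show the permissions on it.
# Function names are illustrative; run this as the application's user.

# Converts a module name such as JSON::XS to its relative path JSON/XS.pm.
module_to_path() {
    printf '%s.pm\n' "$(printf '%s' "$1" | sed 's|::|/|g')"
}

# Loads the module, looks up the loaded file in %INC, and lists the
# permissions of the file and its containing directory.
check_perl_module() {
    local mod rel file
    mod="$1"
    rel="$(module_to_path "$mod")"
    # %INC maps the relative path to the file perl actually loaded.
    file="$(perl -M"$mod" -e 'print $INC{$ARGV[0]}' "$rel" 2>/dev/null)"
    if [ -z "$file" ]; then
        echo "cannot load $mod (missing, unreadable, or fails to compile)"
        return 1
    fi
    ls -l "$file"
    ls -ld "$(dirname "$file")"
}

# Usage:
#   check_perl_module JSON::XS
```

If the module loads for root but not for the application's user, permissions on the file or one of its parent directories are the likely cause.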
OMK General
Node synchronization with NMIS
Generally, customers trust the node data that NMIS learns dynamically and use it to automatically update the node data for OMK applications. It's a good idea to install a cron job that automates this synchronization periodically. The following commands work well for opEvents and opConfig respectively.
    /usr/local/omk/bin/opevents-cli.exe act=import_from_nmis [overwrite=0/1] [setstate=0/1]
    /usr/local/omk/bin/opconfig-cli.exe act=import_from_nmis [node=nodeX|nodes=nodeA,...] [overwrite=0/1]
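A wrapper script makes the cron job easier to maintain. The sketch below runs both imports with none of the optional arguments. The script name, the `DRY_RUN` switch, and the hourly schedule in the comment are assumptions for illustration; choose the `overwrite` and `setstate` options that suit the site.

```shell
#!/usr/bin/env bash
# Sketch: wrapper that imports NMIS nodes into opEvents and opConfig.
# OMK_BIN defaults to the install path used in the commands above.
OMK_BIN="${OMK_BIN:-/usr/local/omk/bin}"

# Runs a command, or only prints it when DRY_RUN=1 (useful for testing).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Runs both imports in order; stops at the first failure.
sync_nodes() {
    run "$OMK_BIN/opevents-cli.exe" act=import_from_nmis || return 1
    run "$OMK_BIN/opconfig-cli.exe" act=import_from_nmis || return 1
}

# To use from cron, save this file under a name of your choice, add a final
# line that calls sync_nodes, and schedule it. Hypothetical hourly entry:
#   15 * * * * /path/to/sync_nodes.sh >/dev/null 2>&1
```

Running `DRY_RUN=1` first shows the exact commands without touching the node data.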
Configuration Files
If a particular configuration file is suspected of causing a problem, the following technique helps to isolate it.
- Back up the suspect configuration file
- Copy the default configuration file from omk/install into omk/conf
- Restart the associated daemons and test
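The first two steps above can be sketched as a small function. It assumes the OMK tree lives under /usr/local/omk, as in the commands earlier on this page, and that the default file has the same name in omk/install as in omk/conf. The function name and the timestamped backup suffix are illustrative. Restarting the daemons is left as a manual step because the services involved depend on which file was replaced.

```shell
#!/usr/bin/env bash
# Sketch: back up a suspect config file and put the shipped default in place.
# The function name and backup naming scheme are illustrative assumptions.

# $1 = config file name, $2 = OMK base directory (default /usr/local/omk)
restore_default_conf() {
    local name omk_home conf default backup
    name="$1"
    omk_home="${2:-/usr/local/omk}"
    conf="$omk_home/conf/$name"
    default="$omk_home/install/$name"
    backup="$conf.bak.$(date +%Y%m%d%H%M%S)"

    # Refuse to continue if there is no default to copy.
    if [ ! -f "$default" ]; then
        echo "no default found: $default" >&2
        return 1
    fi
    # Keep a timestamped copy of the current file, preserving its mode.
    if [ -f "$conf" ]; then
        cp -p "$conf" "$backup" || return 1
        echo "backup: $backup"
    fi
    cp "$default" "$conf" || return 1
    echo "restored default: $conf"
    # Next: restart the daemons that read this file, then retest.
}

# Usage (file name is an example only):
#   restore_default_conf opCommon.nmis
```

If the problem disappears with the default file in place, compare the backup against the default to find the setting that caused it.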