Many factors determine the health of a server. Hardware capability - CPU, memory, disk - is an important one, but so is server load: the number of devices (nodes to be polled, updated, audited, synchronised), the number of products (NMIS, OAE, opCharts, opHA - each running different processes), and the number of concurrent users.
To get the best performance from a server and make optimal use of its physical resources, the configuration must be finely tuned. This guide provides recommended parameters, but they will not suit every case, as server performance depends on many factors.
Related Articles
- Scaling NMIS Polling
- Scaling NMIS polling - how NMIS handles long running processes
- NMIS 8 - Configuration Options for Server Performance Tuning
- NMIS 9 - Configuration Options for Server Performance Tuning
- opCharts 3 Performance Tuning
Opmantek Applications
The configuration described in this article relates to Opmantek products. opCharts, opEvents, opConfig, opHA, opReports, ... all use the omkd daemon, which serves front-end requests. In addition, opEvents, opCharts and opConfig run their own daemons.
Before You Start
The first step is to gather information about the system:
- System Information: the NMIS and OMK support tools will provide all the information needed.
- Monitor services: NMIS can monitor the processes involved - apache2, nmis9d, omkd and mongod - and provide useful information about CPU and memory usage, among other metrics.
Number of processes
NMIS runs a daemon (nmis9d) that periodically collects node information.
The number of workers is set by the following parameter (default: 10):
```
nmisd_max_workers
```
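As a sketch only (the install path shown is an assumption, not taken from this article - verify against your own installation), lowering the worker count for a memory-constrained NMIS 9 server would look something like this in the NMIS configuration:

```
# Hypothetical excerpt of /usr/local/nmis9/conf/Config.nmis
# (path and placement assumed; check your own installation)
'nmisd_max_workers' => 10,   # default is 10; lower it on memory-constrained servers
```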
OMK has an equivalent parameter:
Configuration items
In low-memory environments, lowering the number of omkd workers provides the biggest improvement in stability - more than tuning mongod.conf does. The default value is 10, but in an environment with low user concurrency it can be decreased to 3-5.
```
omkd_workers
```
Also setting omkd_max_requests will help the worker threads restart gracefully before they grow too large.
```
omkd_max_requests
```
Process size safety limiter: if a maximum is configured, it is at least 256 MB, and the server runs Linux, then a process size check runs every 15 seconds and any worker exceeding the limit is gracefully shut down.
```
omkd_max_memory
```
Maximum number of concurrent connections per process, defaults to 1000:
```
omkd_max_clients
```
The performance logs are very useful for debugging, but they can also affect performance, so it is recommended to turn them off when they are not needed:
```
omkd_performance_logs => false
```
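Pulling the settings above together, here is a hedged sketch of how they might appear in the OMK configuration (the file path and the unit of omkd_max_memory are assumptions; the values are illustrative only, not recommendations from this article):

```
# Hypothetical excerpt of /usr/local/omk/conf/opCommon.nmis (path assumed)
'omkd_workers' => 4,              # default 10; 3-5 suits low-memory, low-concurrency servers
'omkd_max_requests' => 500,       # restart a worker gracefully after this many requests
'omkd_max_memory' => 536870912,   # e.g. 512 MB; only enforced if >= 256 MB, Linux only (unit assumed to be bytes)
'omkd_max_clients' => 1000,       # maximum concurrent connections (default)
'omkd_performance_logs' => false, # disable when not debugging
```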
MongoDB memory usage
MongoDB, in its default configuration, will use the larger of either 256 MB or 50% of (RAM - 1 GB) for its cache size.
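As a quick illustration of that formula (this helper is not part of MongoDB, just the arithmetic), the default cache size for a few RAM sizes works out as follows:

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache size: the larger of 256 MB (0.25 GB)
    or 50% of (RAM - 1 GB)."""
    return max(0.25, 0.5 * (ram_gb - 1.0))

for ram in (1, 2, 4, 16):
    print(f"{ram:>2} GB RAM -> {default_wiredtiger_cache_gb(ram):.2f} GB cache")
# 1 GB -> 0.25, 2 GB -> 0.50, 4 GB -> 1.50, 16 GB -> 7.50
```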
MongoDB cache size can be changed by adding the cacheSizeGB argument to the /etc/mongod.conf configuration file, as shown below.
```
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
```
Here is some interesting information on how MongoDB reserves memory for the internal cache of WiredTiger, its underlying storage engine, along with some adjustments that can be made: https://dba.stackexchange.com/questions/148395/mongodb-using-too-much-memory
Server examples
Two servers are compared in this section.
- The Primary (master) has only one local node but more than 400 poller nodes; opHA is the process that requires the most CPU and memory.
- The Poller has more than 500 nodes; the nmis process requires the most CPU and memory, as it polls information for all of the nodes.
Stressed system
System information:
Name | Value | Notes |
---|---|---|
nmisd_max_workers | 10 | (nmis9 only) |
omkd_workers | 4 | |
omkd_max_requests | 500 | |
Nodes | 406 | |
Active Nodes | 507 | |
OS | Ubuntu 18.04.3 LTS | |
role | poller |
This is how the server memory graphs look in a stressed system. We will focus on memory, as this is where the bottleneck is:
The NMIS process remains stable, using no more than 120 MB, and the process that stopped was probably killed by the system due to high memory usage:
...
Check the processes once nmis9d has been restarted:
```
top
```
...
Healthy system
System information:
Name | Value |
---|---|
nmisd_max_workers | 5 |
omkd_workers | 10 |
omkd_max_requests | undef |
Nodes | 2 |
Poller Nodes | 536 |
OS | Ubuntu 18.04.3 LTS |
role | master |
This is how the server memory graphs look in a healthy system:
...
Daemon graphs:
omk:
mongo:
Stressed system
System information:
Name | Value |
---|---|
nmisd_max_workers | 50 |
nmisd_scheduler_cycle | 30 |
nmisd_worker_cycle | 10 |
nmisd_worker_max_cycles | 10 |
nmis9d is crashing with no error messages.
Some server info:
- CentOS 7
- 463 Nodes
- Poller server
- High IO Wait
- Open files limit increased to 100,000