NATS cluster config: check /usr/local/omk/conf/opCommon.json on all the VMs (including Main Primary, Secondary Primary, Pollers, and Mirrors).
omkadmin@lab-ophamb-mp01:/usr/local/omk/conf$ grep -a4 nats_cluster /usr/local/omk/conf/opCommon.json
    "db_use_v26_features" : 1,
    "redis_port" : 6379,
    "redis_server" : "localhost",
    "db_port" : "27017",
    "nats_cluster" : [
      "Main Primary",
      "Sec Primary",
      "New Arbiter Poller"
    ],
NATS number of replicas setting: check /usr/local/omk/conf/opCommon.json on all the VMs (including Main Primary, Secondary Primary, Pollers, and Mirrors).
omkadmin@lab-ophamb-mp01:/usr/local/omk/conf$ grep nats_num_replicas /usr/local/omk/conf/opCommon.json
"nats_num_replicas" : 1, |
Mongo cluster heartbeat check on Main Primary
shankarn@opha-dev4:/usr/local/omk/conf$ mongosh --username opUserRW --password op42flow42 admin --port 27017
rs1 [direct: primary] admin> rs.status()
{
...
members: [
{
_id: 0,
name: 'opha-dev4.opmantek.net:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
uptime: 17503,
optime: { ts: Timestamp({ t: 1763526818, i: 9 }), t: Long('7') },
optimeDate: ISODate('2025-11-19T04:33:38.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T04:33:38.225Z'),
lastDurableWallTime: ISODate('2025-11-19T04:33:38.190Z'),
},
{
_id: 1,
name: 'opha-dev7.opmantek.net:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
uptime: 17496,
optime: { ts: Timestamp({ t: 1763526814, i: 1 }), t: Long('7') },
optimeDurable: { ts: Timestamp({ t: 1763526814, i: 1 }), t: Long('7') },
optimeDate: ISODate('2025-11-19T04:33:34.000Z'),
optimeDurableDate: ISODate('2025-11-19T04:33:34.000Z'),
lastAppliedWallTime: ISODate('2025-11-19T04:33:38.225Z'),
lastDurableWallTime: ISODate('2025-11-19T04:33:38.225Z'),
lastHeartbeat: ISODate('2025-11-19T04:33:36.300Z'),
lastHeartbeatRecv: ISODate('2025-11-19T04:33:37.493Z'),
},
{
_id: 2,
name: 'opha-dev6.opmantek.net:27018',
health: 1,
state: 7,
stateStr: 'ARBITER',
uptime: 17496,
lastHeartbeat: ISODate('2025-11-19T04:33:36.301Z'),
lastHeartbeatRecv: ISODate('2025-11-19T04:33:36.290Z'),
}
],
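A healthy rs.status() shows health: 1 for every member, exactly one PRIMARY (state 1), and the arbiter in state 7. The check can be sketched over an rs.status()-style member list; the sample data below mirrors the output above and is illustrative, not live:

```python
# Hypothetical member list mirroring the rs.status() output above.
members = [
    {"name": "opha-dev4.opmantek.net:27017", "health": 1, "state": 1, "stateStr": "PRIMARY"},
    {"name": "opha-dev7.opmantek.net:27017", "health": 1, "state": 2, "stateStr": "SECONDARY"},
    {"name": "opha-dev6.opmantek.net:27018", "health": 1, "state": 7, "stateStr": "ARBITER"},
]

unhealthy = [m["name"] for m in members if m["health"] != 1]
primaries = [m["name"] for m in members if m["state"] == 1]

assert not unhealthy, f"unhealthy members: {unhealthy}"
assert len(primaries) == 1, "expected exactly one PRIMARY"
print("replica set looks healthy; primary =", primaries[0])
```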
If the system is being upgraded from opHA 4.1.2 to opHA 5.1.1, it would be good to do a “Pull” on the Primary opHA portal before upgrading to opHA 5.1.1.
If the Poller has been running for a while, it is better to move it to opHA4 and then do a “Pull” to sync the data. Once the sync has completed, it is easy to move it back to opHAMB.
Move the desired Poller to opha4 to sync up to the latest (opha5 => opha4).
Peer: Pause the message bus on the Peer
/usr/local/omk/bin/ophad cmd producer pause
Primary: on the opHA-MB Peer portal, do a “Pull” to sync data from the Peer that has been paused.
Move the desired Poller back to opHAMB (opha4 => opha5).
This command sets opHA to start using the message bus again:
/usr/local/omk/bin/ophad cmd producer start
State: the state of the Peers can be queried on the Main Primary using the CLI:
sudo /usr/local/omk/bin/ophad cmd consumer state
Failover: if a Poller goes down, the Mirror takes over automatically. However, once the Poller comes back online, the switchover from Mirror back to Poller is not automatic.
Failback: there is a CLI command to accomplish this, which needs to be run on the Main Primary (and Primary):
sudo /usr/local/omk/bin/ophad cmd consumer failback <Poller Cluster ID>
There is also a way to force a Failover, which again needs to be run on the Main Primary (and Primary):
sudo /usr/local/omk/bin/ophad cmd consumer failover <Poller Cluster ID>
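Because failback and failover both take a Poller Cluster ID argument, scripted invocations should quote it. A hedged helper sketch (the function and wrapper are illustrative, not part of opHA; only the underlying ophad commands come from this guide):

```python
import shlex

OPHAD = "/usr/local/omk/bin/ophad"

def consumer_cmd(action: str, cluster_id: str) -> str:
    """Build the documented failback/failover invocation for a Poller Cluster ID."""
    if action not in ("failback", "failover"):
        raise ValueError("action must be 'failback' or 'failover'")
    # shlex.quote guards against whitespace/shell metacharacters in the ID
    return f"sudo {OPHAD} cmd consumer {action} {shlex.quote(cluster_id)}"

print(consumer_cmd("failback", "783d7b91-6c64-4db9-a28f-6364a54b8505"))
```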
In the unforeseen event that the Main Primary server goes down, the Second Primary takes over and becomes the primary server, ensuring the system still runs. Once the Main Primary server is recovered, restart all the services on it with the following command.
Run as root user
systemctl restart nmis9d opchartsd opeventsd omkd ophad
To switch from the Second Primary back to the Main Primary, so the Main Primary is the master again, follow these steps:
Connect to MongoDB on the current master server (in this case the Second Primary):
mongosh --username opUserRW --password op42flow42 admin
Update member priorities:
cfg = rs.conf()
cfg.members[0].priority = 0.6
cfg.members[1].priority = 0.5
rs.reconfig(cfg)
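The reconfig works because MongoDB elects the reachable member with the highest priority; setting members[0] to 0.6 and members[1] to 0.5 makes the Main Primary win the next election. The comparison can be sketched as:

```python
# Hypothetical members mirroring the priorities in the reconfig above.
members = [
    {"host": "main-primary", "priority": 0.6},
    {"host": "second-primary", "priority": 0.5},
]
# The eligible member with the highest priority is preferred as primary.
preferred = max(members, key=lambda m: m["priority"])
print("preferred primary:", preferred["host"])  # main-primary wins the election
```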
ophad logging: in /usr/local/omk/conf/opCommon.json, under “opha”, add the line
"ophad_logfile" : "/usr/local/omk/log/ophad.log",
"opha" : {
"opha_role" : "Main Primary",
"ophad_logfile" : "/usr/local/omk/log/ophad.log",
"ophad_streaming_apps" : [
"nmis",
"opevents"
],
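After editing, it is worth confirming the file is still valid JSON and that the logfile key landed under “opha”. A minimal sketch against a sample fragment mirroring the config above (on a real node, load /usr/local/omk/conf/opCommon.json instead):

```python
import json

# Hypothetical fragment mirroring the "opha" section shown above.
sample = '''
{
  "opha" : {
    "opha_role" : "Main Primary",
    "ophad_logfile" : "/usr/local/omk/log/ophad.log",
    "ophad_streaming_apps" : [ "nmis", "opevents" ]
  }
}
'''
conf = json.loads(sample)  # raises ValueError if the edit broke the JSON
logfile = conf["opha"].get("ophad_logfile")
assert logfile, "ophad_logfile missing under 'opha'"
print("ophad will log to", logfile)
```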
nats-server logging: add the following line to /etc/nats-server.conf
log_file: "/var/log/nats-server.log"
shankarn@opha-dev4:~$ cat /etc/nats-server.conf
server_name: "opha-dev4.opmantek.net"
http_port: 8222
listen: 4222
jetstream: enabled
#tls {
# cert_file: "<path>"
# key_file: "<path>"
# #ca_file: "<path>"
# verify: true
#}
log_file: "/var/log/nats-server.log"
Check sudo journalctl -f -u ophad
shankarn@opha-dev2:/usr/local/omk/log$ sudo journalctl -f -u ophad
-- Journal begins at Fri 2024-09-06 16:23:19 AEST. --
Aug 01 10:15:59 opha-dev2 ophad[46242]: ophad v0.0.0: agent
Aug 01 10:16:01 opha-dev2 ophad[46242]: cannot init logger: cannot create logfile open /usr/local/omk/log/ophad.log: permission denied
Aug 01 10:16:01 opha-dev2 systemd[1]: ophad.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 10:16:01 opha-dev2 systemd[1]: ophad.service: Failed with result 'exit-code'.
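The “permission denied” failure above happens when the user the ophad process runs as cannot create the configured logfile. A hedged pre-check sketch, demonstrated against a temp directory rather than the real /usr/local/omk/log:

```python
import os
import tempfile

def logfile_writable(path: str) -> bool:
    """True if the current user could append to (or create) the logfile."""
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(path) or "."
    return os.path.isdir(parent) and os.access(parent, os.W_OK)

# Demo against a temp dir; on a node, check "/usr/local/omk/log/ophad.log".
with tempfile.TemporaryDirectory() as d:
    print(logfile_writable(os.path.join(d, "ophad.log")))  # True: dir is writable
```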
edit /etc/systemd/system/ophad.service to remove the below lines
Type=simple
User=root
Group=root
cat /etc/systemd/system/ophad.service.bkup
[Unit]
Description=opHA daemon
After=network-online.target
Wants=network-online.target

[Service]
#on failure try to restart every RestartSec, upto StartLimitBurst times within StartLimitInterval
Restart=on-failure
RestartSec=10
StartLimitInterval=300
StartLimitBurst=10
WorkingDirectory=/usr/local/omk
ExecStart=/usr/local/omk/bin/ophad agent --streaming-type=nats

[Install]
reload and restart ophad
sudo systemctl daemon-reload
sudo systemctl restart ophad
Run the command sudo /usr/local/omk/bin/ophad verify on all the Peers and Primaries.
The last line, “ophad.verify: ready for liftoff 🚀”, indicates the configuration is good.
shankarn@opha-dev5:~$ sudo /usr/local/omk/bin/ophad verify
[sudo] password for shankarn:
ophad v0.0.52: agent
Appending to file "/usr/local/omk/log/ophad.log"
Settings -----------------------------------------
* ClusterId: 783d7b91-6c64-4db9-a28f-6364a54b8505
* OMKDatabase:
* ConnectionTimeout: 5h33m20s
* RetryTimeout: 3m0s
* PingTimeout: 33m20s
* QueryTimeout: 1h23m20s
* Port: 27017
* Server: localhost
* MongoCluster: []
* ReplicaSet: (blank)
* Name: omk_shared
* Username: opUserRW
* Password: ******
* WriteConcern: 1
* Uri: (blank)
* BatchSize: 0
* BatchTimeout: 0
* NMISDatabase:
* ConnectionTimeout: 2m0s
* RetryTimeout: 3m0s
* PingTimeout: 20s
* QueryTimeout: 1h23m20s
* Port: 27017
* Server: localhost
* MongoCluster: []
* ReplicaSet: (blank)
* Name: nmisng
* Username: opUserRW
* Password: ******
* WriteConcern: 1
* Uri: (blank)
* BatchSize: 50
* BatchTimeout: 500
* OpEventsDatabase:
* ConnectionTimeout: 2m0s
* RetryTimeout: 3m0s
* PingTimeout: 20s
* QueryTimeout: 5m0s
* Port: 27017
* Server: localhost
* MongoCluster: []
* ReplicaSet: (blank)
* Name: opevents
* Username: opUserRW
* Password: ******
* WriteConcern: 1
* Uri: (blank)
* BatchSize: 50
* BatchTimeout: 500
* OMK:
* LogLevel: info
* BindAddr: *
* Directories:
* Base: /usr/local/omk
* Conf: /usr/local/omk/conf
* Logs: /usr/local/omk/log
* Var: /usr/local/omk/var
* OPHA:
* DBName: opha
* StreamingApps: [nmis opevents]
* Logfile: /usr/local/omk/log/ophad.log
* MongoWatchFilters: []
* StreamType: nats
* AgentPort: 6000
* NonActiveTimeout: 8m0s
* ResumeTokenCollection: resume_token
* OpHACliPath: /usr/local/omk/bin/opha-cli.pl
* Compression: true
* Role: Poller
* Consumer: false
* Producer: false
* ConsumerPollerSet: (blank)
* DebugEnabled: false
* Redis:
* RedisServer: localhost
* RedisPort: 6379
* RedisPassword: ******
* RetryTimeout: 3m0s
* RedisStreamLenCheckPeriod: 5
* RedisProducerMaxStreamLength: 10000
* MaxRetries: 180
* RedisTLSEnabled: false
* RedisTLSSkipVerify: false
* RedisProducerDegradeTimeout: 10
* RedisProducerFullDegradeTimeout: 10
* Kafka:
* Seeds: localhost:63616,localhost:63627,localhost:63629
* RetryTimeout: 3m0s
* MaxRetries: 180
* Nats:
* NatsServer: opha-dev4.opmantek.net
* NatsCluster: []
* NatsPort: 4222
* NatsNumReplicas: 1
* NatsUsername: omkadmin
* NatsPassword: ******
* RetryTimeout: 3m0s
* NatsStreamLenCheckPeriod: 5
* NatsProducerMaxMsgPerSubject: 1000000
* NatsMaxAge: 604800
* MaxRetries: 180
* NatsTLSEnabled: false
* NatsTLSCert: <path>
* NatsTLSKey: <path>
* NatsTLSSkipVerify: false
* NatsProducerDegradeTimeout: 10
* NatsProducerFullDegradeTimeout: 10
* Authentication:
* AuthTokenKeys: ******
--------------------------------------------------
2025-10-22T08:01:46.329+1100 [INFO] ophad.verify: verify nmis9 mongodb connection with database: name=nmisng
2025-10-22T08:01:46.451+1100 [INFO] ophad.verify: MongoDB NMIS connect: maybe="found nodes collection in nmis9 ✅"
2025-10-22T08:01:46.451+1100 [INFO] ophad.verify: verify omk mongodb connection with database: name=opha
2025-10-22T08:01:46.551+1100 [INFO] ophad.verify: MongoDB OMK connect: maybe="found opstatus collection in omk database ✅"
2025-10-22T08:01:46.575+1100 [INFO] ophad.verify: Nats connect:
result=
| can connect to nats-server: opha-dev4.opmantek.net version: 2.11.9 ✅
| we can connect to Nats-server ✅
2025-10-22T08:01:46.575+1100 [INFO] ophad.verify: ready for liftoff 🚀