State: You can check the state of the peers by running the following CLI command on the Main Primary:
sudo /usr/local/omk/bin/ophad cmd consumer state
Failover: If a Poller goes down, its Mirror takes over automatically. However, once the Poller comes back online, control does not switch back from the Mirror to the Poller automatically.
Failback: To switch back from the Mirror to the Poller, run the following CLI command on the Main Primary (and on the Primary):
sudo /usr/local/omk/bin/ophad cmd consumer failback <Poller Cluster ID>
You can also force a failover. As with failback, run this command on the Main Primary (and on the Primary):
sudo /usr/local/omk/bin/ophad cmd consumer failover <Poller Cluster ID>
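The state, failback and failover commands above can be wrapped in a small helper so that a missing Poller Cluster ID is caught before anything runs. This is a minimal sketch, not part of opHA: the function name opha_consumer and the OPHAD_BIN override are hypothetical, and the function does not call sudo itself, so it assumes you call it from a root shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper around the opHA consumer commands shown above.
# Call it from a root shell on the Main Primary (and on the Primary).
# OPHAD_BIN is an assumed override; it defaults to the path used in this guide.
OPHAD_BIN="${OPHAD_BIN:-/usr/local/omk/bin/ophad}"

opha_consumer() {
  local action="${1:-}" cluster_id="${2:-}"
  case "$action" in
    state)
      "$OPHAD_BIN" cmd consumer state
      ;;
    failback|failover)
      # Both commands take the Poller Cluster ID as their argument.
      if [[ -z "$cluster_id" ]]; then
        echo "usage: opha_consumer $action <Poller Cluster ID>" >&2
        return 2
      fi
      "$OPHAD_BIN" cmd consumer "$action" "$cluster_id"
      ;;
    *)
      echo "usage: opha_consumer {state|failback|failover} [<Poller Cluster ID>]" >&2
      return 2
      ;;
  esac
}
```

For example, after sourcing the script in a root shell, opha_consumer failback <Poller Cluster ID> runs the same command as the failback step above.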
If the main-primary server goes down unexpectedly, the second-primary takes over as the primary server and keeps the system running. Once the main-primary server has been recovered, restart all of its services by running the following command as the root user:
systemctl restart nmis9d opchartsd opeventsd omkd ophad
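After the restart, it can help to confirm that every service actually came back up. A minimal sketch, assuming the services are managed by systemd under the names used above; restart_omk_services and the SYSTEMCTL override are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: restart the services listed above on the recovered main-primary,
# then report any that are not active. Run as root.
# SYSTEMCTL is a hypothetical override for dry runs; leave it unset normally.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
OMK_SERVICES=(nmis9d opchartsd opeventsd omkd ophad)

restart_omk_services() {
  local svc failed=()
  "$SYSTEMCTL" restart "${OMK_SERVICES[@]}"
  for svc in "${OMK_SERVICES[@]}"; do
    # "is-active --quiet" prints nothing; only the exit status is used.
    if ! "$SYSTEMCTL" is-active --quiet "$svc"; then
      failed+=("$svc")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    echo "not active: ${failed[*]}"
    return 1
  fi
  echo "all services active"
}
```

Calling restart_omk_services from a root shell performs the same restart as the command above and names any service that failed to start, which you can then inspect with journalctl -u.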
To switch from the second-primary back to the main-primary, so that the main-primary is the master again, follow these steps:
Connect to MongoDB on the current master server (in this case, the second-primary):
mongosh --username opUserRW --password op42flow42 admin
Update the member priorities so that the main-primary has the higher priority:
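cfg = rs.conf()
// In this example members[0] is the main-primary and members[1] the second-primary;
// check the host names in the rs.conf() output to confirm the order before changing anything.
cfg.members[0].priority = 0.6
cfg.members[1].priority = 0.5
rs.reconfig(cfg)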
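The same priority change can be applied non-interactively with mongosh --eval. This sketch selects the main-primary by host name instead of by array index, which is a variation on the steps above rather than the documented procedure. MAIN_PRIMARY_HOST (the host:port exactly as listed in rs.conf()), MONGO_USER and MONGO_PASS are assumed environment variables, and the function names are hypothetical. Members whose priority is already 0, such as arbiters, are left untouched because arbiters cannot take a non-zero priority.

```shell
#!/usr/bin/env bash
# Print mongosh JavaScript that gives the named member priority 0.6 and every
# other electable member (priority > 0) priority 0.5, then reconfigures.
build_reconfig_js() {
  local host="$1"
  cat <<EOF
const cfg = rs.conf();
const main = cfg.members.find(m => m.host === "${host}");
if (!main) { throw new Error("${host} not found in rs.conf()"); }
for (const m of cfg.members) {
  if (m === main) { m.priority = 0.6; }
  else if (m.priority > 0) { m.priority = 0.5; }
}
rs.reconfig(cfg);
EOF
}

# Run the generated script against the current master (the second-primary).
# MONGO_USER, MONGO_PASS and MAIN_PRIMARY_HOST are assumed to be set by the caller.
apply_reconfig() {
  mongosh --quiet --username "$MONGO_USER" --password "$MONGO_PASS" \
    --eval "$(build_reconfig_js "$MAIN_PRIMARY_HOST")" admin
}
```

Printing build_reconfig_js output first lets you review the JavaScript before apply_reconfig runs it.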
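Troubleshooting: If ophad does not start, follow its journal with sudo journalctl -f -u ophad. In the example below, ophad exits because it cannot create its log file: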
shankarn@opha-dev2:/usr/local/omk/log$ sudo journalctl -f -u ophad
-- Journal begins at Fri 2024-09-06 16:23:19 AEST. --
Aug 01 10:15:59 opha-dev2 ophad[46242]: ophad v0.0.0: agent
Aug 01 10:16:01 opha-dev2 ophad[46242]: cannot init logger: cannot create logfile open /usr/local/omk/log/ophad.log: permission denied
Aug 01 10:16:01 opha-dev2 systemd[1]: ophad.service: Main process exited, code=exited, status=1/FAILURE
Aug 01 10:16:01 opha-dev2 systemd[1]: ophad.service: Failed with result 'exit-code'.
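To spot this failure without reading the journal by eye, the error line can be matched and the unwritable path extracted. A minimal sketch: the match text is copied from the example output above, so other ophad startup failures are not detected, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read journal text on stdin and print the log file path that ophad could not
# open (first match only). Prints nothing if the error is not present.
ophad_unwritable_logfile() {
  sed -n 's/.*cannot create logfile open \([^:]*\): permission denied.*/\1/p' | head -n 1
}
```

For example, journalctl -u ophad -n 100 --no-pager | ophad_unwritable_logfile prints the path, and ls -ld on its directory then shows the owner and permissions that caused the error.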
To fix this, edit /etc/systemd/system/ophad.service and remove the following lines:
Type=simple
User=root
Group=root
For reference, the backup copy of the original unit file (ophad.service.bkup) contains:
cat /etc/systemd/system/ophad.service.bkup
[Unit]
Description=opHA daemon
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=root
Group=root
#on failure try to restart every RestartSec, upto StartLimitBurst times within StartLimitInterval
Restart=on-failure
RestartSec=10
StartLimitInterval=300
StartLimitBurst=10
WorkingDirectory=/usr/local/omk
ExecStart=/usr/local/omk/bin/ophad agent --streaming-type=nats

[Install]
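The edit can also be scripted. This sketch assumes GNU sed, whose -i option with an attached suffix keeps the original file as ophad.service.bkup, matching the backup name shown above. The function name is hypothetical, and note that it overwrites any existing .bkup file.

```shell
#!/usr/bin/env bash
# Remove the Type=simple, User=root and Group=root lines from a unit file,
# keeping the original as <file>.bkup (GNU sed -i with a suffix).
strip_ophad_service_lines() {
  local unit="$1"
  [[ -f "$unit" ]] || { echo "no such file: $unit" >&2; return 1; }
  # Delete only lines consisting exactly of one of the three settings
  # (trailing whitespace allowed); every other line is left untouched.
  sed -i.bkup -E '/^(Type=simple|User=root|Group=root)[[:space:]]*$/d' "$unit"
}
```

Run it as root with strip_ophad_service_lines /etc/systemd/system/ophad.service, then reload systemd and restart ophad.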
Then reload systemd and restart ophad:
sudo systemctl daemon-reload
sudo systemctl restart ophad