We’re running v6.1, and this is the first time we’ve seen the Monitor Daemon crash on multiple sites. The GUI showed all connections and processes in red, but the lock manager was green.
Restarting the Monitor Daemon brought everything back. Digging deeper, we shut the site down (hcisitectl -K) and brought it back up (hcisitectl -S), and found that some processes had exited abnormally while others were unaffected.
As a preventative measure, we ran siteinit on all our sites to clear any anomalies, and we are continuing to monitor performance. We still have no root cause, and the failures caught us off guard because any alerts would have been tied to the Monitor Daemon, which wasn’t running.
My guess is that the lock on the monitor process was corrupted, which held a session open and eventually crashed it. Does that sound plausible?
As an aside, what is the /Lock directory intended for? I understand there is a known issue in this version with session locks not exiting cleanly. Do the contents of /Lock store these sessions, and if so, is there any harm in routinely deleting files older than 10 days?
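
For context, the cleanup I have in mind would start as a dry run along the lines of the sketch below. It only lists candidate files and deletes nothing. The function name `stale_files`, the use of modification time as the age criterion, and passing the directory in as an argument are all my own assumptions rather than anything from the product documentation. I don’t know whether file age is even a safe indicator for what lives in /Lock, which is part of what I’m asking.

```python
import sys
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def stale_files(lock_dir, max_age_days=10, now=None):
    """Return files under lock_dir last modified more than max_age_days ago.

    Dry run only: this function never deletes or modifies anything.
    Assumption: modification time is a reasonable proxy for "stale";
    that may not hold for lock files still held by a running process.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(lock_dir).rglob("*"):
        # Skip directories; only regular files are candidates.
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


if __name__ == "__main__" and len(sys.argv) == 2:
    # The directory is supplied by the caller; no install path is assumed.
    for candidate in stale_files(sys.argv[1]):
        print(candidate)
```

Run against the site’s Lock directory, this would print one path per line for review before anything is removed.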