Receiving HCIMONITORD messages

  • #51193
    mike brown
    Participant

    Has anyone ever seen these alerts before?

    This is happening in both our TEST and PROD environments, which are on separate servers.

    We are currently experiencing problems with “hcimonitord”: the monitor hangs and does not display status, and the processes and threads are not refreshing in a timely manner, taking up to 2 to 3 minutes to refresh.

    I found the following in the hcimonitord logs:

    ****

    [icl :tcpi:ERR /0:  hcimonitord:09/20/2009 02:39:13] write failed: Broken pipe

    [cmd :cmd :INFO/0:  hcimonitord:09/20/2009 02:39:13] Inrecoverable socket error.  Closing connection.

    [aler:aler:INFO/0:  hcimonitord:09/20/2009 02:39:13] Removing alerts and wants for connection 0x20d29398

    ****

    I have bounced the server, cleaned up the database, and run our bounce/cleanup scripts (they normally run once a month), but the issue is still occurring. Any help is greatly appreciated.
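    When the .err file is large, it can help to tally which messages recur and when. Below is a minimal Python sketch that parses lines in the format shown above. The regular expression is inferred only from these excerpts, so treat it as an assumption that may not match every line or every Cloverleaf release.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern inferred only from the hcimonitord excerpts quoted in this thread,
# e.g. "[icl :tcpi:ERR /0:  hcimonitord:09/20/2009 02:39:13] write failed: ..."
# Other releases may format lines differently (assumption).
LOG_LINE = re.compile(
    r"^\[(?P<src>[^:]*):(?P<sub>[^:]*):(?P<level>[A-Z]+)\s*/(?P<num>\d+):"
    r"\s*(?P<proc>[^:]+):(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]"
    r"\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "source": m["src"].strip(),
        "subsystem": m["sub"].strip(),
        "level": m["level"],
        "process": m["proc"].strip(),
        "time": datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S"),
        "message": m["msg"].strip(),
    }


def summarize(lines):
    """Count how often each (level, message) pair occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry["level"], entry["message"])] += 1
    return counts
```

    For example, summarize(open(path)).most_common(10) lists the ten most frequent messages, where path stands for your hcimonitord .err file. Lines that do not match the pattern are silently skipped.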

    mike

Viewing 12 reply threads
    • #69138
      James Cobane
      Participant

      Did you simply cycle the monitor daemon (is that part of your cleanup scripts)? I can’t say that I’ve seen that specific error, but generally, if you’re having an issue with the monitor refreshing, cycling the monitor daemon resolves it (hcisitectl -k m; hcisitectl -s m).
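      If you cycle the monitor daemon on several sites, a small wrapper can run the two hcisitectl commands above for each site in turn. This Python sketch assumes that hcisitectl acts on the site named in the HCISITE environment variable, as it would after running setsite; your release may also need HCISITEDIR set, so verify this before running it for real. It defaults to a dry run that only returns the planned commands.

```python
import os
import subprocess

# The two commands quoted above: stop (-k) then start (-s) the monitor daemon (m).
MONITOR_CYCLE = [["hcisitectl", "-k", "m"], ["hcisitectl", "-s", "m"]]


def cycle_monitor(sites, dry_run=True):
    """Stop and restart the monitor daemon for each site, one site at a time.

    ASSUMPTION: hcisitectl acts on the site named by the HCISITE environment
    variable, as after running setsite; your release may also need HCISITEDIR.
    With dry_run=True (the default) nothing is executed; the planned
    (site, command) pairs are returned either way.
    """
    planned = []
    for site in sites:
        env = dict(os.environ, HCISITE=site)
        for cmd in MONITOR_CYCLE:
            planned.append((site, cmd))
            if not dry_run:
                # check=False so a failed stop does not prevent the start attempt.
                subprocess.run(cmd, env=env, check=False)
    return planned
```

      Calling cycle_monitor(["site1", "site2"]) with hypothetical site names just returns the four planned commands; pass dry_run=False to execute them, for example from a scheduled job.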

      Jim Cobane

      Henry Ford Health

    • #69139
      Tom Rioux
      Participant

      Mike,

      We had something similar happen to us last week. I did the same things you described, to no avail. Finally, I ran ps -ef and checked for any processes that were listed twice. Sure enough, two of our processes were out there twice. How or why that happened is beyond me. I killed off the duplicate processes, brought everything back up, and all seemed to function normally again.

      You may want to take a look at that to see if that may be your issue.
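      One way to make the duplicate check less error-prone is to parse the process list rather than eyeball it. This Python sketch reads `ps -eo pid,args` output, which gives just the PID and the full command line and is simpler to split than the fixed columns of ps -ef, and reports command lines that appear more than once. It compares full argument strings, so if two sites happened to run processes with identical command lines, they would also be reported and would need a closer look.

```python
import subprocess
from collections import defaultdict


def parse_ps(output):
    """Turn `ps -eo pid,args` output into a list of (pid, args) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header ("PID COMMAND") and any blank or malformed lines.
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1]))
    return rows


def find_duplicates(rows, keyword):
    """Map each command line containing keyword that occurs more than once to its PIDs."""
    by_args = defaultdict(list)
    for pid, args in rows:
        if keyword in args:
            by_args[args].append(pid)
    return {args: pids for args, pids in by_args.items() if len(pids) > 1}


def current_processes():
    """Run ps and return its parsed output."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return parse_ps(result.stdout)
```

      For example, find_duplicates(current_processes(), "hciengine") returns an empty dict when no engine command line appears twice; the keyword could equally be a site name.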

      Hope this helps….Tom

    • #69140
      Tom Rioux
      Participant

      Hey Jim,

      In our case, it wouldn’t let us run the normal cleanup scripts; they simply hung. This is what I did:

      1.  Run the hcisitectl command with the -f option.

      2.  Go to each process directory and remove the “pid” file, if present.

      3.  Go to each daemon directory and remove the “pid” file, if present.

      4.  Run “ps -ef” and grep for the site name to see whether any processes are still listed.

      In our case, we still had two processes listed, even though they didn’t appear to be running and their pid files had been removed from the process directories. After we killed off the two processes and brought everything back up, all seemed to process okay.
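      Before deleting anything in steps 2 and 3, it may be worth confirming which pid files are actually stale. This Python sketch walks a directory tree, reads every file named pid, and reports those whose recorded process no longer exists, using signal 0, which checks for a process without sending it anything. It only reports and deletes nothing. It assumes each pid file holds the process ID as its first token and that you point it at a directory containing your site’s process and daemon directories; both are assumptions to check against your installation.

```python
import os


def pid_alive(pid):
    """True if a process with this PID exists; signal 0 checks without sending anything."""
    if pid <= 0:
        return False  # 0 or negative would address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(path):
    """Return the first token of a pid file as an int, or None if it is not a number."""
    with open(path) as f:
        tokens = f.read().split()
    return int(tokens[0]) if tokens and tokens[0].isdigit() else None


def stale_pid_files(root):
    """List files named 'pid' under root whose recorded process is not running.

    Report only: nothing is deleted. ASSUMPTION: each pid file holds the
    process ID as its first token.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "pid" in filenames:
            path = os.path.join(dirpath, "pid")
            pid = read_pid(path)
            if pid is None or not pid_alive(pid):
                stale.append(path)
    return sorted(stale)
```

      For example, stale_pid_files(os.environ["HCISITEDIR"]) scans the whole site directory. Note that a recycled PID can make a stale file look live, so a clean report is not proof that every listed process is the engine process it claims to be.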

      Thanks….Tom

    • #69141
      Jim Kosloskey
      Participant

      Mike and Tom,

      I am just curious: do you have any TCP/IP ports in the ephemeral range assigned to any integration threads?
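      To answer this for a given site, you can compare each thread’s configured port against the system’s ephemeral range. This Python sketch uses 32768 to 65535 as the default range, which we understand to be the AIX defaults for the tcp_ephemeral_low and tcp_ephemeral_high network options (check the live values with `no -o tcp_ephemeral_low` and `no -o tcp_ephemeral_high`). The thread names and ports in the example are hypothetical, and collecting the real mapping from your site configuration is left to you.

```python
# Defaults as we understand them for AIX's tcp_ephemeral_low/tcp_ephemeral_high
# network options; check the live values on your server before relying on them.
EPHEMERAL_LOW = 32768
EPHEMERAL_HIGH = 65535


def in_ephemeral_range(port, low=EPHEMERAL_LOW, high=EPHEMERAL_HIGH):
    """True if port falls inside the inclusive ephemeral range [low, high]."""
    return low <= port <= high


def flag_thread_ports(thread_ports, low=EPHEMERAL_LOW, high=EPHEMERAL_HIGH):
    """Return, sorted, the names of threads whose port is in the ephemeral range.

    thread_ports maps a thread name to its configured TCP port.
    """
    return sorted(name for name, port in thread_ports.items()
                  if in_ephemeral_range(port, low, high))


# Hypothetical thread names and ports, for illustration only.
print(flag_thread_ports({"adt_out": 5001, "lab_out": 40000}))  # ['lab_out']
```

      Passing low and high explicitly lets the same check be reused on a server whose ephemeral range has been tuned away from the defaults.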

      Also, what release are you both on?

      Thanks.

      email: jim.kosloskey@jim-kosloskey.com

    • #69142
      Tom Rioux
      Participant

      Jim,

      We do have a handful of outbound interfaces that use ports within the ephemeral range. Those port numbers were assigned to us by the Portal system and are spread across our various sites. None of these ports were used by either of the duplicated processes I mentioned above.

      We are on 5.6.2.

      Thanks…Tom

    • #69143
      Troy Morton
      Participant

      Can you explain a little more about what the -f option to hcisitectl does?

      I have always thought you should never remove any pid files or start/stop the lock manager while engine processes may be running.

    • #69144
      Deborah Ingram
      Participant

      CL 5.5: We are having similar issues, where the GUI takes a very long time to reflect changes, pstarts, pstops, etc. We followed the directions above, but we do not have any duplicate processes. In addition, we ran the sitecleanup last night and reinitialized all the databases with hcidbinit -AC, but everything is still very slow in the GUI.

      Each time the monitor daemon is started, we get the following in our .err file:

      Quote:

      [icl :tcpi:ERR /0:  hcimonitord:09/25/2009 02:50:30] write failed: Broken pipe

      The following seems to be recurring throughout the .err file:

      Quote:

      [aler:aler:WARN/0:  hcimonitord:09/25/2009 02:50:30] Creating AlertAction: cascade

      [cmd :cmd :WARN/0:  hcimonitord:09/25/2009 02:50:30] alerts client 0x20c53f38

      [aler:aler:WARN/0:  hcimonitord:09/25/2009 02:50:30] Creating AlertAction: eocnotify

      [cmd :cmd :WARN/0:  hcimonitord:09/25/2009 02:50:30] eonotify client 0x20c53f38

      And sometimes we get this in the .err file:

      Quote:

      [icl :tcpi:ERR /0:  hcimonitord:09/24/2009 13:35:39] write failed: Broken pipe

      [cmd :cmd :WARN/0:  hcimonitord:09/24/2009 13:35:39] Invalid connection, tcpip =  0x0

      Any ideas?

      thx

    • #69145
      Tom Rioux
      Participant

      Troy,

      The -f option simply forces the site daemons to stop, even though processes are still running. I also agree that normally you don’t want to remove the pid files while a process is running. In our case, both were a necessary evil.

      Tom

    • #69146
      mike brown
      Participant

      Hi, thanks for the responses.

      I have tried all the suggestions mentioned in the responses.

      I have set up a cron job to bounce the monitor for each of our 9 sites every 4 hours.

      The error still occurs. We are on AIX 5.3 T8, running the client on a Windows PC; jvm_args is set to “Xmx-512m”, debug is false, and netmon debug is false. I noticed that Java and network traffic have increased by a third, which is significant. This has been a nightmare to research with no resolution; I have engaged Healthvision, and they are stumped as well.

      There are no duplicate processes in the output of “ps -ef | grep hcimonitord”.

    • #69147
      Tom Rioux
      Participant

      Run “ps -ef | grep hciengine” and see whether there are duplicate processes.

    • #69148
      mike brown
      Participant

      Hi Thomas

      I ran ps -ef | grep hciengine.

      It returned a list with duplicate processes, yes, but they are in different sites. Most (not all) of our sites have up to 14 processes.

    • #69149
      Russ Ross
      Participant

      Also run

      ps -ef | grep CloverleafHostServer

      to see whether you have more than one instance of the host server running, which will also cause extreme slowness in the IDE and can sometimes make it hang completely.
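      Note that `ps -ef | grep CloverleafHostServer` usually also lists the grep command itself, so one host server line plus the grep line is still a single instance; the common shell idiom `ps -ef | grep '[C]loverleafHostServer'` avoids that. A Python sketch of the same count, which reads `ps -eo pid,args` output and skips grep processes explicitly:

```python
import subprocess


def count_instances(ps_output, name):
    """Count processes in `ps -eo pid,args` output whose command line mentions name.

    grep processes are skipped, since `... | grep name` matches its own command line.
    """
    count = 0
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header or blank line
        args = parts[1]
        program = args.split()[0]
        if name in args and not program.endswith("grep"):
            count += 1
    return count


def host_server_count():
    """Run ps and count CloverleafHostServer processes."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return count_instances(result.stdout, "CloverleafHostServer")
```

      A result greater than 1 from host_server_count() would indicate the multiple-instance situation described here.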

      We think interfaces using port numbers in the ephemeral range (above 32K) are what is causing multiple instances of the host server to launch unexpectedly at our facility.

      One day I hope to remediate the interfaces using port numbers above 32K.

      This really showed up with a vengeance when we hosted Cloverleaf Level III training at our facility and I gave everyone port numbers in the ephemeral range.

      Before that, we had never seen the problem even once, which helps in zeroing in on the potential underlying cause.

      Russ Ross
      RussRoss318@gmail.com

    • #69150
      mike brown
      Participant

      Only one instance shows up with

      ps -ef | grep CloverleafHostServer
