Receiving HCIMONITORD messages

  • Creator
    Topic
  • #51193
    mike brown
    Participant

      Has anyone seen these alerts before?

      This is happening in both our TEST and PROD environments, which are on separate servers.

      We are currently experiencing “hcimonitord” problems: the monitor hangs and does not display status, and the processes and threads take up to 2 – 3 minutes to refresh.

      I found this in the hcimonitord logs:

      ****

      [icl :tcpi:ERR /0:  hcimonitord:09/20/2009 02:39:13] write failed: Broken pipe

      [cmd :cmd :INFO/0:  hcimonitord:09/20/2009 02:39:13] Inrecoverable socket error.  Closing connection.

      [aler:aler:INFO/0:  hcimonitord:09/20/2009 02:39:13] Removing alerts and wants for connection 0x20d29398

      ****

      I have bounced the server, cleaned up the DB, and run our bounce/cleanup scripts (they normally run once a month), and the issue is still occurring. Any help is greatly appreciated.

      mike

    Viewing 12 reply threads
    • Author
      Replies
      • #69138
        James Cobane
        Participant

          Did you simply cycle the monitor daemon (is that part of your clean-up scripts)?  I can’t say that I’ve seen that specific error, but generally if you’re having an issue with the monitor refreshing, cycling the monitor daemon resolves it (hcisitectl -k m; hcisitectl -s m)

          Jim Cobane

          Henry Ford Health
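
The stop/start pair quoted above can be wrapped in a small function. This is only a sketch: the function name `cycle_monitor` and the `DRY_RUN` switch are made up for illustration, and it assumes the site environment is already set so that `hcisitectl` acts on the intended site.

```shell
# Cycle the monitor daemon for the current site using the two commands
# quoted in the thread. cycle_monitor and DRY_RUN are illustrative names.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
cycle_monitor() {
    run=${DRY_RUN:+echo}      # expands to "echo" only when DRY_RUN is set
    $run hcisitectl -k m      # stop the monitor daemon
    $run hcisitectl -s m      # start it again
}
```

Running `DRY_RUN=1 cycle_monitor` first shows exactly what would be executed before touching a live site.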

        • #69139
          Tom Rioux
          Participant

            Mike,

            We had something similar happen to us last week.   I did the same things you described to no avail.   Finally, I ran “ps -ef” and checked for any processes that were listed twice.    Sure enough, we had two processes that were out there twice.  How or why that happened is beyond me.  I killed off the duplicate processes, brought everything back up and all seemed to function normally again.  

            You may want to take a look at that to see if that may be your issue.

            Hope this helps….Tom
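
Spotting a process that is listed twice is easier with a counter than by eye. The sketch below is one possible way to do it; the function name `find_duplicate_commands` is hypothetical, and it assumes the full command line is enough to tell one site's process from another's.

```shell
# Report command lines that occur more than once in a process listing.
# Reads one command line per row on stdin and prints "<count><TAB><command>".
# find_duplicate_commands is an illustrative name, not a Cloverleaf tool.
find_duplicate_commands() {
    awk '
        NF { seen[$0]++ }                     # skip blank rows, count the rest
        END {
            for (cmd in seen)
                if (seen[cmd] > 1)
                    print seen[cmd] "\t" cmd
        }
    '
}

# Possible use on a live system (the [h] keeps grep from matching itself):
#   ps -eo args | grep '[h]ciengine' | find_duplicate_commands
```

Feeding it `ps -eo args` rather than `ps -ef` avoids false differences from the PID and start-time columns.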

          • #69140
            Tom Rioux
            Participant

              Hey Jim,

              In our case, it wouldn’t let us run the normal clean up scripts.   It was simply hanging up.   This is what I did here:

              1.  Run the hcisitectl command with the -f option.

              2.  Go to each process directory and remove the “pid”, if present

              3.  Go to each daemon directory and remove the “pid”, if present

              4.  Run “ps -ef” and grep for the site name to see if any processes are still listed.

              In our case, we still had two processes listed, even though they didn’t appear to be running and the pid was removed from the process directory.  After killing off the two processes and bringing everything back up, all seemed to process okay.

              Thanks….Tom
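
Before removing any “pid” file as in steps 2 and 3, it may help to confirm that the process it records is really gone. The following is a report-only sketch: the function name `list_stale_pid_files` is hypothetical, the directory to scan is passed in by the caller, and the only thing taken from the thread is that the files are named `pid`.

```shell
# List "pid" files under a directory whose recorded process is not alive.
# Report only: nothing is deleted. list_stale_pid_files is an illustrative name.
list_stale_pid_files() {
    find "$1" -type f -name pid | while IFS= read -r pidfile; do
        pid=$(tr -d '[:space:]' < "$pidfile")
        case $pid in
            ''|*[!0-9]*) echo "unreadable: $pidfile"; continue ;;
        esac
        # kill -0 sends no signal; it only tests whether the pid can be signalled.
        # Run this as the account that owns the engine processes, otherwise a
        # live process owned by another user also fails the test.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "stale: $pidfile (pid $pid)"
        fi
    done
}
```

A file reported as stale is a candidate for removal; a file that is not reported still belongs to a running process and matches the case Troy warns about.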

            • #69141
              Jim Kosloskey
              Participant

                Mike and Tom,

                I am just curious – do you have any TCP/IP ports in the ephemeral range assigned to any integration threads?

                Also what release are you both on?

                Thanks.

                email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

              • #69142
                Tom Rioux
                Participant

                  Jim,

                  We do have a handful of outbound interfaces whose ports fall within the ephemeral range.  The port numbers were assigned to us by the Portal system and are spread out across our various sites.   None of these ports were in either of the duplicated processes I spoke of above.

                  We are 5.6.2.

                  Thanks…Tom
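
A quick way to check a list of interface ports against the ephemeral range is sketched below. The function name and the sample ports are made up, and the default bounds of 32768 to 65535 are an assumption based on the "above 32K" figure mentioned later in the thread; on AIX the configured bounds should be visible in the `no -a` output (the `tcp_ephemeral_low` and `tcp_ephemeral_high` tunables), so substitute those values if they differ.

```shell
# Succeed when PORT lies inside LOW..HIGH.
# The defaults 32768..65535 are an assumed range; pass the real bounds
# for your system as the second and third arguments.
is_ephemeral_port() {
    port=$1 low=${2:-32768} high=${3:-65535}
    [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]
}

# Hypothetical port list, for illustration only.
for p in 5000 32768 45000; do
    if is_ephemeral_port "$p"; then
        echo "$p: inside the ephemeral range"
    else
        echo "$p: outside the ephemeral range"
    fi
done
```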

                • #69143
                  Troy Morton
                  Participant

                    Can you explain a little more what the -f option does on hcisitectl?

                    I have always thought you should never remove any pid files or start/stop lock manager while engine processes may be running.

                  • #69144
                    Deborah Ingram
                    Participant

                      CL 5.5: We are having similar issues where the GUI takes a very long time to update changes/pstarts/pstops/etc.  We followed the directions above, but we do not have any duplicate processes.  In addition, we tried doing the sitecleanup last night, and reinitialized all the databases with hcidbinit -AC but everything is still very slow from the GUI.  

                      Each time the monitord is started we get the following in our .err file:

                      Quote:

                      [icl :tcpi:ERR /0:  hcimonitord:09/25/2009 02:50:30] write failed: Broken pipe

                      The following seems to be reoccurring throughout the .err file:

                      Quote:

                      [aler:aler:WARN/0:  hcimonitord:09/25/2009 02:50:30] Creating AlertAction: cascade

                      [cmd :cmd :WARN/0:  hcimonitord:09/25/2009 02:50:30] alerts client 0x20c53f38

                      [aler:aler:WARN/0:  hcimonitord:09/25/2009 02:50:30] Creating AlertAction: eocnotify

                      [cmd :cmd :WARN/0:  hcimonitord:09/25/2009 02:50:30] eonotify client 0x20c53f38

                      ..And then sometimes we get this in the .err file:

                      Quote:

                      [icl :tcpi:ERR /0:  hcimonitord:09/24/2009 13:35:39] write failed: Broken pipe

                      [cmd :cmd :WARN/0:  hcimonitord:09/24/2009 13:35:39] Invalid connection, tcpip =  0x0

                      Any ideas?

                      thx

                    • #69145
                      Tom Rioux
                      Participant

                        Troy,

                        The -f option merely forces the site daemons to stop, even though there are processes running.   Also, I agree: normally, you don’t want to remove the pid files while a process is running.  In our case, both were a necessary evil.  

                        Tom

                      • #69146
                        mike brown
                        Participant

                          Hi, thanks for the responses…

                          I have tried all the suggestions mentioned in the responses.

                          I have set up a cron job to bounce the monitor for each site (9) every 4 hours.

                          The error still occurs. I am in an AIX 5.3 T8 environment, running the client on a Windows PC; jvm_args is set to “Xmx-512m”, debug is false, and netmon debug is false. I noticed that Java and network traffic has increased by about a third, which is a significant increase. This is a nightmare to research with no resolution. I have engaged Healthvision, and they are stumped as well.

                          There are no duplicate processes in the output of “ps -ef | grep hcimonitord”.

                        • #69147
                          Tom Rioux
                          Participant

                            Do a “ps -ef | grep hciengine” and see if you see duplicate processes.

                          • #69148
                            mike brown
                            Participant

                              Hi Thomas

                              I did a ps -ef | grep hciengine

                              It returned a list that does include duplicate process names, but they are in different sites. Most of our sites, though not all, have up to 14 processes.

                            • #69149
                              Russ Ross
                              Participant

                                Also do

                                ps -ef | grep CloverleafHostServer

                                to see if you have more than one instance of the hostserver running which will also cause extreme slowness of the IDE and sometimes make it completely hang.

                                We think interfaces using port numbers in the ephemeral range (above 32K) are what is causing multiple instances of the hostserver to launch unexpectedly at our facility.

                                One day I hope to remediate the interfaces using port numbers above 32K.

                                This really showed up with a vengeance when we hosted Cloverleaf Level III training at our facility and I gave everyone port numbers in the ephemeral range.

                                Previously we had never seen the problem even once which helps with zeroing in on the potential underlying cause.

                                Russ Ross
                                RussRoss318@gmail.com
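
When counting instances this way, the grep command itself usually appears in the `ps -ef` output and can inflate the count by one. A minimal sketch that filters it out follows; the function name `count_processes` is hypothetical and it reads the process listing from stdin.

```shell
# Count rows of a process listing that mention NAME, ignoring any row
# that is a grep command line. count_processes is an illustrative name.
count_processes() {
    grep -v ' grep ' | grep -c -- "$1"
}

# Possible use on a live system:
#   ps -ef | count_processes CloverleafHostServer
```

A result above 1 would point to the multiple-hostserver situation described above.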

                              • #69150
                                mike brown
                                Participant

                                  Only one instance is returned by

                                  ps -ef | grep CloverleafHostServer
