engine hung with no errors or other warnings


  • Creator
    Topic
  • #52358
    Tim Gobbel
    Participant

      We are on 5.6 rev 2 (??) on AIX. Tonight the engine just stopped processing messages, with no errors or other signs or symptoms. I blamed the vendor until they called back and said it wasn’t them (Right!), but then I did find that the engine had stopped receiving messages and thus had stopped sending messages to them. I checked the logs but saw nothing. I rebooted CL and the messages started to flow. Has anyone experienced this before? Any ideas as to why? Thanx!

    Viewing 5 reply threads
    • Author
      Replies
      • #73899
        Kevin Scantlan
        Participant

          Do you have automatic process log cycling turned on? We had problems with that and backed most of it out. We are also on CL 5.6 rev 2, with AIX 5.3. What was the last thing that showed in the process log in question?

        • #73900
          Mark McDaid
          Participant

            Hi Tim,

            I have one particular interface with our radiology vendor that has been problematic for the past 3 years. Once every 2-3 weeks, the process hangs with no alert and no indication of any problem. This occurred when we were on version 5.2, and it still occurs now that we are on version 5.6 rev 2. Something coming across that interface from radiology must be triggering it, but I’ve never been able to figure out the root cause of the hang.

            So my work-around was to put the radiology interfaces in their own site within Cloverleaf. This site has 2 processes, ARA and monitorARA. The ARA process contains the actual interface threads; the monitorARA process exists only to detect when the ARA process becomes hung. The monitorARA process has 2 threads: a timer thread that creates a simple “hello” message every 30 seconds, and a second thread that sends that message via a TCP/IP hop to a third thread, fr_monitorARA, which lives in the actual ARA process. I have an alert set up to go off when fr_monitorARA has not received a message in more than 45 seconds (it should be receiving one every 30 seconds from the monitorARA process).

            I then wrote a script that handles the shutdown, clean-up, and restart of the ARA process, and it is set to run when the alert fires. This setup has worked well for the past 2 years. It doesn’t solve the cause of the problem, but it keeps the interfaces going without human intervention.
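The timeout check at the core of the work-around above can be sketched in Python. This is only an illustration of the logic (a heartbeat expected every 30 seconds, an alert after more than 45 seconds of silence, then a restart action). In the setup described, the alert and the restart script are configured in Cloverleaf itself; the class and function names here are hypothetical and are not part of any Cloverleaf API.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Tracks the last heartbeat and reports when the gap exceeds a timeout."""

    def __init__(self, timeout_s: float = 45.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = clock()  # treat start-up as the first heartbeat

    def beat(self) -> None:
        """Record that a heartbeat message arrived."""
        self._last_seen = self._clock()

    def is_hung(self) -> bool:
        """True when no heartbeat has arrived for more than timeout_s seconds."""
        return (self._clock() - self._last_seen) > self.timeout_s


def check_and_recover(watchdog: HeartbeatWatchdog,
                      restart: Callable[[], None]) -> bool:
    """Run the (hypothetical) restart action once if the watchdog reports a hang."""
    if watchdog.is_hung():
        restart()
        watchdog.beat()  # give the restarted process a fresh timeout window
        return True
    return False
```

The 45-second timeout against a 30-second heartbeat leaves a 15-second margin, so a heartbeat that arrives slightly late does not trigger a restart.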

          • #73901
            John Mercogliano
            Participant

              Tim,

                Did you check your monitord log? Or, as part of your cycling, do you run the following?

              Code:

              hcicmd -p hcimonitord -t d -c "cycle"

                When we first set up our automated monitoring solution, we ran into the same problem because the queries to hcimonitord were making the monitord log file grow too large.

                Something to look forward to in 5.7: we have automated cycling of log files now.
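For versions without automated cycling, the log growth described above could be guarded against with a size check run from a scheduled job. The following is a minimal sketch, not a Cloverleaf utility: the command arguments are copied from the post above, while the function name, the log path, and the size threshold are assumptions to be replaced with site-specific values.

```python
import os
from typing import List, Optional

# Command quoted from the post above, as an argument vector to avoid shell quoting issues.
CYCLE_CMD: List[str] = ["hcicmd", "-p", "hcimonitord", "-t", "d", "-c", "cycle"]


def cycle_command_if_needed(log_path: str, max_bytes: int) -> Optional[List[str]]:
    """Return the cycle command when the log exceeds max_bytes, else None."""
    try:
        size = os.path.getsize(log_path)
    except OSError:
        return None  # log missing or unreadable: nothing to cycle
    return list(CYCLE_CMD) if size > max_bytes else None
```

A caller would pass a non-None result to something like `subprocess.run(cmd, check=True)`. Returning the command instead of running it keeps the size check testable on a machine that has no Cloverleaf installation.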

              Hope this helps,

              John Mercogliano
              Sentara Healthcare
              Hampton Roads, VA

            • #73902
              Tim Gobbel
              Participant

                Thanx for all your replies.  I will look into all of them in relation to what I have and try to get back to you all.  Thanx!

              • #73903
                Donna Bailey
                Participant

                  I had the same kind of issue a while back. Cloverleaf support was pointing me toward a possible problem with a tclproc. We had two procs that were doing some complex work, so we put the threads into their own process and set up alerts etc. to handle the hangs. We were able to remove those procs after a short period and didn’t have the hangs any longer. Just a thought.

                  Donna Bailey
                  Tele: 315-729-3805
                  dbailey@microstar.health
                  Micro Star Inc.

                • #73904
                  Tim Gobbel
                  Participant

                    I will keep an eye out for that. We don’t use a lot of tclprocs, and I am not aware of any that we changed prior to the hang. I still do not know which process it was, but I suspect the ADT. There was one Xlate that I may have changed, so I will go back and review that. Thanx!
