Process Hang Issue

  • Creator
    Topic
  • #47848
    David Adams
    Participant

      We recently added some new interfaces to our production environment.

      We have since had an issue where some of our processes hang. The interface stops processing and the logs just cut off. The interface is unresponsive, and when we try to stop the process it reports that it is hung and then issues a SIGINT to stop it. On the first attempt to restart, the process crashes with a panic. It will usually come up on the second attempt.

      We have been working with McKesson and have done the following to try to resolve it:

      1) Running total cleanup on the database

    Viewing 15 reply threads
    • Author
      Replies
      • #56878
        Charlie Bursell
        Participant

          My best guess is that one or more of the new interfaces is doing a blocking read. Can you pin it down to a specific interface?

        • #56879
          David Adams
          Participant

            I do not know which one is doing it. Can you tell me what a blocking read is and where I would check for it?

          • #56880
            Charlie Bursell
            Participant

              A blocking read is just that: a driver that blocks waiting for data. If the data never comes, you get gridlock.

              I'm not sure it's the driver; it could be something else. Is it possible you have a Tcl proc that gets into an infinite loop?

              Did you turn up EO? What was the last thing the engine was doing when it blocked?

              There are all sorts of possibilities. Is it always one process that hangs? Try moving some of the threads to other processes and see if you can isolate it to a thread. You could also place some debug statements so you can see what was happening when it blocked.

              Hard to debug from here.

            • #56881
              David Adams
              Participant

                It is happening in multiple processes.  The log files just stop.  It is as if they just cut off.  

                I will take another look at the Tcl procs, but I believe they are all the same as on the other interfaces (create_ack, send_ob_message and validate_ack). They also all use the same protocol for communication.

                Thank you for your input so far, let me know if you think of anything else.

              • #56882
                Anonymous
                Participant

                  Is the monitord still responding? Can you check the recovery database to see if there is a message hung in status 5 or something like that?

                • #56883
                  David Adams
                  Participant

                    Yes, my other interfaces appear to be working properly and I am able to see messages in the recovery database. For the process that is having the issue, I usually see one message in state 14 and the rest in state 11.

                  • #56884
                    Anonymous
                    Participant

                      Could it be that you are not getting the reply from the interfacing application, or that the reply is not in the correct format?

                      Check if you have the right variant to parse the ACK that you receive (outbound tab).

                    • #56885
                      David Adams
                      Participant

                        I am not sure what to check. Is this related to the validate_hl7ack that I am using? Thank you.

                      • #56886
                        Anonymous
                        Participant

                          Yes, my guess is that if you send the message and they don’t send you the corresponding ACK (or if you cannot process their ACK correctly) then you will see a message in status 14 and the rest in status 11 and nothing seems to be moving.

                        • #56887
                          Jim Kosloskey
                          Participant

                            David,

                            Do you have a timeout specified in case a reply is not received, or are you configured to wait forever?

                            Jim Kosloskey

                            email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

                          • #56888
                            David Adams
                            Participant

                              It is set to -1, which I believe means wait forever.

                              That does seem to be the case.

                              It does not seem that this should hang all the threads and the whole process, though. Is this what you have seen happen?

                            • #56889
                              Anonymous
                              Participant

                                David,

                                You are right: if we don’t receive the ACK, that only stops the sending thread, not the whole process. The things that normally stop the whole process are bad messages, Tcl errors, abuse of “sleep” in the Tcl procs, infinite loops, etc. If everything was working before, I would check first for bad messages or messages with strange characters.

                              • #56890
                                Daniel Lee
                                Participant

                                  We had this same problem on 3.8.1P on AIX 5.1 and traced it back to something in the cycle SMAT script that we got off of Clovertech. We never did figure out what caused the problem, but we rewrote the cycle SMAT script in Perl and haven’t had the problem since. Does it only happen when your cycle SMAT job kicks off?

                                • #56891
                                  David Adams
                                  Participant

                                    This seems to have fixed our issue!

                                    AIXTHREAD_MNRATIO=1:1 in the /etc/environment file

                                    We are still monitoring, but the system has been up longer without issues than at any time in the past month. We have done more research on this: it is an AIX setting that goes in the /etc/environment file and controls the relationship between user threads and kernel threads. The AIX default is something like 8:1 user:kernel threads. This setting makes that 1:1.
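For anyone who wants to try the setting before touching /etc/environment, here is a minimal sketch of the per-session form, following the "Change" syntax in the IBM documentation quoted later in this thread. The value 1:1 is the one reported above; whether it helps on a given system is an assumption to test, not a guarantee.

```shell
# Set the pthread-to-kernel-thread ratio for this shell session only.
# The value is lost at logout; a permanent change goes in /etc/environment.
AIXTHREAD_MNRATIO=1:1
export AIXTHREAD_MNRATIO

# Show the value that processes started from this shell will inherit.
echo "AIXTHREAD_MNRATIO=${AIXTHREAD_MNRATIO:-unset}"
```

Environment variables are inherited when a process starts, so engine processes that were already running before the export would presumably need to be restarted from this shell to pick up the new value.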

                                  • #56892
                                    Kevin Scantlan
                                    Participant

                                      I got this documentation from an IBM website:

                                      AIXTHREAD_MNRATIO (AIX 4.3 and later) Purpose: Controls the scaling factor of the library. This ratio is used when creating and terminating pthreads.

                                      Values: Default: 8:1

                                      Range: Two positive values (p:k), where k is the number of kernel threads that should be employed to handle p runnable pthreads

                                      Display: echo $AIXTHREAD_MNRATIO (this is turned on internally, so the initial default value will not be seen with the echo command)

                                      Change: AIXTHREAD_MNRATIO=p:k export AIXTHREAD_MNRATIO. Change takes effect immediately in this shell. Change is effective until logging out of this shell. Permanent change is made by adding the AIXTHREAD_MNRATIO=p:k command to the /etc/environment file.

                                      Diagnosis: N/A

                                      Tuning: May be useful for applications with a very large number of threads. However, always test a ratio of 1:1 because it may provide for better performance.

                                      Does Quovadx recommend setting the ratio to be 1:1 even if we don’t show any problems?

                                    • #56893
                                      David Adams
                                      Participant

                                        I do not know.  I have been working through McKesson and actually found this setting by searching these old Clovertech archives.

                                        I do not have a direct line to Quovadx.

                                        In my opinion, I would not change something if you do not have a problem.
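To see whether a system already carries the permanent setting before deciding either way, a rough sketch like the following may help. The helper name `check_mnratio` and the `ENV_FILE` override are hypothetical, introduced here only for illustration; the default path /etc/environment is the one named in the posts above.

```shell
# ENV_FILE is a hypothetical override (handy for testing against a copy);
# by default the check reads /etc/environment.
ENV_FILE="${ENV_FILE:-/etc/environment}"

# Print any active (uncommented) AIXTHREAD_MNRATIO line in the given file,
# or a short notice if no such line is found or the file cannot be read.
check_mnratio() {
  grep '^AIXTHREAD_MNRATIO=' "$1" 2>/dev/null ||
    echo "AIXTHREAD_MNRATIO not set in $1"
}

check_mnratio "$ENV_FILE"
```

As the quoted IBM note says, `echo $AIXTHREAD_MNRATIO` will not show the built-in default, so a missing line here only means that no explicit override was configured in that file.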

                                    • The forum ‘Cloverleaf’ is closed to new topics and replies.