Moved to 5.7, monitor now hangs

Clovertech Forums › Read Only Archives › Cloverleaf

  • #52012
    Baron Matthews
    Participant

      We were on 5.4 and recently moved to 5.7. Now, every couple of days, the monitor daemon hangs. It seems to go deeper than just the monitor, though, because all the threads back up and stop processing as well. We had 5.4 running for several years without this issue, but since we moved to new servers and a new version at the same time, many things changed at once.

      Is anyone else out there experiencing this hang on 5.7? We are also slowly tracing through each thread, trying to make sure there is nothing odd in them that is hanging everything (even though the threads themselves didn't change, maybe something lacks the right permission somewhere, etc.).

      • #72693
        Bob Richardson
        Participant

          Greetings,

          Have you checked that CIS5.7 is certified to run on your current OS version?   This information would be in the Release Notes for 5.7.

          Check whether you need to raise the "System Parameter Settings" for users hci and hcitest (if these accounts are used to run the Cloverleaf processes). On AIX these are the values displayed by running ulimit -a, or viewable in the file /etc/security/limits.
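As a rough illustration of what to look for in /etc/security/limits: on AIX the file is organised as stanzas (a "default:" stanza plus optional per-user stanzas such as "hci:"), and a user stanza only overrides the attributes it lists. The Python sketch below is a hypothetical helper, not a Healthvision tool, and assumes that stanza layout: it parses the text and merges "default" with one user's stanza to show the effective values. The sample text and its numbers are made up for illustration; compare against the real file, or ulimit -a run as hci, on your own server.

```python
"""Minimal sketch: read AIX-style /etc/security/limits stanzas for one user.

Assumption: the file uses the usual stanza layout ("name:" lines followed by
"attribute = value" lines, "*" comment lines), and any attribute missing from
a user's stanza falls back to the "default" stanza.
"""


def parse_limits(text: str) -> dict[str, dict[str, str]]:
    """Return {stanza_name: {attribute: value}} for the given file contents."""
    stanzas: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and "*" comment lines
        if line.endswith(":") and "=" not in line:
            current = line[:-1].strip()  # a new stanza header such as "hci:"
            stanzas.setdefault(current, {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            stanzas[current][key.strip()] = value.strip()
    return stanzas


def effective_limits(stanzas: dict[str, dict[str, str]], user: str) -> dict[str, str]:
    """Merge the default stanza with the user's own stanza (user values win)."""
    merged = dict(stanzas.get("default", {}))
    merged.update(stanzas.get(user, {}))
    return merged


if __name__ == "__main__":
    # Illustrative sample only; these numbers are not a recommendation.
    SAMPLE = """
    * sample limits file
    default:
            fsize = 2097151
            nofiles = 2000
            data = 262144

    hci:
            nofiles = 8192
            data = -1
    """
    for name, value in sorted(effective_limits(parse_limits(SAMPLE), "hci").items()):
        print(f"{name:8} {value}")  # a value of -1 means unlimited
```

The merge step matters because a user stanza that lists only a few attributes still inherits everything else from "default", so reading the hci stanza alone can understate what the account is actually allowed.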

          Otherwise a call to Healthvision/Lawson support sounds like the next step here.

          Hope this helps you.

        • #72694
          Baron Matthews
          Participant

            Thanks for the reply. The OS is fine; all of our other environments run without trouble, and it is just one that is giving us issues (it never hiccuped on 5.4, though). I think we may have traced it to a thread doing something it just doesn't like. We are going to try a reboot during a quiet time this weekend before we set up an environment for just that one thread.

          • #72695
            Bob Richardson
            Participant

              Greetings again:

              Do you have TCL configured in that thread that launches scripts/commands in the current shell, that is, not launched (forked, spawned, run) in the background? If the script/command hangs or never returns to the calling program, the process hangs. That can cascade: the "process token" is not passed around the site, which gives the illusion of a monitor hang.

              We had a similar problem when migrating from 5.3R3 to 5.6R2 a couple of years ago (if memory serves).

              Worth a check: just add a trailing & to the launched script/command so it runs in the background.

              Mail programs were our Achilles' heel.
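The difference between running a command in the current shell and backgrounding it with & can be seen outside Cloverleaf too. In Tcl, a trailing & on exec makes it return immediately instead of waiting; the sketch below shows the same idea in Python with subprocess instead, purely as an illustration. The "slow" command is a hypothetical stand-in for a mail program that takes a while (or never) returns.

```python
"""Sketch: why a blocking launch can stall the caller, and how backgrounding avoids it.

SLOW_CMD is a hypothetical stand-in for a mail script that hangs;
it simply sleeps for one second.
"""
import subprocess
import sys
import time

SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(1)"]


def launch_foreground(cmd: list[str]) -> None:
    # Like running the command in the current shell: the caller waits until it exits.
    subprocess.run(cmd, check=False)


def launch_background(cmd: list[str]) -> subprocess.Popen:
    # Like appending "&": the caller gets control back immediately.
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def elapsed(fn, *args):
    """Return (seconds taken, fn's return value)."""
    start = time.monotonic()
    result = fn(*args)
    return time.monotonic() - start, result


if __name__ == "__main__":
    bg_secs, child = elapsed(launch_background, SLOW_CMD)
    print(f"background launch returned after {bg_secs:.2f}s")
    child.wait()  # reap the child so it does not linger as a zombie
    fg_secs, _ = elapsed(launch_foreground, SLOW_CMD)
    print(f"foreground launch returned after {fg_secs:.2f}s")
```

The trade-off of backgrounding is that the caller no longer learns whether the child succeeded, and the child still has to be reaped at some point (here with wait()).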

              Good luck!   Good hunting!

            • #72696
              Laura June
              Participant

                Have you checked the easy answer – are the qdx5.7 directories excluded from your backup and virus scanning?

              • #72697
                Donna Hooe
                Participant

                  Check the version of the software you are running on the PCs versus the server. We had this issue a while back, and there was one PC in our helpdesk that had not been updated to 5.7 Rev 2. That was causing the hang. We checked all the PCs we use to run the GUIs, and once we got them all on the same version and Rev patch, we were good to go.

                • #72698
                  Baron Matthews
                  Participant

                    Thanks, Laura, but I should have said we are running AIX, so that doesn't apply.

                    Donna, that is a good idea, as we do have some operators who might still have a link to the old version. I am going to head over that way and check their PCs and, if they have the old link, try it out and see whether it causes the others to hang.

                    As of right now, we have moved the thread we suspect as a possible cause to a different environment and rebooted the server.  We did not have any hangs over the weekend.

                  • #72699
                    Baron Matthews
                    Participant

                      We have tried several things, still hanging up around once a day.  Sometimes, several times in a day.  We are going to split out the environments next and try to pinpoint the issue.

                      In the meantime, I wrote a script that checks for a hung monitor, kills it, and restarts it.
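The script itself was not posted. As a hedged sketch of the general pattern only: run a quick status probe with a timeout, and if it does not come back, force-kill the hung process and restart it. The probe command, the way the PID is found, and the restart action are all hypothetical placeholders passed in by the caller; this does not use or imitate any real Cloverleaf command.

```python
"""Sketch of a hang watchdog: probe with a timeout, kill and restart on a hang.

All site-specific pieces (probe command, PID lookup, restart action) are
hypothetical arguments supplied by the caller.
"""
import os
import signal
import subprocess
from typing import Callable, Optional


def probe_responds(probe_cmd: list[str], timeout_s: float) -> bool:
    """True if probe_cmd exits within timeout_s seconds (its exit code is ignored)."""
    try:
        subprocess.run(probe_cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the stuck probe itself before raising.
        return False


def check_and_recover(probe_cmd: list[str],
                      find_pid: Callable[[], Optional[int]],
                      restart: Callable[[], None],
                      timeout_s: float = 30.0) -> str:
    """Return "ok" if the probe answered; otherwise kill the hung PID, restart, return "restarted"."""
    if probe_responds(probe_cmd, timeout_s):
        return "ok"
    pid = find_pid()
    if pid is not None:
        try:
            os.kill(pid, signal.SIGKILL)  # the "force a kill" step
        except ProcessLookupError:
            pass  # the process already exited
    restart()
    return "restarted"
```

If run on a schedule (for example from cron), the timeout should be comfortably longer than a normal status check takes, so that a slow but healthy process is not killed.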

                    • #72700
                      Bob Richardson
                      Participant

                        Greetings,

                        We are upgrading to 5.7 as well and recently discovered, through working with Healthvision/Lawson support, that we needed to remove the jvm_args from our client.ini files to speed up the painting of the NetMonitor windows. The new server.ini now has its own jvm_args line, and our client settings were colliding with it.

                        Maybe worth a shot for now on your client side.

                        Again, do not hesitate to contact Healthvision/Lawson support!

                      • #72701
                        Traci Zee
                        Participant

                          We also have issues with the NetMonitor not refreshing properly in 5.7. Our Server.ini has jvm_args=-Xmx256m.

                          We are on Rev 2 running on AIX.  It seems to become worse when the Windows client is low on memory.  I have to close 3-5 applications and then reopen the GUI and NetMonitor seems to behave better – for a while.

                          Let me know if this helps or you find another explanation/solution.

                          Traci Zee

                          Sr. Integration Engineer

                          Emdeon

                          tzee@emdeon.com

                          615.932.3960

                        • #72702
                          Bob Richardson
                          Participant

                            Greetings,

                            Are you running with the latest patches for CIS5.7, that is, Revision 2?

                            The only other idea here is that the Windows clients may need more RAM and CPU. Right now we are running the 5.7 clients on XP SP3 with 3.25 GB of RAM and an Intel Duo CPU at 3 GHz. Then again, we are only in test right now, not production.

                            Please post your platform specifics and if you are using the Revision 2 patches.

                            Thanks.

                          • #72703
                            Traci Zee
                            Participant

                              Yes, I’m running 5.7, Rev 2 on my Dev/Cert Cloverleaf.

                              My PC is running Windows XP, SP3

                              Dual CPU @ 2 GHz

                              3 GB of RAM

                              Of course, I have several other applications running on my client at the same time – Outlook, Source Control, several Unix sessions, a Cisco VOIP phone and IM.  Closing some apps does seem to help sometimes.

                              Traci

                            • #72704
                              Baron Matthews
                              Participant

                                Let me clarify a bit further: when I say that the "monitor hangs," I don't just mean that the client freezes; I mean that all messages stop flowing. If it were just the client freezing, that would be manageable. When this happens, no data flows for any application connected in that environment.

                                The only way to fix it is to go out to Unix, find the PID of that particular monitor, force a kill on it, and then restart the monitor. We are not losing data, because the systems connecting to us queue their data (from their side the connection never appears to be down; they just can't send). But before I wrote my script to watch things, it could go on for an hour before anyone noticed that orders were not crossing over.

                                I can't use an alert, because alerts stop working as well when the monitor hangs. There are no errors. There are no panics. It simply stops working. If you have the client up on your screen, you can't even tell, because everything still shows as up and running. You only find out when you check the status of a thread or do some other thread-related task. If you try to do something to the thread, it just sits there waiting and does nothing. My script now checks every 15 minutes so that we can stay on top of things.

                                The PCs here that run the client don't even have to be on, and it can still freeze. I am fairly sure it has nothing to do with the client or our personal computers. It is something about that particular environment and how processing happens in 5.7 vs. 5.4. We ran for 4 or more years on 5.4 with this same environment and it never froze like this. Our other 6 environments are not freezing. It is very odd! If we were at least getting some type of error, that would be different; at least then we would have a starting point.

                                Anyway, phase 2 of our upgrade to 5.7 includes splitting each environment into 4 environments, as they are getting quite crowded. We hope that splitting them up will let us isolate the issue. We shall see soon, as I have already begun the split.

                              • #72705
                                Traci Zee
                                Participant

                                  Baron,

                                  Do you have a Tech Support issue open?  Are you running Rev 1 or Rev 2?

                                  I've seen both issues. I've seen 5.7 Rev 1 have a single process hang and cause message-flow issues with no error. I've also seen 5.7 Rev 2 hang the GUI when the client is low on memory.

                                  Thanks,

                                  Traci Zee


                                • #72706
                                  Bob Richardson
                                  Participant

                                    Greetings,

                                    Baron, please tell us the exact OS version and maintenance level, and the exact Cloverleaf 5.7 version, that is, which Revision patches are applied.

                                    And yes, please log a case with Healthvision tech support and post the assigned CASE number to this forum for our benefit.

                                    Thank you, and I am thinking positive thoughts toward a solution to this problem for you.

                                  • #72707
                                    Baron Matthews
                                    Participant

                                      Thanks for the replies.

                                      AIX 6.1

                                      GUI Build: 5.7PRev2

                                      Server Build: 5.7PRev2

                                      Everyone could have all of their clients closed and the problem still happens. There seems to be no rhyme or reason to when it happens. Sometimes it might go a day or two without an incident, then it might happen 5 times in a day.

                                      Soon, I will have deconstructed all the threads into smaller sites.  I think that should help isolate the problem.  I am also contemplating redoing this particular environment completely, shutting the old one down and bringing the new one up.  Everything just takes a bit of time though. 🙂

                                    • #72708
                                      Bob Richardson
                                      Participant

                                        Greetings,

                                        One thought occurred to me since you are on an AIX platform:

                                        did you adjust the MAXUPROC settings and system parameter values for the hci user account? The documentation discusses Healthvision's recommendations to raise resource allocations such as the number of processes, open file handles, and so forth.

                                        Perhaps your hci user is running out of resources?

                                        For both our test and live servers, we have raised the MAXUPROC settings based on the UNIX Installation recommendations. Our TEST server with CIS5.7R2 is OK, but then we do not have the production volume there to really stress CIS5.7R2.

                                        Hope this proves helpful. Again, if a CASE is opened with support, please post it to this forum. Thanks.
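To see whether hci is actually close to its process ceiling, one rough check is to compare the number of processes owned by hci with the maxuproc value. On AIX, lsattr -El sys0 -a maxuproc prints that attribute, and ps -e -o user= lists the owner of every process. The Python sketch below parses that text; it assumes the usual lsattr layout (attribute name, value, description, settable flag), so verify it against your own output before trusting the numbers.

```python
"""Sketch: how much MAXUPROC headroom does a user have?

Assumes AIX: `lsattr -El sys0 -a maxuproc` prints a line starting with
"maxuproc <value> ...", and `ps -e -o user=` prints one owner name per line.
"""
import subprocess


def parse_maxuproc(lsattr_output: str) -> int:
    """Extract the numeric maxuproc value from lsattr output."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "maxuproc":
            return int(fields[1])
    raise ValueError("maxuproc attribute not found in lsattr output")


def count_user_processes(ps_output: str, user: str) -> int:
    """Count lines of `ps -e -o user=` output that belong to user."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == user)


def headroom(ps_output: str, lsattr_output: str, user: str = "hci") -> int:
    """Roughly how many more processes the user can start before hitting maxuproc."""
    return parse_maxuproc(lsattr_output) - count_user_processes(ps_output, user)


def collect_live(user: str = "hci") -> int:
    """Run both commands on the local AIX host (lsattr will not exist elsewhere)."""
    ps_out = subprocess.run(["ps", "-e", "-o", "user="],
                            capture_output=True, text=True, check=True).stdout
    ls_out = subprocess.run(["lsattr", "-El", "sys0", "-a", "maxuproc"],
                            capture_output=True, text=True, check=True).stdout
    return headroom(ps_out, ls_out, user)
```

Keeping the parsing separate from collect_live makes it easy to paste in saved command output from the server and check it offline.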

                                    • The forum ‘Cloverleaf’ is closed to new topics and replies.