5.8 Java bug? (Client/Hostserver slow, Disconnections)

  • #54160
    David Teh
    Participant

      Hi folks,

      Another for the X-files.

      I have Cloverleaf 5.8.7 on Solaris 10, which has been running smoothly since early Dec last year. There’s no firewall (yet) for the primary server that Cloverleaf is currently running on.

      [The “host_server_default_port=34502” entry is there for a node in another location that sits behind a firewall.]

      Last Thursday, all of a sudden, IDE client access from machines everywhere became extremely slow. Engine and messaging performance were not affected. SSH/telnet access is OK.

      Besides the GUI, the only other noticeable difference is the host server:

      – running the ‘hciss’ command takes about 30 seconds to return output, which is not normal.

      – in the hostserver logs, saw this entry:

    Viewing 17 reply threads
      • #80386
        Bob Richardson
        Participant

          Greetings,

          I notice that your jvm args are set rather low.

          We have the following setting:

          jvm_args=-Xmx1024m

          You might want to bump up this setting and see if that eliminates the overflow errors for the hostserver.  You will have to cycle the hostserver to pick up the new value.
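To compare heap settings between sites, a small sketch like the one below can read the `-Xmx` value out of a `jvm_args=...` line. It assumes only the `key=value` form shown above; the function names are made up for illustration and are not part of Cloverleaf.

```python
import re

def parse_setting(line: str):
    """Split a 'key=value' line at the first '=' (format assumed from the post)."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

def max_heap_mb(jvm_args: str):
    """Return the -Xmx value in megabytes, or None if no -Xmx flag is present."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_args)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    # A JVM size with no suffix is in bytes; k, m and g are binary multiples.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor

key, value = parse_setting("jvm_args=-Xmx1024m")
print(key, max_heap_mb(value))
```

A result of `None` means the line carries no explicit `-Xmx` flag, in which case the JVM falls back to its own default maximum heap.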

          Otherwise, INFOR support can supply you the syntax to enable full debug mode for the hostserver log files to gather more information.  You could then post a case at their support website and work the problem with them.

          Another quick thought:  you may just need to periodically cycle the hostserver (maintenance) to recover memory and clear its registers – so to speak.

          Good luck!

        • #80387
          David Teh
          Participant

            Thanks Bob. Will keep that in mind, but the fact is that things have been running fine since Dec. Weird.

            Latest findings. A forum post on memory leaks led me to shut down all the test regions that are also running on that same box. Things seemed more normal after that, other than the fact that it was 2 am.

            I then did a dbinit on the test regions and brought one of the test regions up. Slowness again! Since a dbinit was done, there are no messages to invoke any script, so this should rule out a memory leak, right?

            Still trying.

            Thanks folks.

          • #80388
            Bob Richardson
            Participant

              David,

              Here is another thought for you to investigate.

              I would suggest reviewing what user limits you have set on the engine user “hci” – you might be starving it (so to speak).  The Release notes usually have a section on adjusting system parameters such as how many files user hci can open, how much memory it can use, and so on.

              For us AIX Unix folks (we run AIX 6.1 TL 7), we run ulimit -a and get this output:

              alin1hub:/healthvision/cis5.8/integrator/allina_prod >ulimit -a
              time(seconds)        unlimited
              file(blocks)         unlimited
              data(kbytes)         unlimited
              stack(kbytes)        2097152
              memory(kbytes)       2097152
              coredump(blocks)     2097151
              nofiles(descriptors) 10000
              threads(per process) unlimited
              processes(per user)  unlimited
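The same kind of check can be scripted. The sketch below uses Python's standard `resource` module (Unix only) to print a few soft and hard limits for whichever user runs it, so it would need to be run as the engine user to be meaningful. The label names are chosen here for illustration; values come back in the units the OS uses (bytes for sizes), not the kbytes or blocks that `ulimit -a` prints.

```python
import resource

# Limits to report; labels are illustrative and only loosely mirror ulimit -a.
LIMITS = {
    "nofiles(descriptors)": resource.RLIMIT_NOFILE,
    "stack(bytes)": resource.RLIMIT_STACK,
    "data(bytes)": resource.RLIMIT_DATA,
    "coredump(bytes)": resource.RLIMIT_CORE,
}

def fmt(value: int) -> str:
    """Render RLIM_INFINITY the way ulimit does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def current_limits() -> dict:
    """Return {label: (soft, hard)} for the current process."""
    result = {}
    for label, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        result[label] = (fmt(soft), fmt(hard))
    return result

for label, (soft, hard) in current_limits().items():
    print(f"{label:22} soft={soft:12} hard={hard}")
```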

              Just another avenue to investigate.

              Hope this helps you.

            • #80389
              David Teh
              Participant

                Hi Bob,

                The figures check out OK.

                Just curious if there are any limits to the number of processes per site or number of threads per process?

                Thanks.

              • #80390
                Bob Richardson
                Participant

                  Greetings,

                  I have attached a docx document that we had received from INFOR back in 2012 that discusses site limits for threads, processes, etc.

                  Note: had to save as a “doc” extension – docx not supported for attachments on this forum.

                  I am not aware of any later revisions – you may want to check on that with INFOR.

                  Hope this proves useful.

                • #80391
                  David Teh
                  Participant

                    Thanks Bob!

                  • #80392
                    Bob Richardson
                    Participant

                      You are welcome!  Pardon the duplicate attachments – I wasn’t sure if my first attempt to attach the document had been successful!

                      Enjoy.

                    • #80393
                      David Teh
                      Participant

                        Hi folks,

                        I’ve confirmed that 2 scripts have memory leaks.

                        Both test and production sites are on the same server. We’ve made changes to the test copy of the scripts and end-to-end testing is proceeding. Test processes have been started and things seem to be better, but still not as fast as they were before things started to slow down.

                        I am of course not concluding that those 2 scripts are the culprits yet.

                        Here’s what’s puzzling me:

                        1. Those 2 scripts have been around for more than 5 years. Were there any changes in the TCL library, etc., that may cause issues in 5.8.7?

                        2. We have been on 5.8.7 since Dec 2013. Each day, 10-20k messages activate that script, which translates to 10-20k leaked handles per day. Odd that we got hit only after 4 months (assuming this is the root cause)?
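For scale, the arithmetic behind point 2 can be sketched as follows. It assumes one leaked handle per message, roughly 120 days for "4 months", and that the leaking process was never restarted in that time (a restart would release that process's handles and reset the count). All three are assumptions for illustration, not measurements.

```python
def leaked_handles(per_day: int, days: int) -> int:
    """Total leaked handles if every message leaks exactly one handle."""
    return per_day * days

DAYS = 120  # assumed: roughly four months since the Dec 2013 upgrade

low = leaked_handles(10_000, DAYS)
high = leaked_handles(20_000, DAYS)
print(f"{low:,} to {high:,} leaked handles")  # 1,200,000 to 2,400,000
```

If the processes were cycled at any point during those months, the real figure would be lower, which is one reason the four-month delay does not by itself confirm or rule out the leak as the root cause.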

                      • #80394
                        David Teh
                        Participant

                          Attacking this from different angles.

                          I currently have all my test sites started up and running.

                          However, all GUI sessions on the Ops desktop and my laptop have been closed.

                          Seems to be fine for the last 5 hours. No disconnection from other systems.

                          And

                        • #80395
                          David Teh
                          Participant

                            Seems to have something to do with the host server

                          • #80396
                            Russ Ross
                            Participant

                              Check to make sure none of your interfaces in any Cloverleaf site are using ports in the ephemeral range.

                              On our AIX server that means any port above 32K.

                              I learned the hard way that using ports above 32K, in my case, can result in extreme IDE slowness and sometimes confuses the host server enough that it will automatically launch multiple instances of itself until you reach complete lockup with the IDE.

                              If you find new interfaces have been assigned port numbers in the ephemeral range at the same time your slowness showed up unexpectedly, then I can relate because I lived thru the same thing and did not come to understand the dynamics very easily either.
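A check like this is easy to script once you have a list of the ports your interfaces use. The sketch below flags any port inside an ephemeral range; the default of 32768 to 65535 is an assumption based on the "above 32K" figure and should be replaced with whatever range your OS is actually configured for. The port list is hypothetical except for 34502, the host server port mentioned at the top of the thread.

```python
def ports_in_ephemeral_range(ports, low=32768, high=65535):
    """Return the sorted ports that fall inside [low, high].

    The default bounds are an assumption; confirm the real range on your OS.
    """
    return sorted(p for p in ports if low <= p <= high)

# Hypothetical port list gathered from site configuration.
configured = [6001, 34502, 32768, 32767]
print(ports_in_ephemeral_range(configured))  # [32768, 34502]
```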

                              Russ Ross
                              RussRoss318@gmail.com

                            • #80397
                              David Teh
                              Participant

                                My Solaris folks came back with some findings on java…

                                Just google for:

                                problem  export DTRACE_DOF_INIT_DISABLE

                                Interestingly, some of the posts also suggest that problems started appearing after months of running fine.

                              • #80398
                                David Teh
                                Participant

                                  Hi folks,

                                  Light at the end of the tunnel!

                                  After running

                                • #80399
                                  David Teh
                                  Participant

                                    Hi folks,

                                    Not much help from Support.

                                    Has anyone updated the Java version that Cloverleaf uses (Java 6u16)…not the one on the Solaris OS?

                                    Is there a stable, non-buggy version you have tried that does not cause more problems?

                                    Thanks.

                                  • #80400
                                    David Teh
                                    Participant

                                      Received an email from our local support partner about a critical notification on this Java bug.

                                      Look out for it.

                                    • #80401
                                      Joe Sellers
                                      Participant

                                        Russ, could you elaborate on the problem you had when using interface ports in the ephemeral range on AIX?  We’re running CL 5.8.4 on AIX 6100-06-01-1043 and occasionally have problems with slow GUI performance and no results returned from the GUI testing tools or database administrator.

                                      • #80402
                                        Russ Ross
                                        Participant

                                          I think we were using Cloverleaf 5.2 at that time, running under AIX 5.?, and what we saw was the following:

                                          – IDE slowness and sometimes complete freeze for all IDEs

                                          – multiple instances of the hostserver would launch on their own

                                          – interface port conflicts: interfaces would not connect because the port was being used by something else

                                          Russ Ross
                                          RussRoss318@gmail.com

                                        • #80403
                                          Bob Schmid
                                          Participant

                                            Does this also apply to client threads, where the port is dictated by the downstream app?

                                            Bob
