5.8 Java bug? (Client/Hostserver slow, Disconnections)

  • Creator
    Topic
  • #54160
    David Teh
    Participant

    Hi folks,

    Another for the X-files.

    I have Cloverleaf 5.8.7 on Solaris 10 running smoothly since early Dec last year. There’s no firewall (yet) for the primary server that Cloverleaf is currently running on.

    [The “host_server_default_port=34502” entry is there for a node in another location behind a firewall.]

    Last Thursday, all of a sudden, IDE client access from every machine became extremely slow. Engine and messaging performance were not affected. SSH/telnet access is OK.

    Besides the GUI, the only other noticeable difference is the host server.

    – running the ‘hciss’ command takes about 30 seconds to return output, which is not normal.

    – in the hostserver logs, saw this entry:

Viewing 17 reply threads
  • Author
    Replies
    • #80386
      Bob Richardson
      Participant

      Greetings,

      I notice that your jvm args are set rather low.

      We have the following setting:

      jvm_args=-Xmx1024m

      You might want to bump up this setting and see if that eliminates the overflow errors for the hostserver. You will have to cycle the hostserver to pick up the new value.

      Otherwise, INFOR support can supply you with the syntax to enable full debug mode for the hostserver log files to gather more information. You could then post a case on their support website and work the problem with them.

      Another quick thought: you may just need to periodically cycle the hostserver (maintenance) to recover memory and clear its registers – so to speak.

      Good luck!

    • #80387
      David Teh
      Participant

      Thanks Bob. Will keep that in mind, but things had been running fine since Dec, which is weird.

      Latest findings: a forum post on memory leaks led me to shut down all the test regions that are also running on that same box. Things seemed more normal after that, although that may simply be because it was 2 am.

      I then did a dbinit on the test regions and brought one of the test regions up. Slowness again! Since a dbinit was done, there are no messages to invoke any script, so this should rule out a memory leak, right?

      Still trying.

      Thanks folks.

    • #80388
      Bob Richardson
      Participant

      David,

      Here is another thought for you to investigate.

      I would suggest reviewing what user limits you have set on the engine user “hci” – you might be starving it (so to speak). The Release notes usually have a section on adjusting system parameters such as how many files user hci can open, how much memory it can use, and so on.

      For us AIX Unix folks (we run AIX 6.1 TL 7), we run >ulimit -a and get this output:

      alin1hub:/healthvision/cis5.8/integrator/allina_prod >ulimit -a
      time(seconds)        unlimited
      file(blocks)         unlimited
      data(kbytes)         unlimited
      stack(kbytes)        2097152
      memory(kbytes)       2097152
      coredump(blocks)     2097151
      nofiles(descriptors) 10000
      threads(per process) unlimited
      processes(per user)  unlimited

      Just another avenue to investigate.

      Hope this helps you.
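
For anyone who wants to capture the same limits from a script rather than an interactive shell, here is a minimal sketch using Python's standard `resource` module (Unix only). It is an illustration, not part of Cloverleaf: the label-to-constant mapping is my own approximation of the `ulimit -a` rows above, and it must be run as the engine user (for example hci) to see that user's limits. Note that `getrlimit` reports sizes in bytes, whereas the shell output above uses kbytes and blocks.

```python
import resource

# Assumed mapping from ulimit-style labels to POSIX resource constants.
# Not every platform defines every constant, so they are looked up by name.
LIMIT_NAMES = {
    "time(seconds)": "RLIMIT_CPU",
    "file(bytes)": "RLIMIT_FSIZE",
    "data(bytes)": "RLIMIT_DATA",
    "stack(bytes)": "RLIMIT_STACK",
    "coredump(bytes)": "RLIMIT_CORE",
    "nofiles(descriptors)": "RLIMIT_NOFILE",
}


def fmt(value):
    """Render a limit the way ulimit does: 'unlimited' or the number."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {label: (soft, hard)} for the limits this platform supports."""
    limits = {}
    for label, const_name in LIMIT_NAMES.items():
        const = getattr(resource, const_name, None)
        if const is None:
            continue  # constant not available on this platform
        soft, hard = resource.getrlimit(const)
        limits[label] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for label, (soft, hard) in read_limits().items():
        print(f"{label:<22} soft={soft:<12} hard={hard}")
```

Printing both the soft and hard values is deliberate: a low soft limit can be raised by the process owner up to the hard limit, so seeing both shows whether a change needs root.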

    • #80389
      David Teh
      Participant

      Hi Bob,

      The figures check out OK.

      Just curious: are there any limits on the number of processes per site or the number of threads per process?

      Thanks.

    • #80390
      Bob Richardson
      Participant

      Greetings,

      I have attached a docx document that we received from INFOR back in 2012 that discusses site limits for threads, processes, etc.

      Note: I had to save it with a “doc” extension – docx is not supported for attachments on this forum.

      I am not aware of any later revisions – you may want to check on that with INFOR.

      Hope this proves useful.

    • #80391
      David Teh
      Participant

      Thanks Bob!

    • #80392
      Bob Richardson
      Participant

      You are welcome!  Pardon the duplicate attachments – I wasn’t sure if my first attempt to attach the document had been successful!

      Enjoy.

    • #80393
      David Teh
      Participant

      Hi folks,

      I’ve confirmed that 2 scripts have memory leaks.

      Both test and production sites are on the same server. We’ve made changes to the test copy of the scripts, and end-to-end testing is proceeding. Test processes have been started and things seem to be better, but still not as fast as they were before things started to slow down.

      I am of course not concluding that those 2 scripts are the culprits yet.

      Here’s what’s puzzling me:

      1. Those 2 scripts have been around for more than 5 years. Were there any changes in the TCL library, etc., that may cause some issues in 5.8.7?

      2. We upgraded to 5.8.7 in Dec 2013. Each day, 10-20k messages activate that script, which translates to 10-20k leaked handles per day. Odd that we got hit only after 4 months (assuming this is the root cause)?
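
To put the numbers in item 2 in perspective, here is a rough back-of-the-envelope calculation. It assumes one leaked handle per message (as stated in the post), about 30 days per month, and that the affected process was never restarted in those 4 months; any restart would reset the count, so treat the result as an upper bound. The function name is hypothetical.

```python
def leaked_handles(messages_per_day, days, leaks_per_message=1):
    """Total handles leaked if nothing is ever freed or restarted."""
    return messages_per_day * days * leaks_per_message


DAYS = 4 * 30  # assumption: roughly 4 months of uptime

low = leaked_handles(10_000, DAYS)   # 10k messages per day
high = leaked_handles(20_000, DAYS)  # 20k messages per day

if __name__ == "__main__":
    print(f"{low:,} to {high:,} handles over {DAYS} days")
    # prints "1,200,000 to 2,400,000 handles over 120 days"
```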

    • #80394
      David Teh
      Participant

      Attacking this from different angles.

      I currently have all my test sites started up and running.

      But all GUIs on the Ops desktop and my laptop have been closed.

      Seems to be fine for the last 5 hours. No disconnection from other systems.

      And

    • #80395
      David Teh
      Participant

      Seems to have something to do with the host server

    • #80396
      Russ Ross
      Participant

      Check to make sure none of your interfaces in any Cloverleaf site are using ports in the ephemeral range.

      On our AIX server that means any port above 32K.

      I learned the hard way that, in my case, using ports above 32K can result in extreme IDE slowness and can sometimes confuse the host server enough that it automatically launches multiple instances of itself until the IDE locks up completely.

      If you find that new interfaces were assigned port numbers in the ephemeral range at the same time your slowness showed up unexpectedly, then I can relate, because I lived through the same thing and did not come to understand the dynamics very easily either.

      Russ Ross
      RussRoss318@gmail.com
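
A quick way to run this check over a list of configured ports is sketched below. The 32768-65535 range is an assumption based on the "above 32K" figure in this post; the actual ephemeral range depends on the OS and its tuning, so confirm it on your own server. The port list is hypothetical except for 34502, which is the host_server_default_port value mentioned in the original post.

```python
EPHEMERAL_LOW = 32768   # assumption: "above 32K", as described in the post
EPHEMERAL_HIGH = 65535  # highest valid TCP port


def ports_in_ephemeral_range(ports, low=EPHEMERAL_LOW, high=EPHEMERAL_HIGH):
    """Return the sorted, de-duplicated ports that fall inside [low, high]."""
    return sorted(p for p in set(ports) if low <= p <= high)


if __name__ == "__main__":
    # Hypothetical interface ports, plus the 34502 value from the first post.
    configured = [5001, 6020, 34502, 40100]
    print(ports_in_ephemeral_range(configured))
    # prints [34502, 40100]
```

The bounds are parameters so the same function can be reused once the real range for a given server is known.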

    • #80397
      David Teh
      Participant

      My Solaris folks came back with some findings on java…

      Just google for:

      problem  export DTRACE_DOF_INIT_DISABLE

      Interestingly, some of the posts also suggest that problems started appearing after months of running fine.

    • #80398
      David Teh
      Participant

      Hi folks,

      Light at the end of the tunnel!

      After running

    • #80399
      David Teh
      Participant

      Hi folks,

      Not much help from Support.

      Has anyone updated the Java version that Cloverleaf uses (Java 6u16), as opposed to the one on the Solaris OS?

      Is there a stable, non-buggy version that you have tried and that did not cause more problems?

      Thanks.

    • #80400
      David Teh
      Participant

      Received an email from our local support partner about a critical notification on this Java bug.

      Look out for it.

    • #80401
      Joe Sellers
      Participant

      Russ, could you elaborate on the problem you had when using interface ports in the ephemeral range on AIX? We’re running CL 5.8.4 on AIX 6100-06-01-1043 and occasionally have problems with slow GUI performance and with no results being returned from the GUI testing tools or the database administrator.

    • #80402
      Russ Ross
      Participant

      I think we were using Cloverleaf 5.2 at that time, running under AIX 5.?, and what we saw was the following:

      – IDE slowness and sometimes a complete freeze for all IDEs

      – multiple instances of the hostserver would launch on their own

      – interface port conflicts: interfaces would not connect because the port was being used by something else

      Russ Ross
      RussRoss318@gmail.com

    • #80403
      Bob Schmid
      Participant

      Does this apply to client threads as well, where the downstream app dictates the port?

      Bob
