Extreme Slowness with CL 6.1.2 Windows

  • Creator
    Topic
  • #55161
    Mike Kim
    Participant

      We recently upgraded our test site from CL 5.7 on Linux to CL 6.1.2 on Windows and are seeing extreme slowness processing messages.  The server is Windows 2012 R2 with 4 processors and 16 GB RAM (way more horsepower than what we have for the Linux production machine). CPU and disk utilization are very low, but messages queue up and take forever to even process through an Xlate and write to a file.  We tried switching back from SMAT DB to old-school SMAT files.  That didn’t help.  We uninstalled all the anti-virus software.  No improvement.  Support is through McKesson and they’re stumped.

      It’s so slow that we keep getting these errors from cl_check_ack on the receiving threads:

      'KILL ' (returned by 'cl_check_ack ') does not match { }

      [pd  :pdtd:WARN/0:       to_xxxx:08/16/2016 11:01:17] Timed out while awaiting replies on thread. Resending reserved OB Message

      Any thoughts/feedback greatly appreciated!

    Viewing 7 reply threads
    • Author
      Replies
      • #84380
        Elisha Gould
        Participant

          Where are you using cl_check_ack? It can only be used in the TPS Inbound Reply.

          Also not 100% convinced about the code for that proc. Should the OBMSGID message be killed? I’m thinking the handling for this case changed in 5.8 (could be wrong).

          To accept the reply and process the next message:

          return "{KILLREPLY $mh}"

          To resend the message:

          return "{KILL $mh}"

        • #84381
          Charlie Bursell
          Participant

            The error indicates you are getting a message in the reply proc the engine says does not exist – no message handle.   Do you have OB only set?  You may be seeing something after the timeout.  Weird!

            I hope you do not have Await Replies on when writing to a file!  That would take forever 😀  If writing to TCP/IP, have your IT guys put a sniffer on it and see what is really happening.

            I wrote the proc in question and it did not change.  OBMSGID holds the handle of the original message sent in case you need to resend.  Of course it must be disposed of, either via a KILL or a PROTO to resend.

            Elisha:

            You cannot issue KILL for the reply handle.  To do so would lock up the engine.  The difference between KILL and KILLREPLY is that in addition to disposing of the handle it clears the await reply flag so the next message can be sent.

            We usually store the OBMSGID in a variable, "my_mh", so the proper paradigm for your cases would be:

            To accept the reply and process the next message:

            return "{KILLREPLY $mh  KILL $my_mh}"

            To resend the message:

            return "{KILLREPLY $mh  PROTO $my_mh}"

          • #84382
            Elisha Gould
            Participant

              Ahh yes, you’re right, Charlie.

              I forgot that the reason we use KILL is to ensure that we don’t flood the downstream system with messages and cause more issues with filled-up logs.

              If KILLREPLY/PROTO is used in sms_ib_reply, it will resend immediately, with no timeout before resending.

              If KILL is used, it will go to the Timeout Handling, so we have a proc in the Reply Generation to handle the resending.

            • #84383
              Russ Ross
              Participant

                Mike Kim you are welcome to call me and discuss.

                We saw inexplicable slowdowns and bottlenecks when upgrading from Cloverleaf 5.6 to Cloverleaf 6.0 on the same hardware and AIX 6.1 OS.

                We did several things that ended up being band-aids but did help.

                One significant speed improvement was upgrading the OS from AIX 6.1 to AIX 7.1 in place, which actually worked without any observed issue.

                My suspicion is that one culprit might have been that our check_reply proc did not look at OBMSGID (which I think is state 16).

                What we had in place from before was dealing with messages in state 14.

                When Viken held our hand through our Epic go-live, I had him add the logic to work both the old way and the new way with OBMSGID, which also utilizes a different NetConfig setup than our old way of implementing check reply.

                One easy way to determine if your check_reply proc is current is to grep it for OBMSGID and see if you get any hit at all.

                Russ Ross
                RussRoss318@gmail.com

              • #84384
                Russ Ross
                Participant

                  Now having said that bit about OBMSGID being part of the new improved check_reply, let me say I have seen the sort of error you mentioned often enough I recall it.

                  I’m talking about

                  'KILL ' (returned by 'cl_check_ack ') does not match { }

                  even when the reply handling is properly configured.

                  The most common cause of this has been an extra new-line character after the ending message encapsulation and before the beginning message encapsulation of the next message.

                  This situation confuses the check_reply proc, and if it results in a timeout and resend, message flow can become horribly slow, since this might happen for every message.

                  Russ Ross
                  RussRoss318@gmail.com
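One way to check for the stray-character case described above is to scan a raw capture (for example from a network sniffer) for bytes sitting between the end of one frame and the start of the next. The sketch below assumes MLLP framing (start block 0x0B, end block 0x1C followed by 0x0D); the helper name `stray_bytes_between_frames` and the sample capture are hypothetical and are not part of Cloverleaf or of any poster's code.

```tcl
# Hypothetical diagnostic helper; assumes MLLP framing, which the thread does not confirm.
set SB [format %c 0x0b]   ;# MLLP start block
set EB [format %c 0x1c]   ;# MLLP end block
set CR [format %c 0x0d]   ;# carriage return that closes the frame

proc stray_bytes_between_frames {stream} {
    global SB EB CR
    set strays {}
    # End of one frame, one or more bytes that are not a start block, then the next start block.
    set pattern "$EB$CR(\[^$SB\]+)$SB"
    foreach {whole gap} [regexp -all -inline -- $pattern $stream] {
        lappend strays $gap
    }
    return $strays
}

# Made-up capture: two framed messages with an extra newline between them.
set capture "${SB}MSH|first${EB}${CR}\n${SB}MSH|second${EB}${CR}"
set strays [stray_bytes_between_frames $capture]

# Same two messages with nothing between the frames.
set clean [stray_bytes_between_frames "${SB}MSH|first${EB}${CR}${SB}MSH|second${EB}${CR}"]
```

A non-empty `strays` list would suggest the sending system is emitting extra characters between messages; an empty list means the frames are back to back.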

                • #84385
                  Mike Kim
                  Participant

                    Thanks Charlie, Elisha and Ross for thoughtful feedback.  Everyone is still stumped.  We are kind of in a bind because support is through McKesson and they’ll only support production, not test.

                    cl_check_ack is standard version with a slight modification I made in April to not assume MSA segment was 2nd segment in message (we had vendor sending SFT segment in front of MSA).  To find the MSA index, I split the segments into a list and do an lsearch.  
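The split-and-lsearch approach described above might look roughly like the following. This is a minimal sketch, not the actual modified cl_check_ack: the proc name `find_msa_index` and the sample ACK are made up for illustration, and it assumes carriage-return segment terminators and `|` as the field separator.

```tcl
# Hypothetical helper; not taken from the standard cl_check_ack proc.
proc find_msa_index {msg} {
    # Assumes HL7 segments are separated by carriage returns.
    set segments [split $msg \r]
    # Glob-match the segment ID plus field separator, so MSA can sit at any position.
    return [lsearch -glob $segments "MSA|*"]
}

# Made-up ACK with an SFT segment ahead of MSA, as in the vendor case described.
set ack "MSH|^~\\&|LAB|HOSP|CL|HOSP|20160816110117||ACK|1|P|2.3\rSFT|Vendor|1.0\rMSA|AA|1\r"
set idx [find_msa_index $ack]

# Pull the acknowledgment code (MSA-1) from the segment that was found.
set msa_fields [split [lindex [split $ack \r] $idx] |]
set ack_code [lindex $msa_fields 1]
```

If no MSA segment is present, `lsearch` returns -1, so a real proc would need to handle that case before reading the acknowledgment code.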

                    I misspoke about where cl_check_ack is deployed.  It is configured on sending, not receiving threads.  Messages take several minutes to make it through the engine with almost no activity.

                    Any ideas greatly appreciated!

                  • #84386
                    David Coffey
                    Participant

                      In the last 90 days I have set up 6.1.2 on a 4-processor 2012 server such as yours.  I also experienced the extremely slow throughput.

                      I trawled the Clovertech logs and came across an entry attributing the poor performance to lower-level socket activity and the auto connect/reopen times.  The initial issue was reported with previous versions.

                      https://usspvlclovertch2.infor.com/viewtopic.php?t=5046&highlight=throughput  

                      I made changes to my server configuration, relaxing the auto connect/reopen times from 5 seconds (somewhat aggressive!) to 60 seconds for both inbound and outbound threads.  This corrected the poor throughput issue and I was able to begin my testing.

                    • #84387
                      Mike Kim
                      Participant

                        Hi David,

                        Your advice about the auto reconnect times was spot on.  I think that helped a lot. Also, apparently the sys admins still had some anti-virus software on there that might have been a contributing factor.  Much faster now.  Thanks!
