CL5.7 rev 2 – AIX — memory leak with inbound TCP


  • #52311
    Ryan Spires
    Participant

      Has anyone encountered any issues with inbound TCP/IP connections that continually bounce and reconnect between messages?

      We have an inbound TCP/IP socket (server connection) that is getting errors pretty consistently in the process error and log…

      Typically I would consider the errors informational; however, we have discovered what appears to be a potential overconsumption of memory.

      Upon bouncing the associated process, we gained back nearly 20% of our memory. Possibly coincidental, but it does cause me to look more closely.

      The PDL signaled exception we are getting is as follows:

      [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/02/2011 13:42:34] read returned error 0 (Error 0)

      [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/02/2011 13:42:34] PDL signaled exception: code 1, msg device error (remote side probably shut down)

      The connection is set to reset itself every 5 seconds if the connection fails; it promptly reconnects as expected, then drops again.

      I have tried setting it to multi-server; however, this just causes the error to occur as frequently, if not more so.

      Has anyone encountered this, or does anyone have any thoughts? The obvious answer (have the vendor fix their system) comes to mind, but in the meantime I thought I would poll the group.

      thanks,

      Ryan Spires

      • #73751
        Bob Richardson
        Participant

          Greetings,

          We are also running CIS5.7R2 on AIX 5.3 TL12.

          This is more like an informational error that the sender has disconnected and Cloverleaf is listening for the sender to reconnect. We have inbound interfaces where the sender only connects to send messages and then disconnects, leaving the Cloverleaf server in an “opening” status.

          Nothing unusual at least in our experience.

          As for the memory leak, we have not seen any so far for our inbound interfaces (we use PDL/TCP protocol).

          Have you checked that none of the TCL procedures in the process are leaking message and/or global handles?   That would contribute to memory usage creep over time.
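          To make that concrete, the usual leak pattern in a TPS proc is creating a message handle and never disposing of it; a minimal sketch, assuming the stock Cloverleaf Tcl commands (keylget, msgcopy, msgset) and the OVER reply disposition as used by the raw-ACK procs (proc name hypothetical):

              proc sampleIbAck { args } {
                  keylget args MODE mode          ;# start, run, time, or shutdown
                  set dispList {}
                  switch -exact -- $mode {
                      run {
                          keylget args MSGID mh
                          set ackmh [msgcopy $mh]         ;# a second handle we now own
                          msgset $ackmh "MSH|^~\\&|...ACK..."
                          lappend dispList "OVER $ackmh"  ;# engine sends and frees the ACK
                          lappend dispList "CONTINUE $mh"
                          # if ackmh never got a disposition (or a msgdestroy),
                          # one handle would leak per inbound message
                      }
                  }
                  return $dispList
              }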

          Hope this helps.

        • #73752
          David Barr
          Participant

            I haven’t seen problems with memory leaks related to connects and disconnects. I’d check your TCL procs to see if you’re doing something wrong there.

            I fixed a memory leak once that was related to an HL7 message parsing library we were using. This particular library created a new TCL command for each message that it parsed (so that it could use more of an object-oriented syntax). The people who were using this library didn’t realize that you had to call a cleanup proc to delete the new command and associated global data.
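            That pattern, in miniature (all names hypothetical, plain Tcl rather than the actual library): each parse mints a new command plus global state, and only an explicit cleanup call releases them.

                namespace eval hl7 { variable seq 0 }

                proc hl7::parse { raw } {
                    variable seq
                    set cmd ::hl7::msg[incr seq]
                    set ::hl7::data($cmd) [split $raw \r]
                    # object-style accessor: $cmd <segment-index>
                    interp alias {} $cmd {} ::hl7::segment $cmd
                    return $cmd
                }

                proc hl7::segment { cmd n } {
                    lindex $::hl7::data($cmd) $n
                }

                proc hl7::cleanup { cmd } {
                    # the call users forgot: delete the command and its global data
                    interp alias {} $cmd {}
                    unset ::hl7::data($cmd)
                }

            Skip hl7::cleanup and every parsed message leaves behind a command and an array entry, which is exactly the kind of slow creep described above.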

          • #73753
            Ryan Spires
            Participant

              Thanks for the replies… I’ll check again, but we were only using the rawHl7ack proc and one other filter proc… no Xlate in this case.

              Only two threads in this particular process…

              Ryan Spires

            • #73754
              Bob Richardson
              Participant

                Greetings again,

                Does your raw HL7 proc use the GRM calls?

                If so, there may be leaks in that TCL if the GRM variables (datlist etc.) are not cleaned up.

                Check the TCL library in the Clovertech Forum for a version that uses the split message technique; it is more performance efficient.
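                For illustration, the split technique parses the raw message with plain Tcl list operations, so there are no GRM or datum handles to leak; a minimal sketch (proc name hypothetical, field positions per standard HL7 encoding):

                    proc getMsgType { raw } {
                        # one list element per segment, then one per MSH field
                        set segments [split $raw \r]
                        set msh [split [lindex $segments 0] |]
                        # after the split, MSH-9 (message type) sits at list index 8
                        return [lindex $msh 8]
                    }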

                Good hunting!

              • #73755
                Ryan Spires
                Participant

                  Of the two procs, one was basically a modified version of RawHl7Ack and did not have any GRM statements; pretty basic validation (does it have an MSH, that type of thing), then creating the ACK from hardcoded values… nothing special.

                  The other proc is a filter, which does call two other procs internally to do segment parsing and field parsing. Both of those procs are in the same file, and they are not doing GRM either. Not the way I would have written the filter, but to each their own, I guess.

                  In any case, I really don’t see an issue with the procs at this point.

                  It looks like the conditions will always be met to either CONTINUE or KILL the handle, and there is no way to fall through without a disposition.
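                  As a sanity check, that structure looks roughly like the following (proc name and filter test are hypothetical); because both branches append a disposition, no handle can fall through undisposed:

                      proc filterRpts { args } {
                          keylget args MODE mode
                          set dispList {}
                          switch -exact -- $mode {
                              run {
                                  keylget args MSGID mh
                                  # both branches dispose of the handle
                                  if { [string match "MSH|*" [msgget $mh]] } {
                                      lappend dispList "CONTINUE $mh"
                                  } else {
                                      lappend dispList "KILL $mh"
                                  }
                              }
                          }
                          return $dispList
                      }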

                  I have turned up my EO output for the connection just to see some more noise… I haven’t done so yet for the process, but I will be doing that next.

                  [pdl :open:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:28] Scheduling driver reopen try in 15.0 secs

                  [pd  :pdtd:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:28] Set driver status to PD_STATUS_OPENING

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:28] Thread has 0 ready events left.

                  [pti :sche:INFO/2: fr_ghhsm_rpt:03/04/2011 08:21:28] Performing apply callback for thread 3

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 1 ready events.

                  [pdl :open:INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:43] Driver attempting reopen

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 0 ready events left.

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 1 ready events.

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 0 ready events left.

                  [pti :sche:INFO/2: fr_ghhsm_rpt:03/04/2011 08:21:43] Performing apply callback for thread 3

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Thread has 1 ready events.

                  [pd  :pdtd:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Set driver status to PD_STATUS_UP

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Thread has 0 ready events left.

                  [pti :sche:INFO/2: fr_ghhsm_rpt:03/04/2011 08:21:44] Performing apply callback for thread 3

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Thread has 1 ready events.

                  [pdl :PDL :INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:44] read nothing (link closed)

                  [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/04/2011 08:21:44] read returned error 0 (Error 0)

                  [pdl :PDL :INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:44] no PDL exception handler registered => input error

                  [pdl :PDL :INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:44] input-error in dfa ‘basic-msg’

                  [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/04/2011 08:21:44] PDL signaled exception: code 1, msg device error (remote side probably shut down)

                  I did notice something just above “input-error in dfa ‘basic-msg’”.

                  Then the connection drops (goes to opening)… I am not seeing what is actually hitting the PDL. Again, this may very well be normal; the PDL being used is mlp_tcp.pdl, which is in use just about everywhere, so I don’t really think it is the issue, or it most definitely would have been noticed by others.

                • #73756
                  Rob Abbott
                  Keymaster

                    It looks like the remote end is connecting and then immediately closing the connection.  Since they are not sending any data, nothing hits the PDL other than a close of the session.

                    You might want to run an IP trace on this port to see exactly what’s happening at the network level.
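                    For anyone who needs it, one way to get that trace on AIX is the iptrace subsystem plus ipreport; a sketch assuming stock AIX tooling (the port number and file paths are placeholders, and flags can vary by AIX level):

                        startsrc -s iptrace -a "-a -p 5575 /tmp/hsm_port.trc"   # start capturing the listen port
                        # ... wait through a few connect/disconnect cycles ...
                        stopsrc -s iptrace                                      # stop the capture
                        ipreport -rns /tmp/hsm_port.trc > /tmp/hsm_port.txt    # format the binary trace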

                    Rob Abbott
                    Cloverleaf Emeritus

                  • #73757
                    Ryan Spires
                    Participant

                      The system we are interfacing with is McKesson Horizon Surgery (HSM).

                      Is anyone else interfacing with this product, with inbound reports containing embedded PDFs (base64) in HL7?

                      We do have an inbound charge interface from the same product that is behaving.

                    • #73758
                      Ted Viens
                      Participant

                        Here is additional information that was found.

                        – The sending system is sending a FIN ACK, which kills the connection, immediately after receiving the HL7 ACK.

                        – Over time, the inbound thread consumes the server’s RAM and necessitates a reboot, causing PROD impact.

                        – Our last reboot was on 6/28.

                        Questions:

                        – I am not sure why a transient connection to the IB TCP Server would cause a memory leak on the server.  Can anyone clarify?

                        – What can be done to eliminate the issue?  Would moving the TCL procs to a bridge receive solve the problem?

                        – Would swapping the TCP/Client and TCP/Server be an option?  This is not typical, but we could set our inbound up as a TCP/Client and connect to a reconfigured outbound TCP/Server on the application side.

                      • #73759
                        Ted Viens
                        Participant

                          CORRECTION – There are no TCL Procs being hit on the inbound thread.

                          We are receiving base64-encoded data in OBX.5.

                        • #73760
                          Michael Hertel
                          Participant

                            Quote:

                            Has anyone encountered any issues with inbound TCP/IP connections that continually bounce and reconnect between messages?

                            Is it possible that someone has two copies of the same external interface running?

                            We will see this when someone’s test and prod interfaces are configured and turned on to connect to the same Cloverleaf host and port.

                          • #73761
                            Brad Dorr
                            Participant

                              Just an update on this issue.  It is really becoming a pain in the neck.  It has now stopped the server twice and forced us to reboot the AIX server.  We get a “No buffer space available” error on the thread, and then if you stop anything you cannot get it started again, so we have to reboot.  Even though interfaces keep running, once they are stopped they cannot restart, nor can you use command-line commands, the GUI, anything.  If anyone has any ideas I would be glad to chat.  AIX 6.1.4.
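                              For reference, a few AIX commands can show whether mbufs or socket buffer space are exhausted when “No buffer space available” starts appearing; a sketch assuming stock AIX tooling (the port is a placeholder, and tunable names can vary by AIX level):

                                  netstat -m              # mbuf pool usage; watch for "requests for mbufs denied"
                                  no -o thewall           # ceiling on memory the network stack may use
                                  no -o sb_max            # maximum socket buffer size
                                  netstat -an | grep 5575 # check for sessions stuck in CLOSE_WAIT on the listen port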

                            • #73762
                              Michael Hertel
                              Participant

                                Brad,

                                No buffer space available happened to us once.

                                Turned out the source system was not configured to use ack logic.

                                They just kept dumping on us with huge transcription messages because they weren’t evaluating (waiting) for our ack messages from the engine.

                                They turned on the “use ack” logic and solved our issue.
