CL5.7 rev 2 – AIX — memory leak with inbound TCP


  • #52311
    Ryan Spires
    Participant

      Has anyone encountered any issues with inbound TCP/IP connections that continually bounce and reconnect between messages?

      We have an inbound TCP/IP socket (server connection) that is getting errors pretty consistently in the process error and log…

      Typically I would consider the errors informational; however, we have discovered what appears to be a potential overconsumption of memory.

      Upon bouncing the associated process, we gained back nearly 20% of our memory. Possibly coincidental, but it does cause me to look more closely.

      The PDL signaled exception we are getting is as follows:

      [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/02/2011 13:42:34] read returned error 0 (Error 0)

      [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/02/2011 13:42:34] PDL signaled exception: code 1, msg device error (remote side probably shut down)

      The connection is set to reset itself every 5 seconds if the connection fails; it promptly reconnects as expected, then drops again.

      I have tried setting it to multi-server; however, this just causes the error to occur as frequently, if not more so.

      Has anyone encountered this, or does anyone have any thoughts? The obvious answer (have the vendor fix their system) comes to mind, but in the meantime I thought I would poll the group.

      thanks,

      Ryan Spires

      • #73751
        Bob Richardson
        Participant

          Greetings,

          We are also running CIS5.7R2 on AIX 5.3 TL12.

          This is more like an informational error that the sender has disconnected and Cloverleaf is listening for the sender to reconnect. We have inbound interfaces where the sender only connects to send messages and then disconnects, leaving the Cloverleaf server in an “opening” status.

          Nothing unusual at least in our experience.

          As for the memory leak, we have not seen any so far for our inbound interfaces (we use PDL/TCP protocol).

          Have you checked that none of the TCL procedures in the process are leaking message and/or global handles?   That would contribute to memory usage creep over time.
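          To make that concrete, the usual leak pattern in a TPS proc is creating a message handle and never disposing of it; a minimal sketch, assuming the stock Cloverleaf Tcl commands (keylget, msgcopy, msgset) and the OVER reply disposition as used by the raw-ACK procs (proc name hypothetical):

              proc sampleIbAck { args } {
                  keylget args MODE mode          ;# start, run, time, or shutdown
                  set dispList {}
                  switch -exact -- $mode {
                      run {
                          keylget args MSGID mh
                          set ackmh [msgcopy $mh]         ;# a second handle we now own
                          msgset $ackmh "MSH|^~\\&|...ACK..."
                          lappend dispList "OVER $ackmh"  ;# engine sends and frees the ACK
                          lappend dispList "CONTINUE $mh"
                          # if ackmh never got a disposition (or a msgdestroy),
                          # one handle would leak per inbound message
                      }
                  }
                  return $dispList
              }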

          Hope this helps.

        • #73752
          David Barr
          Participant

            I haven’t seen problems with memory leaks related to connects and disconnects. I’d check your TCL procs to see if you’re doing something wrong there.

            I fixed a memory leak once that was related to an HL7 message parsing library we were using. This particular library created a new TCL command for each message that it parsed (so that it could use more of an object-oriented syntax). The people who were using this library didn’t realize that you had to call a cleanup proc to delete the new command and associated global data.
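            That pattern, in miniature (all names hypothetical, plain Tcl rather than the actual library): each parse mints a new command plus global state, and only an explicit cleanup call releases them.

                namespace eval hl7 { variable seq 0 }

                proc hl7::parse { raw } {
                    variable seq
                    set cmd ::hl7::msg[incr seq]
                    set ::hl7::data($cmd) [split $raw \r]
                    # object-style accessor: $cmd <segment-index>
                    interp alias {} $cmd {} ::hl7::segment $cmd
                    return $cmd
                }

                proc hl7::segment { cmd n } {
                    lindex $::hl7::data($cmd) $n
                }

                proc hl7::cleanup { cmd } {
                    # the call users forgot: delete the command and its global data
                    interp alias {} $cmd {}
                    unset ::hl7::data($cmd)
                }

            Skip hl7::cleanup and every parsed message leaves behind a command and an array entry, which is exactly the kind of slow creep described above.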

          • #73753
            Ryan Spires
            Participant

              Thanks for the replies… I’ll check again, but we were only using the rawHl7ack proc and one other filter proc… no Xlate in this case.

              Only two threads in this particular process…

              Ryan Spires

            • #73754
              Bob Richardson
              Participant

                Greetings again,

                Does your raw HL7 proc use the GRM calls?

                If so, there may be leaks in that TCL if the GRM variables (datlist etc.) are not cleaned up.

                Check the TCL library in the Clovertech Forum for a version that uses the split message technique; it is more performance efficient.
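                For illustration, the split technique parses the raw message with plain Tcl list operations, so there are no GRM or datum handles to leak; a minimal sketch (proc name hypothetical, field positions per standard HL7 encoding):

                    proc getMsgType { raw } {
                        # one list element per segment, then one per MSH field
                        set segments [split $raw \r]
                        set msh [split [lindex $segments 0] |]
                        # after the split, MSH-9 (message type) sits at list index 8
                        return [lindex $msh 8]
                    }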

                Good hunting!

              • #73755
                Ryan Spires
                Participant

                  Of the two procs, one was basically a modified version of RawHl7Ack and did not have any GRM statements; pretty basic validation (does it have an MSH, that type of thing), then creating the ACK from hardcoded values… nothing special.

                  The other proc is a filter, which does call two other procs internally to do segment parsing and field parsing. Both of those procs are in the same file, and they are not doing GRM either. Not the way I would have written the filter, but to each their own, I guess.

                  In any case, I really don’t see an issue with the procs at this point.

                  It looks like the conditions will always be met to either CONTINUE or KILL the handle, and there is no way to fall through without a disposition.
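                  As a sanity check, that structure looks roughly like the following (proc name and filter test are hypothetical); because both branches append a disposition, no handle can fall through undisposed:

                      proc filterRpts { args } {
                          keylget args MODE mode
                          set dispList {}
                          switch -exact -- $mode {
                              run {
                                  keylget args MSGID mh
                                  # both branches dispose of the handle
                                  if { [string match "MSH|*" [msgget $mh]] } {
                                      lappend dispList "CONTINUE $mh"
                                  } else {
                                      lappend dispList "KILL $mh"
                                  }
                              }
                          }
                          return $dispList
                      }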

                  I have turned up my EO output for the connection just to see some more noise… I haven’t done so yet for the process, but I will be doing that next.

                  [pdl :open:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:28] Scheduling driver reopen try in 15.0 secs

                  [pd  :pdtd:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:28] Set driver status to PD_STATUS_OPENING

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:28] Thread has 0 ready events left.

                  [pti :sche:INFO/2: fr_ghhsm_rpt:03/04/2011 08:21:28] Performing apply callback for thread 3

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 1 ready events.

                  [pdl :open:INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:43] Driver attempting reopen

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 0 ready events left.

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 1 ready events.

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:43] Thread has 0 ready events left.

                  [pti :sche:INFO/2: fr_ghhsm_rpt:03/04/2011 08:21:43] Performing apply callback for thread 3

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Thread has 1 ready events.

                  [pd  :pdtd:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Set driver status to PD_STATUS_UP

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Thread has 0 ready events left.

                  [pti :sche:INFO/2: fr_ghhsm_rpt:03/04/2011 08:21:44] Performing apply callback for thread 3

                  [pti :sche:INFO/1: fr_ghhsm_rpt:03/04/2011 08:21:44] Thread has 1 ready events.

                  [pdl :PDL :INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:44] read nothing (link closed)

                  [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/04/2011 08:21:44] read returned error 0 (Error 0)

                  [pdl :PDL :INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:44] no PDL exception handler registered => input error

                  [pdl :PDL :INFO/0: fr_ghhsm_rpt:03/04/2011 08:21:44] input-error in dfa ‘basic-msg’

                  [pdl :PDL :ERR /0: fr_ghhsm_rpt:03/04/2011 08:21:44] PDL signaled exception: code 1, msg device error (remote side probably shut down)

                  I did notice something just above “input-error in dfa ‘basic-msg’”.

                  Then the connection drops (goes to opening)… I am not seeing what is actually hitting the PDL. Again, this may very well be normal; the PDL being used is mlp_tcp.pdl, which is in use just about everywhere, so I don’t really think it is the issue, or it most definitely would have been noticed by others.

                • #73756
                  Rob Abbott
                  Keymaster

                    It looks like the remote end is connecting and then immediately closing the connection.  Since they are not sending any data, nothing hits the PDL other than a close of the session.

                    You might want to run an IP trace on this port to see exactly what’s happening at the network level.
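                    For anyone who needs it, one way to get that trace on AIX is the iptrace subsystem plus ipreport; a sketch assuming stock AIX tooling (the port number and file paths are placeholders, and flags can vary by AIX level):

                        startsrc -s iptrace -a "-a -p 5575 /tmp/hsm_port.trc"   # start capturing the listen port
                        # ... wait through a few connect/disconnect cycles ...
                        stopsrc -s iptrace                                      # stop the capture
                        ipreport -rns /tmp/hsm_port.trc > /tmp/hsm_port.txt    # format the binary trace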

                    Rob Abbott
                    Cloverleaf Emeritus

                  • #73757
                    Ryan Spires
                    Participant

                      The system we are interfacing with is McKesson Horizon Surgery (HSM).

                      Is anyone else interfacing with this product, with inbound reports containing embedded PDFs (base64) in HL7?

                      We do have an inbound charge interface from the same product that is behaving.

                    • #73758
                      Ted Viens
                      Participant

                        Here is additional information that was found.

                        – The sending system is sending a FIN ACK, which kills the connection, immediately after receiving the HL7 ACK.

                        – Over time, the inbound thread consumes the server’s RAM and necessitates a reboot, causing PROD impact.

                        – Our last reboot was on 6/28.

                        Questions:

                        – I am not sure why a transient connection to the IB TCP Server would cause a memory leak on the server.  Can anyone clarify?

                        – What can be done to eliminate the issue?  Would moving the TCL procs to a bridge receive solve the problem?

                        – Would swapping the TCP/Client and TCP/Server be an option?  This is not typical, but we could set our inbound up as a TCP/Client and connect to a reconfigured outbound TCP/Server on the application side.

                      • #73759
                        Ted Viens
                        Participant

                          CORRECTION – There are no TCL Procs being hit on the inbound thread.

                          We are receiving base64-encoded data in OBX.5.

                        • #73760
                          Michael Hertel
                          Participant

                            Quote:

                            Has anyone encountered any issues with inbound TCP/IP connections that continually bounce and reconnect between messages?

                            Is it possible that someone has two copies of the same external interface running?

                            We will see this when someone’s test and prod interfaces are configured and turned on to connect to the same Cloverleaf host and port.

                          • #73761
                            Brad Dorr
                            Participant

                              Just an update on this issue.  It is really becoming a pain in the neck.  It has now stopped the server twice and forced us to reboot the AIX server.  We get a “No buffer space available” error on the thread, and then if you stop anything you cannot get it started again, so we have to reboot.  Even though interfaces keep running, once they are stopped they cannot restart, nor can you use command-line commands, the GUI, anything.  If anyone has any ideas I would be glad to chat.  AIX 6.1.4.
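                              For reference, a few AIX commands can show whether mbufs or socket buffer space are exhausted when “No buffer space available” starts appearing; a sketch assuming stock AIX tooling (the port is a placeholder, and tunable names can vary by AIX level):

                                  netstat -m              # mbuf pool usage; watch for "requests for mbufs denied"
                                  no -o thewall           # ceiling on memory the network stack may use
                                  no -o sb_max            # maximum socket buffer size
                                  netstat -an | grep 5575 # check for sessions stuck in CLOSE_WAIT on the listen port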

                            • #73762
                              Michael Hertel
                              Participant

                                Brad,

                                No buffer space available happened to us once.

                                Turned out the source system was not configured to use ack logic.

                                They just kept dumping on us with huge transcription messages because they weren’t evaluating (waiting) for our ack messages from the engine.

                                They turned on the “use ack” logic and solved our issue.
