Threads losing connection

This topic has 10 replies, 6 voices, and was last updated 18 years ago by Richard Hart.

Creator

Topic
June 21, 2007 at 2:41 am #49361
Doug Essner
Participant
We have been live on 5.5 Rev1 for about a week and everything is fine except for the interfaces with 2 systems. The threads connected to these systems (inbound and outbound) stop communicating after about 15 minutes: the threads still show UP status, but the system each thread is communicating with is disconnected, and the thread has to be cycled before the connection is reestablished. I checked the logs with the error output at the highest level – no obvious error. I checked with our network administrators about the Ethernet and TCP settings – everything is the same as on the old system. I tried lowering the Cloverleaf server tcp_keepidle to 3 minutes – no change. Does anyone have a suggestion? Thanks for your help.
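
For readers unfamiliar with what tcp_keepidle controls, here is a minimal sketch in Python of the same idea applied to a single socket. It assumes a platform that exposes `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` as per-socket options (Linux does; other systems may not, hence the guards). The function name `enable_keepalive` and the interval/probe values are illustrative only, and this is not how Cloverleaf or the OS-wide tunable is configured.

```python
import socket


def enable_keepalive(sock, idle_s=180, interval_s=30, probes=5):
    """Turn on TCP keepalive for one socket and tune its timers where supported.

    idle_s     -- seconds of silence before the first probe (3 minutes, as in the post)
    interval_s -- seconds between probes (illustrative value)
    probes     -- unanswered probes before the connection is dropped (illustrative value)
    Returns a dict of the options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    wanted = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)  # not every platform defines these
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
        except OSError:
            pass  # option known to Python but rejected by this OS
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

Keepalive probes only help if both ends and everything in between tolerate them, so the values would need to be checked against the remote systems' own settings.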

Viewing 9 reply threads

Author

Replies
- July 3, 2007 at 7:23 pm #61636
  Steve Robertson
  Participant
  Doug,
  
  We’ve had similar problems from time to time in the past. We never could figure out exactly what the problem was, but we always suspected some kind of router timeout on the foreign networks.
  
  We’ve used a couple of different workarounds. One is to write an OS-level script to cycle the threads. Actually, if you do this, I would recommend cycling the processes rather than the threads – we had lots of memory leaks leading to panics when we used a script to cycle threads. We are running 5.4.1 on Windoze, by the way.
  
  The other approach is to set up a timer thread that periodically sends what I would call a “heartbeat”: just a Tcl proc that generates an HL7 message header (MSH) segment with the message type set to something innocuous. You will likely have to coordinate this with the other systems so that they know to filter out these messages. I can send/post some Tcl that we have used if you like.
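
The heartbeat idea described above could be sketched roughly as follows. This is Python rather than the Tcl mentioned in the reply, and it is not the poster's proc. The function name `build_heartbeat`, the application names, and the message type `ZHB^Z01` are hypothetical placeholders that would have to be agreed with the receiving system.

```python
from datetime import datetime

CR = "\r"  # HL7 v2 segment terminator


def build_heartbeat(sending_app="CLOVERLEAF", receiving_app="REMOTE",
                    now=None, control_id=None):
    """Build a one-segment HL7 v2 message (MSH only) to use as a heartbeat.

    The message type ZHB^Z01 is a made-up, innocuous value; the receiver
    must be told to recognise and discard it.
    """
    now = now or datetime.now()
    ts = now.strftime("%Y%m%d%H%M%S")
    control_id = control_id or "HB" + ts
    fields = [
        "MSH",          # segment name (MSH-1 is the "|" separator itself)
        "^~\\&",        # MSH-2  encoding characters
        sending_app,    # MSH-3  sending application
        "",             # MSH-4  sending facility (left empty)
        receiving_app,  # MSH-5  receiving application
        "",             # MSH-6  receiving facility (left empty)
        ts,             # MSH-7  message date/time
        "",             # MSH-8  security (left empty)
        "ZHB^Z01",      # MSH-9  message type (hypothetical)
        control_id,     # MSH-10 message control ID
        "P",            # MSH-11 processing ID
        "2.3",          # MSH-12 HL7 version (assumed)
    ]
    return "|".join(fields) + CR


if __name__ == "__main__":
    print(repr(build_heartbeat()))
```

A timer would call this at an interval shorter than whatever idle timeout is suspected on the path, and hand the result to the outbound thread like any other message.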
- July 4, 2007 at 1:01 am #61637
  Richard Hart
  Participant
  Doug.
  
  I’m not a network expert, but from previous discussions with our WAN group I understand there are routers/firewalls that will kill a connection if nothing has been sent for a while – ours is about 2 hours.
  
  Because the connection is killed this way, the TCP termination handshake is never completed and therefore both sides think all is OK.
  
  In our scenario this happens only occasionally, so the business decided not to use any form of heartbeat/thread bounce.
- July 20, 2007 at 2:43 pm #61638
  Doug Essner
  Participant
  We solved this problem by changing the TCP keepalive (tcp_keepidle) on the 2 systems that were dropping the connections with our Cloverleaf server. Both servers had a very short keepalive (2 minutes) – we set this to the default value of 2 hours and our problems went away.
- July 27, 2007 at 1:49 pm #61639
  Tom Rioux
  Participant
  I know this is an old post, but I would like to re-address this issue. We are having very similar problems. On the Cloverleaf server, we also set the tcp_keepidle default to 2 hours. Throughout all of this, if we did go to an “opening” state, the connection would be re-established once the client sent a message through. However, we are now running into issues where the connectivity is not being re-established and the client side must bounce their interface to get the messages out of their queue.
  
  Any ideas?
  
  Thanks…
  
  Tom Rioux
  
  Baylor Healthcare
- July 27, 2007 at 2:09 pm #61640
  Jim Kosloskey
  Participant
  Tom,
  
  Can you check the client side (specifically getting the same level of log information you can get from Cloverleaf(R))?
  
  The reason I ask is that it always seems to be the foreign system causing the issue, and checking there first can save me a lot of wasted work.
  
  Offhand, it sounds like this is a system which does not keep a persistent connection (something we insist upon here). It may attempt a connection only when it has something to send, and once it has emptied its output buffer, it may close the connection.
  
  Jim Kosloskey
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
- July 27, 2007 at 6:45 pm #61641
  Tom Rioux
  Participant
  The system is Eclipsys. Supposedly they have a constant connection. It hasn’t been an issue at other places that I’ve worked in the past.
- July 27, 2007 at 10:57 pm #61642
  Jim Kosloskey
  Participant
  Tom,
  
  It is still not a bad idea to get access to Eclipsys’ log (if there is one, or more than one), turn up the engine ‘noise’ level, cause the issue, and see what each system says.
  
  My money is still on the foreign system causing the issue.
  
  Jim Kosloskey
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
- July 30, 2007 at 2:16 pm #61643
  Tom Rioux
  Participant
  Just a couple more tidbits of info… I’m told there is no firewall between the two servers. Oh, and by the way, the Eclipsys server is a Windows server. The outbound connections to Eclipsys are fine; it is only the inbound connections from Eclipsys that are the issue. I’m working on getting the logs from the Eclipsys box.
- July 30, 2007 at 9:14 pm #61644
  Russ Ross
  Participant
  Tom:
  
  By the way, I see your profile still shows you at Memorial Hermann, so you may want to update that information if it is no longer accurate.
  
  If I can get people to listen, I usually recommend that an interface send a dummy message every so often to help with all sorts of issues, including the one you are having.
  
  Unfortunately, most of the time nobody listens.
  
  So let’s say, for argument’s sake, that the sender from Eclipsys is not persistent at your site, or at least you claim it is behaving that way.
  
  See if you can have Eclipsys schedule sending a dummy message, which you look for and kill, however often you think will get rid of your problems.
  
  I find this especially useful for making last-received alerts more proactive, and it eliminates false alerts.
  
  Now here is the icing on the cake if the vendor can do it.
  
  Instead of just having them send dummy messages via their sending interface, have them get the application to generate the dummy message.
  
  This way the alerts not only cover the interface but also let you know when there is any kind of break in the pathway.
  
  What more could you ask for?
  
  The funny thing is, once you do this they will start depending on you to tell them when their system has problems.
  
  Now you would think that would be something they would do themselves, but I hear something like this all the time: “Why didn’t Cloverleaf tell me my system had a problem?”
  
  So should I laugh or shake my head?
  
  However, I do feel it gives Cloverleaf more job security by watching after other systems this way.
  
  Russ Ross
  RussRoss318@gmail.com
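
The "look for and kill" step and the last-received alert described in this reply might look something like the sketch below. It is Python for illustration only, not Cloverleaf configuration or anyone's production code. The marker `ZHB^Z01`, the function `is_heartbeat`, and the class `LastReceivedMonitor` are hypothetical names, and it assumes the dummy message is identified by its message type in MSH-9.

```python
import time

HEARTBEAT_TYPE = "ZHB^Z01"  # hypothetical message type agreed with the sender


def is_heartbeat(message):
    """Return True if the first segment is an MSH whose MSH-9 is the dummy type."""
    msh = message.split("\r", 1)[0]
    fields = msh.split("|")
    return len(fields) > 8 and fields[0] == "MSH" and fields[8] == HEARTBEAT_TYPE


class LastReceivedMonitor:
    """Track when anything last arrived and drop dummy messages."""

    def __init__(self, max_silence_s, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock
        self._last_seen = clock()

    def handle(self, message):
        """Record the arrival; return the message to route on, or None to kill it."""
        self._last_seen = self._clock()
        return None if is_heartbeat(message) else message

    def is_stale(self):
        """True when nothing (real or dummy) has arrived within the allowed silence."""
        return self._clock() - self._last_seen > self.max_silence_s


if __name__ == "__main__":
    monitor = LastReceivedMonitor(max_silence_s=600)
    dummy = "MSH|^~\\&|SENDER||CLOVERLEAF||20070730211400||ZHB^Z01|1|P|2.3\r"
    print(monitor.handle(dummy), monitor.is_stale())
```

Because the dummy messages reset the timer as well, a stale result points to a break somewhere in the pathway rather than just a quiet period with no real traffic.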
- July 31, 2007 at 1:49 am #61645
  Richard Hart
  Participant
  We are running the iSoft version of this product.
  
  We get issues like this when the iCM interfaces crash as they don’t complete a TCP ‘bye’.
  
  Our network keepalive is 2 hours and these issues are infrequent.
  
  We had connection issues when we performed a large historical load and after about 8 hours, the throughput would deteriorate and Cloverleaf logs would display errors. A bounce of both sides fixed this.
  
  The WAN guys pin-pointed the error on the iCM side, with TCP acks not being sent back – but the vendor was unsupportive!