Threads losing connection


  • Creator
    Topic
  • #49361
    Doug Essner
    Participant

      We have been live on 5.5 Rev1 for about a week and everything is fine except for interfaces with 2 systems. The threads connected to these systems (inbound and outbound) stop communicating after about 15 minutes (the threads show UP status, but the system the thread is communicating with is disconnected) and must be cycled before the connection is reestablished. I checked the logs with error output at the highest level: no obvious error. I checked with our network administrators about the Ethernet and TCP settings: everything is the same as on the old system. I tried lowering the Cloverleaf server tcp_keepidle to 3 minutes, with no change. Does anyone have a suggestion? Thanks for your help.

    Viewing 9 reply threads
    • Author
      Replies
      • #61636
        Steve Robertson
        Participant

          Doug,

          We’ve had similar problems from time to time in the past. We never could figure out exactly what the problem was, but we always suspected some kind of router timeout on the foreign networks.

          We’ve used a couple of different workarounds. One is to write an OS-level script to cycle the threads. Actually, if you do this, I would recommend cycling the processes rather than the threads: we had lots of memory leaks leading to panics when we used a script to cycle threads. We are running 5.4.1 on Windoze, by the way.

          The other approach is to set up a timer thread that periodically sends what I would call a “heartbeat”: just a Tcl proc that generates an HL7 message header segment with the message type set to something innocuous. You will likely have to coordinate this with the other systems so that they know to filter out these messages. I can send/post some Tcl that we have used if you like.
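
For readers who want a feel for what such a heartbeat looks like, here is a minimal sketch in Python (not the Tcl proc offered above, which was not posted). It builds a bare MSH segment and wraps it in the usual MLLP framing bytes. The message type `ZHB^Z01`, the application names, and the control ID format are all made-up placeholders; whatever you use has to be agreed with the receiving system so it can filter these messages out.

```python
from datetime import datetime

# MLLP framing bytes commonly used to carry HL7 v2 over TCP.
START_BLOCK = b"\x0b"
END_BLOCK = b"\x1c\x0d"


def build_heartbeat(sending_app="CLOVERLEAF", receiving_app="REMOTE", now=None):
    """Return a one-segment HL7 v2 message (MSH only) to use as a heartbeat.

    All field values are illustrative assumptions, not a site standard.
    """
    now = now or datetime.now()
    ts = now.strftime("%Y%m%d%H%M%S")
    fields = [
        "MSH",
        "^~\\&",        # MSH-2: encoding characters
        sending_app,    # MSH-3
        "",             # MSH-4: sending facility (left empty)
        receiving_app,  # MSH-5
        "",             # MSH-6: receiving facility (left empty)
        ts,             # MSH-7: message date/time
        "",             # MSH-8: security (left empty)
        "ZHB^Z01",      # MSH-9: hypothetical "innocuous" message type
        "HB" + ts,      # MSH-10: control ID, here derived from the timestamp
        "P",            # MSH-11: processing ID
        "2.3",          # MSH-12: HL7 version
    ]
    return "|".join(fields) + "\r"


def frame_mllp(message):
    """Wrap an HL7 message string in MLLP start/end block bytes."""
    return START_BLOCK + message.encode("ascii") + END_BLOCK
```

A timer would call `frame_mllp(build_heartbeat())` at an interval comfortably shorter than the suspected idle timeout and write the result to the outbound connection.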

        • #61637
          Richard Hart
          Participant

            Doug.

            I’m not a network expert, but in previous discussions with our WAN group, I learned there are routers/firewalls that will kill a connection if nothing has been sent for a while; ours is about 2 hours.

            When the connection is killed this way, the TCP termination handshake is not completed, and therefore both sides think all is OK.

            In our scenario this happens only occasionally, so the business decided not to use any form of heartbeat/thread bounce.
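
To illustrate the keepalive side of this, here is a rough Python sketch of turning on TCP keepalive for a single socket, so the OS sends probes on an idle connection and eventually notices a peer that has silently gone away. This is a per-socket analogue of the system-wide tcp_keepidle setting mentioned in this thread, not something Cloverleaf itself exposes this way. The timing values are arbitrary examples, and the `TCP_KEEP*` options are not available on every platform, hence the `hasattr` guards.

```python
import socket


def enable_keepalive(sock, idle_secs=180, interval_secs=30, probes=4):
    """Enable TCP keepalive on one socket.

    idle_secs: idle time before the first probe (example value only).
    interval_secs / probes: probe spacing and count before the OS gives up.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options below exist on Linux and some other
    # platforms; where missing, the system-wide defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_secs)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_secs)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

For probes to keep a connection open through a router or firewall with an idle timeout, the idle time has to be shorter than that timeout.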

          • #61638
            Doug Essner
            Participant

              We solved this problem by changing the TCP keepalive (tcp_keepidle) on the 2 systems that were dropping the connections with our Cloverleaf server. Both servers had a very short keepalive (2 minutes); we set it to the default value of 2 hours and our problems went away.

            • #61639
              Tom Rioux
              Participant

                I know this is an old post, but I would like to re-address this issue. We are having very similar problems. On the Cloverleaf server, we also set the tcp_keepidle default to 2 hours. Throughout all of this, if we did go to an “opening” state, the connection would be re-established once the client sent a message through. However, we are now running into issues where connectivity is not re-established and the client side must bounce their interface to get the messages out of their queue.

                Any ideas?

                Thanks…

                Tom Rioux

                Baylor Healthcare

              • #61640
                Jim Kosloskey
                Participant

                  Tom,

                  Can you check the client side (specifically getting the same level of log information you can get from Cloverleaf(R))?

                  The reason I ask, is it seems it is always the foreign system which is causing the issue and checking there first can save me a lot of wasted work.

                  Offhand, it sounds like this is a system which does not keep a persistent connection (something we insist upon here). It may attempt a connection only when it has something to send, and once it has emptied its output buffer, it may close the connection.

                  Jim Kosloskey

                  email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

                • #61641
                  Tom Rioux
                  Participant

                    The system is Eclipsys.  Supposedly they have a constant connection.  It hasn’t been an issue at other places that I’ve worked in the past.

                  • #61642
                    Jim Kosloskey
                    Participant

                      Tom,

                      It is still not a bad idea to get access to Eclipsys’ log (if there is one or more), turn up the engine ‘noise’ level, cause the issue, and see what each system says.

                      My money is still on the foreign system causing the issue.

                      Jim Kosloskey

                      email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

                    • #61643
                      Tom Rioux
                      Participant

                        Just a couple more tidbits of info… I’m told there is not a firewall between the two servers. Oh, and by the way, the Eclipsys server is a Windows server. The outbound connections to Eclipsys are fine; it is only the inbound connections from Eclipsys that are the issue. I’m working on getting the logs from the Eclipsys box.

                      • #61644
                        Russ Ross
                        Participant

                          Tom:

                          By the way, I see your profile still shows you at Memorial Hermann, so you may want to update that information if it is no longer accurate.

                          If I can get people to listen I usually recommend an interface send a dummy message every so often to help with all sorts of issues including the one you are having.

                          Unfortunately, most of the time nobody listens.

                          So let’s say, for argument’s sake, the sender from Eclipsys is not persistent at your site, or at least you claim it is behaving that way.

                          See if you can have Eclipsys schedule sending a dummy message that you look for and kill, however often you think will get rid of your problems.

                          I find this especially useful for making last-received alerts more proactive, and it eliminates false alerts.

                          Now here is the icing on the cake if the vendor can do it.

                          Instead of just having them send a dummy message via their sending interface, have them get the application to generate the dummy message.

                          This way the alerts not only cover the interface but also let you know when there is any kind of break in the pathway.
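
The receiving side of this idea (look for the dummy message, kill it, and use its arrival to drive a last-received alert) can be sketched roughly as follows. This is Python for illustration only, not Cloverleaf Tcl. The heartbeat message type `ZHB^Z01`, the `LastReceivedMonitor` class, and the `handle` function are all hypothetical names, and the silence threshold is whatever suits your site.

```python
import time

HEARTBEAT_TYPE = "ZHB^Z01"  # hypothetical type agreed with the sender


def message_type(hl7_message):
    """Return MSH-9 (message type) from a raw HL7 v2 message string."""
    msh = hl7_message.split("\r", 1)[0]
    fields = msh.split("|")
    # After splitting on "|", index 8 corresponds to MSH-9.
    return fields[8] if len(fields) > 8 else ""


class LastReceivedMonitor:
    """Tracks when anything (real or dummy) last arrived."""

    def __init__(self, max_silence_secs):
        self.max_silence_secs = max_silence_secs
        self.last_seen = None

    def record(self, now):
        self.last_seen = now

    def is_stale(self, now):
        # Stale if nothing has ever arrived, or the gap is too long.
        if self.last_seen is None:
            return True
        return now - self.last_seen > self.max_silence_secs


def handle(hl7_message, monitor, now=None):
    """Record arrival, then drop heartbeats and pass real messages on."""
    now = time.time() if now is None else now
    monitor.record(now)
    if message_type(hl7_message) == HEARTBEAT_TYPE:
        return None  # kill the dummy message
    return hl7_message
```

Because the dummy messages reset the timer, a stale monitor means the pathway is actually broken rather than merely quiet, which is what cuts down on false alerts.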

                          What more could you ask for?

                          The funny thing is once you do this they will start depending on you to tell them when their system has problems.

                          Now you would think that would be something they would do, but I hear something like this all the time: “Why didn’t Cloverleaf tell me my system had a problem?”

                          So should I laugh or shake my head?

                          However, I do feel it gives Cloverleaf more job security by watching after other systems this way.

                          Russ Ross
                          RussRoss318@gmail.com

                        • #61645
                          Richard Hart
                          Participant

                            We are running the iSoft version of this product.

                            We get issues like this when the iCM interfaces crash as they don’t complete a TCP ‘bye’.

                            Our network keepalive is 2 hours and these issues are infrequent.

                            We had connection issues when we performed a large historical load and after about 8 hours, the throughput would deteriorate and Cloverleaf logs would display errors.  A bounce of both sides fixed this.

                            The WAN guys pinpointed the error on the iCM side, with TCP ACKs not being sent back, but the vendor was unsupportive!
