Inbound disconnected, but CL still shows "UP"

Clovertech Forums, Read Only Archives: Cloverleaf

  • Creator
    Topic
  • #52114
    Jeff Dinsmore
    Participant

      I’ve had three instances in the last week where an inbound connection shows “UP”, but the sending end is disconnected and is refused connection when it tries to reconnect.

      I don’t know if this is a Cloverleaf problem, a problem with the sender, or both, but I do know I need to fix it so it doesn’t fail in the middle of the night – I’m not getting enough sleep ;o)

      So, my thought is to attack this from two angles.

      1) Configure the inbound connection as Multi-Server so that the remote can reconnect even though CL thinks it’s still connected.

      2) As a fallback, configure an alert to bounce the listener if it doesn’t receive any messages for N minutes.
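The fallback in item 2 comes down to an idle timer that resets on every message. A minimal sketch of that logic follows, in Python rather than as Cloverleaf alert configuration; the class name `IdleWatchdog`, its methods, and the 10-minute threshold are hypothetical and only illustrate the idea.

```python
import time


class IdleWatchdog:
    """Tracks time since the last message and reports when a restart is due.

    Hypothetical helper for illustration; it is not a Cloverleaf API.
    """

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()    # treat start-up as the first "message"

    def message_received(self):
        # Call this whenever the inbound thread receives a message.
        self._last_seen = self._clock()

    def restart_due(self):
        # True once the connection has been silent for N seconds or more.
        return self._clock() - self._last_seen >= self.max_idle_seconds


if __name__ == "__main__":
    watchdog = IdleWatchdog(max_idle_seconds=10 * 60)  # N = 10 minutes (assumed)
    print(watchdog.restart_due())
```

As noted later in the thread, a fixed threshold like this only suits feeds whose traffic is regular enough that a silent period really does indicate a fault.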

      Does that sound reasonable?

      Any techniques that might work better?

      Thanks!

      Jeff Dinsmore
      Chesapeake Regional Healthcare

    Viewing 21 reply threads
    • Author
      Replies
      • #73123
        Rob Abbott
        Keymaster

          Normally we see this if there’s a firewall or something in between the endpoints that drops or times out the connection without notifying either side.

          The Multi-server configuration should fix it up; if you have a reply proc, make sure it’s configured properly for multi-server.

          Rob Abbott
          Cloverleaf Emeritus

        • #73124
          Jeff Dinsmore
          Participant

            No firewall involved – all devices are on the same LAN.

            I’m a relative CL newbie. How should a reply proc be properly configured for Multi-Server?

            Jeff Dinsmore
            Chesapeake Regional Healthcare

          • #73125
            Chris Williams
            Participant

              Jeff,

              This issue is not necessarily restricted to firewalls. It can be any switch or router on your LAN through which the connection passes. They all have time-out parameters that can cause you grief.

            • #73126
              Jeff Dinsmore
              Participant

                We discovered that this is the result of a daily snapshot backup of our Cloverleaf virtual machine.

                Right at the end of the backup, the sending client senses a disconnect and goes into a reconnection mode. Cloverleaf shows the connection as still “up”, and so refuses connection.

                The solution for now is to not do the backup, but that’s not a good solution either.

                Have any of you seen similar behavior?

                Others out there running Cloverleaf on VMware?

                Jeff Dinsmore
                Chesapeake Regional Healthcare

              • #73127
                James Cobane
                Participant

                  Jeff,

                  One work-around to this might be to define your connection as “Multi-Server” to allow multiple connections.  We’ve done this on some threads where the vendor doesn’t always cleanly break and then wants to re-connect.

                  Hope this helps.

                  Jim Cobane

                  Henry Ford Health

                • #73128
                  Robert Denny
                  Participant

                    We are having a similar issue. Win2003/CL5.3rev3.

                    We have one lab that we send orders to and receive results from; they use a virtual IP address system. For one reason or another, they sometimes flip between the two NICs attached to the virtual address. We are set up with both addresses on our VPN tunnel, but if there is any disruption in the connection, we lose connectivity and have to manually change from one NIC address to the other.

                    Would the multi-server configuration work with this issue?

                  • #73129
                    Chris Roca
                    Participant

                      We are running 5.7 in a virtual environment and had issues with threads losing connectivity at random.

                      We traced the problem to VMware ‘vMotioning’ the Cloverleaf server to another VM host. Once we anchored the Cloverleaf server to one VM host, the issue went away. That doesn’t help with HA, but it stabilized the environment.

                      Any other solutions are welcome.

                    • #73130
                      Jeff Dinsmore
                      Participant

                        It seems that the problem for us was the Avamar software we’re using to back up our VMs. We were able to reproduce the error when we ran the backup in the middle of the day.

                        When the backup completed, it would disconnect the interface and it would not reconnect. Odd that it didn’t do that to other interfaces.

                        We’re currently investigating if this is happening on our Horizon Clinicals (CareLink) interface as well.

                        Jeff Dinsmore
                        Chesapeake Regional Healthcare

                      • #73131
                        Ian Morris
                        Participant

                          Just wanted to let everyone know that we experienced the exact same Up-but-sporadically-not-receiving-messages issue as the OP.  We worked with Support who advised us to change our thread to multi server.  Since then, 20 hours ago, we have not had the issue.  Probably still too early to say it’s fixed but wanted to share that with everyone.

                        • #73132
                          Jeff Dinsmore
                          Participant

                            I tried multi-server as well, but discovered that message transfer was then painfully slow – several seconds per message.

                            I have not had the opportunity to dig into why.

                            Jeff Dinsmore
                            Chesapeake Regional Healthcare

                          • #73133
                            Ian Morris
                            Participant

                              We also have a theory…Our latest server is a VM Red Hat 5.3 server.

                            • #73134
                              Rob Abbott
                              Keymaster

                                Jeff – multiserver should not affect performance like you describe.  Note that if you have any procs that generate acknowledgments or other outbound traffic to a multiserver connection you have to populate DRIVERCTL with a CONNID key – something like this:

                                Code:

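                                # Copy the inbound message's DRIVERCTL (which carries the CONNID key)
                                # onto the ACK so the reply is sent on the connection it arrived on.
                                msgmetaset $ackMh DRIVERCTL [msgmetaget $mh DRIVERCTL]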

                                If you don’t have this logic in place then you wouldn’t be sending an ACK and the other end may be timing out.

                                Hope this helps.

                                Rob Abbott
                                Cloverleaf Emeritus

                              • #73135
                                Bevan Richards
                                Participant

                                  I have had the exact same issue happening. We use Win 2008 with CL5.7 rev 2. Latency alerts are not an option because the time between messages varies during the night.

                                • #73136
                                  Ian Morris
                                  Participant

                                    Bevan Richards wrote:

                                    I have had the exact same issue happening. We use Win 2008 with CL5.7 rev 2. Latency alerts are not an option because the time between messages varies during the night.

                                    Did you try configuring your thread as multiserver?

                                  • #73137
                                    Jeff Dinsmore
                                    Participant

                                      I’d like to revisit this topic.

                                      Since I originally posted this I’ve gained a better understanding of how Cloverleaf works, and it would appear that this “showing up, but not communicating” state is caused by an abrupt severing of the connection between Cloverleaf and a given connection partner.

                                      Whether the disconnect is caused by the network or the other end of the connection is of no real concern.  The primary issue is that Cloverleaf, for whatever reason, doesn’t sense the disconnect.

                                      We primarily see these disconnects on outbound clients – when an outbound queue builds up – so multi-server doesn’t help with that.

                                      We’re currently running CL5.6.  Do more recent versions handle these disconnect events better?

                                      Do any of you use other techniques, besides setting an alert to auto-restart or tweaking network protocols, to sense/recover from this type of failure?

                                      Jeff Dinsmore
                                      Chesapeake Regional Healthcare

                                    • #73138
                                      Robert Milfajt
                                      Participant

                                        That just about covers the options right there.  The problem is with the OSI model for communication and the fact that the disconnect happens at a lower level (physical, data link, network-IP or transport-TCP) and that Cloverleaf, running at the application layer, is not informed of this.  The beauty of this model and keeping things separate, so that you don’t have to write code for TCP, etc., is also one of its problems, i.e., how to detect this.
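One of the "tweaking network protocols" options mentioned above is TCP keepalive, where the OS probes an idle connection and reports an error to the application if the peer has gone away. The sketch below shows the general socket-level technique in Python; it is not Cloverleaf configuration, the thread does not say whether or how Cloverleaf exposes these settings, and the function name `enable_keepalive` and the timing values are assumptions for illustration.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the OS to probe an idle TCP connection and fail it if the peer is gone.

    Hypothetical helper; the default timings are illustrative only.
    """
    # Portable switch that turns keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning options are platform-specific (all three exist on Linux).
    # Where one is missing, the operating system default stays in effect.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before the connection fails
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With settings like these, a connection that was severed silently is reported as failed after roughly `idle + interval * probes` seconds instead of appearing up indefinitely.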

                                        Hope this helps,

                                        Robert Milfajt
                                        Northwestern Medicine
                                        Chicago, IL

                                      • #73139
                                        mike brown
                                        Participant

                                          Hi, I opened an Infor ticket; I have several inbounds that need the multi-server setup.

                                          Can someone give me the exact steps to set this up in a new hl7_raw_ack proc?

                                          mike

                                        • #73140
                                          Terry Kellum
                                          Participant

                                            We are on Redhat 5.3, on ESX 4.1.  We have our NICs set to “Flexible”.  We don’t have any issues with failover.

                                          • #73141
                                            David Coffey
                                            Participant

                                              This issue lives in my environment.  Clover 5.8.5 running on Windows 2008.  

                                              Using Wireshark, I have seen the other system send a RST packet, which is a request to tear down the connection. This request is never honored. The stack should honor the request and pass it up to Clover, and the Clover thread should recycle and go into a Listen state.
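For comparison, here is a small loopback experiment showing what "passing the RST up" looks like from an application's point of view on a Unix-like host. It is a sketch under assumptions, not a statement about how Cloverleaf or the Windows stack behaves: the function name `recv_after_peer_reset` is hypothetical, and the `"ii"` layout of the `SO_LINGER` value assumes a Linux-style `struct linger`.

```python
import socket
import struct


def recv_after_peer_reset():
    """Report what a blocking recv() sees after the peer resets the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5)  # avoid hanging if no reset ever arrives

    # SO_LINGER enabled with a zero timeout makes close() send RST, not FIN.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    client.close()

    try:
        data = server_side.recv(1024)
        return "closed" if data == b"" else "data"
    except ConnectionResetError:
        return "reset"      # the stack surfaced the RST to the application
    except socket.timeout:
        return "timeout"    # nothing was reported within the timeout
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print(recv_after_peer_reset())
```

On a typical Linux host the read fails with a connection-reset error, which is the signal an application needs in order to drop the old connection and listen again.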

                                              At this time I do not know if the issue is the Windows stack or Clover.

                                              I believe running in MultiServer is a hack to address a flaw. It also introduces a small security risk: it leaves an interface with a permanent Listen pending, so anyone can connect and inject something into the interface.

                                            • #73142
                                              Terry Kellum
                                              Participant

                                                We started on Windows in 2003 and had TCP issues before we went live in 2004. We ditched Windows and deployed on Red Hat. You have a different set of things to look at, and a more specialized sysadmin environment, but I’ve never regretted that decision. If you deploy on Linux, it’s important to make the system tweaks listed in the installation instructions.

                                              • #73143
                                                James Cobane
                                                Participant

                                                  David,

                                                  I believe the issue is at a lower level within the OSI model than the application layer (where Cloverleaf is). Essentially, Cloverleaf is not being informed of the disconnect by the OS communications stack, so it believes it is still connected.

                                                  I think the multi-server option to address this is more of a “work-around” than a “hack” 😉

                                                  Jim Cobane

                                                  Henry Ford Health

                                                • #73144
                                                  David Coffey
                                                  Participant

                                                    Workaround or hack, either way it should not have to be done.

                                                • The forum ‘Cloverleaf’ is closed to new topics and replies.