Inbound disconnected, but CL still shows "UP"

This topic has 22 replies, 12 voices, and was last updated 10 years, 5 months ago by David Coffey.

November 16, 2010 at 8:52 pm #52114
Jeff Dinsmore
Participant
I’ve had three instances in the last week where an inbound connection shows “UP”, but the sending end is disconnected and is refused connection when it tries to reconnect.

I don’t know if this is a Cloverleaf problem, a problem with the sender, or both, but I do know I need to fix it so it doesn’t fail in the middle of the night – I’m not getting enough sleep ;o)

So, my thought is to attack this from two angles.

1) Configure the inbound connection as Multi-Server so that the remote can reconnect even though CL thinks it’s still connected.

2) As a fallback, configure an alert to bounce the listener if it doesn’t receive any messages for N minutes.
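
The fallback in item 2 amounts to an idle watchdog. The following is a minimal sketch of that logic in Python, not Cloverleaf alert configuration: `IdleWatchdog`, `check_and_bounce`, and the `bounce` callback are hypothetical names, and the real restart mechanism would be whatever the engine's alerting provides.

```python
import time


class IdleWatchdog:
    """Tracks the time of the last message and reports when a link looks stale."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock
        self._last_message = clock()

    def message_received(self):
        # Call this whenever a message arrives on the inbound connection.
        self._last_message = self._clock()

    def is_stale(self):
        # True once the connection has been silent longer than the limit.
        return self._clock() - self._last_message > self.max_idle_seconds


def check_and_bounce(watchdog, bounce):
    """Invoke `bounce` (a hypothetical restart hook) if the link is stale."""
    if watchdog.is_stale():
        bounce()
        watchdog.message_received()  # restart the idle timer after a bounce
        return True
    return False
```

A fixed threshold only works when traffic is steady. If the gap between messages varies a lot, N has to be longer than the longest legitimate quiet period, or the watchdog will bounce a healthy connection.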

Does that sound reasonable?

Any techniques that might work better?

Thanks!

Jeff Dinsmore
Chesapeake Regional Healthcare
- November 16, 2010 at 9:01 pm #73123
  Rob Abbott
  Keymaster
  Normally we see this if there’s a firewall or something in between the endpoints that drops or times out the connection without notifying either side.
  
  The Multi-server configuration should fix it up; make sure if you have a reply proc that it’s configured properly for multi-server.
  
  Rob Abbott
  Cloverleaf Emeritus
- November 16, 2010 at 10:10 pm #73124
  Jeff Dinsmore
  Participant
  No firewall involved – all devices are on the same LAN.
  
  I’m a relative CL newbie. How should a reply proc be properly configured for Multi-Server?
  
  Jeff Dinsmore
  Chesapeake Regional Healthcare
- November 17, 2010 at 5:00 pm #73125
  Chris Williams
  Participant
  Jeff,
  
  This issue is not necessarily restricted to firewalls. It can be any switch or router on your LAN through which the connection passes. They all have time-out parameters that can cause you grief.
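
One general technique against idle time-outs in intermediate devices is TCP keepalive, which makes the OS send periodic probes on a quiet connection. The sketch below shows the standard socket options in Python. Whether and how Cloverleaf exposes these settings is not covered in this thread, so treat it as an illustration of the mechanism only; `enable_keepalive` and its default values are hypothetical.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the OS to probe an idle TCP connection so a dead peer is noticed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options below are platform-specific (Linux names);
    # where they are missing, the OS-wide defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

The probes serve two purposes: they keep the connection's state fresh in devices along the path, and they let the OS report an error to the application once the configured number of probes goes unanswered.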
- December 3, 2010 at 9:25 pm #73126
  Jeff Dinsmore
  Participant
  We discovered that this is the result of a daily snapshot backup of our Cloverleaf virtual machine.
  
  Right at the end of the backup, the sending client senses a disconnect and goes into a reconnection mode. Cloverleaf shows the connection as still “up”, and so refuses connection.
  
  The solution for now is to not do the backup, but that’s not a good solution either.
  
  Have any of you seen similar behavior?
  
  Others out there running Cloverleaf on VMWare?
  
  Jeff Dinsmore
  Chesapeake Regional Healthcare
- December 3, 2010 at 9:33 pm #73127
  James Cobane
  Participant
  Jeff,
  
  One work-around to this might be to define your connection as “Multi-Server” to allow multiple connections. We’ve done this on some threads where the vendor doesn’t always cleanly break and then wants to re-connect.
  
  Hope this helps.
  
  Jim Cobane
  
  Henry Ford Health
- January 12, 2011 at 5:17 pm #73128
  Robert Denny
  Participant
  We are having a similar issue. Win2003/CL5.3rev3.
  
  We have one lab that we send orders to and receive results from, and they use a virtual IP address system. We have had issues where, for one reason or another, they flip between the two NICs attached to the virtual address. We are set up with both addresses on our VPN tunnel, but if there is any disruption in the connection, we lose connectivity and have to manually change from one NIC address to the other.
  
  Would the multi-server configuration help with this issue?
- January 17, 2011 at 3:35 pm #73129
  Chris Roca
  Participant
  We are running 5.7 in a virtual environment and had issues losing connectivity with threads haphazardly.
  
  We traced the problem to VMware ‘vMotioning’ the Cloverleaf server to another host. Once we anchored the Cloverleaf server to one VM host, the issue went away. That doesn’t help with HA, but it stabilized the environment.
  
  Any other solutions are welcome.
- January 17, 2011 at 4:27 pm #73130
  Jeff Dinsmore
  Participant
  It seems that the problem for us was the Avamar software we’re using to back up our VMs. We were able to reproduce the error when we ran the backup in the middle of the day.
  
  When the backup completed, it would disconnect the interface and it would not reconnect. Odd that it didn’t do that to other interfaces.
  
  We’re currently investigating if this is happening on our Horizon Clinicals (CareLink) interface as well.
  
  Jeff Dinsmore
  Chesapeake Regional Healthcare
- January 20, 2011 at 2:13 pm #73131
  Ian Morris
  Participant
  Just wanted to let everyone know that we experienced the exact same Up-but-sporadically-not-receiving-messages issue as the OP. We worked with Support who advised us to change our thread to multi server. Since then, 20 hours ago, we have not had the issue. Probably still too early to say it’s fixed but wanted to share that with everyone.
- January 20, 2011 at 2:22 pm #73132
  Jeff Dinsmore
  Participant
  I tried multi-server as well, but discovered that message transfer was then painfully slow – several seconds per message.
  
  I have not had the opportunity to dig into why.
  
  Jeff Dinsmore
  Chesapeake Regional Healthcare
- January 20, 2011 at 9:04 pm #73133
  Ian Morris
  Participant
  We also have a theory…Our latest server is a VM Red Hat 5.3 server.
- January 24, 2011 at 2:45 pm #73134
  Rob Abbott
  Keymaster
  Jeff – multiserver should not affect performance like you describe. Note that if you have any procs that generate acknowledgments or other outbound traffic to a multiserver connection you have to populate DRIVERCTL with a CONNID key – something like this:
  
  Code: msgmetaset $ackMh DRIVERCTL [msgmetaget $mh DRIVERCTL]
  
  If you don’t have this logic in place then you wouldn’t be sending an ACK and the other end may be timing out.
  
  Hope this helps.
  
  Rob Abbott
  Cloverleaf Emeritus
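
To illustrate why the connection ID matters, here is a toy model in Python of a listener that holds several client connections. It is not Cloverleaf's implementation: `MultiServerListener`, `build_ack`, and the dictionary layout are hypothetical, and only the `DRIVERCTL` and `CONNID` names come from the post above.

```python
class MultiServerListener:
    """Toy model of a listener that holds several client connections at once."""

    def __init__(self):
        self._connections = {}  # connection id -> list of replies sent to it
        self._next_id = 1

    def accept(self):
        # Register a new client and return its connection id.
        conn_id = self._next_id
        self._next_id += 1
        self._connections[conn_id] = []
        return conn_id

    def receive(self, conn_id, payload):
        # Tag the inbound message with the connection it arrived on.
        return {"payload": payload, "driverctl": {"CONNID": conn_id}}

    def send_reply(self, reply):
        # Without a connection id there is no way to pick the right client.
        conn_id = reply.get("driverctl", {}).get("CONNID")
        if conn_id not in self._connections:
            return False
        self._connections[conn_id].append(reply["payload"])
        return True

    def sent_to(self, conn_id):
        return list(self._connections[conn_id])


def build_ack(inbound, copy_driverctl=True):
    """Build an ACK; optionally carry over the inbound routing metadata."""
    ack = {"payload": "ACK for " + inbound["payload"]}
    if copy_driverctl:
        ack["driverctl"] = dict(inbound["driverctl"])
    return ack
```

In this model, an ACK built with `copy_driverctl=False` is never delivered, which matches the symptom described: the sender waits for an acknowledgment, times out, and throughput drops to one message per time-out.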
- January 26, 2011 at 3:19 pm #73135
  Bevan Richards
  Participant
  I have had the exact same issue happening. We use Win 2008 with CL5.7 rev 2. Latency alerts are not an option because the time between messages varies during the night.
- January 26, 2011 at 9:21 pm #73136
  Ian Morris
  Participant
  Bevan Richards wrote:
  
  I have had the exact same issue happening. We use Win 2008 with CL5.7 rev 2. Latency alerts are not an option because the time between messages varies during the night.
  
  Did you try configuring your thread as multiserver?
- December 23, 2013 at 6:22 pm #73137
  Jeff Dinsmore
  Participant
  I’d like to revisit this topic.
  
  Since I originally posted this I’ve gained a better understanding of how Cloverleaf works, and it would appear that this “showing up, but not communicating” state is caused by an abrupt severing of the connection between Cloverleaf and a given connection partner.
  
  Whether the disconnect is caused by the network or the other end of the connection is of no real concern. The primary issue is that Cloverleaf, for whatever reason, doesn’t sense the disconnect.
  
  We primarily see these disconnects on outbound clients – when an outbound queue builds up – so multi-server doesn’t help with that.
  
  We’re currently running CL5.6. Do more recent versions handle these disconnect events better?
  
  Do any of you use other techniques, besides setting an alert to auto-restart or tweaking network protocols, to sense/recover from this type of failure?
  
  Jeff Dinsmore
  Chesapeake Regional Healthcare
- December 23, 2013 at 7:06 pm #73138
  Robert Milfajt
  Participant
  That just about covers the options right there. The problem is with the OSI model for communication and the fact that the disconnect happens at a lower level (physical, data link, network-IP or transport-TCP) and that Cloverleaf, running at the application layer, is not informed of this. The beauty of this model and keeping things separate, so that you don’t have to write code for TCP, etc., is also one of its problems, i.e., how to detect this.
  
  Hope this helps,
  
  Robert Milfajt
  Northwestern Medicine
  Chicago, IL
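
What an application actually sees through the socket API can be sketched in Python as below. This is a general illustration and says nothing about how Cloverleaf reads its sockets; `poll_connection` is a hypothetical helper.

```python
import socket


def poll_connection(sock, timeout):
    """Return 'data', 'closed', or 'idle' for one read attempt on a socket."""
    sock.settimeout(timeout)
    try:
        chunk = sock.recv(4096)
    except socket.timeout:
        # Nothing arrived. The peer may be quiet, or the path may be dead;
        # the application cannot tell these apart from this result alone.
        return "idle"
    # An empty read means the peer closed cleanly (the OS saw a FIN).
    return "data" if chunk else "closed"
```

The 'idle' result is the ambiguous one. A clean close is reported to the application, but a connection that is silently dropped in transit looks exactly like a peer with nothing to send, which is why the remaining options are time-based alerts or protocol-level probing.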
- February 23, 2015 at 3:59 pm #73139
  mike brown
  Participant
  Hi, I opened an Infor ticket. I have several inbounds that need the multi-server setup.
  
  Can someone provide the exact steps for setting this up in a new hl7_raw_ack proc?
  
  mike
- February 24, 2015 at 2:52 pm #73140
  Terry Kellum
  Participant
  We are on Redhat 5.3, on ESX 4.1. We have our NICs set to “Flexible”. We don’t have any issues with failover.
- February 25, 2015 at 5:33 pm #73141
  David Coffey
  Participant
  This issue lives in my environment. Clover 5.8.5 running on Windows 2008.
  
  Using Wireshark, I have seen the other system send an RST packet, which is a request to tear down the connection. This request is never honored. The stack should honor the request and pass it up to Clover, and the Clover thread should recycle and go into a Listen state.
  
  At this time I do not know if the issue is the Windows stack or Clover.
  
  I believe running in MultiServer is a hack to address a flaw. It also introduces a small security risk: it leaves an interface with a permanent Listen pending, so anyone can connect and inject something into the interface.
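
For comparison, this is how an RST normally reaches an application on a working stack. The Python sketch below forces a reset on a loopback connection and checks that the other end sees it as an error. It assumes a Linux-style `SO_LINGER` structure of two C ints, and `abortive_close` and `peer_reset_is_visible` are hypothetical names.

```python
import socket
import struct


def abortive_close(sock):
    """Close a TCP socket so the OS sends RST instead of the normal FIN."""
    # SO_LINGER with l_onoff=1 and l_linger=0 requests an abortive close.
    # Assumption: struct linger is two C ints, as on Linux.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def peer_reset_is_visible():
    """Return True if a reset from the peer surfaces as an error on recv()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    try:
        abortive_close(server_side)
        try:
            client.recv(1)
        except ConnectionResetError:
            return True
        return False
    finally:
        client.close()
        listener.close()
```

If a test like this passes on the affected server while the real interface still stays “up” after a captured RST, that would suggest the reset is being delivered by the stack and the place to look is how the application handles the error.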
- February 25, 2015 at 5:43 pm #73142
  Terry Kellum
  Participant
  We started on Windows in 2003, and had TCP issues before we went live in 2004. We ditched Windows and deployed on Red Hat. You have a different set of things to look at, and a more specialized sysadmin environment, but I’ve never regretted that decision. If you deploy on Linux, it’s important to make the system tweaks listed in the installation instructions.
- February 25, 2015 at 9:10 pm #73143
  James Cobane
  Participant
  David,
  
  I believe the issue is at a lower level of the OSI model than the application layer (where Cloverleaf sits). Essentially, the OS is not informing Cloverleaf of the disconnect, so Cloverleaf believes it is still connected.
  
  I think the multi-server option to address this is more of a “work-around” than a “hack” 😉
  
  Jim Cobane
  
  Henry Ford Health
- February 25, 2015 at 9:13 pm #73144
  David Coffey
  Participant
  Work around or hack. Either way it should not have to be done.

The forum ‘Cloverleaf’ is closed to new topics and replies.