Inbound disconnected, but CL still shows "UP"
I don’t know if this is a Cloverleaf problem, a problem with the sender, or both, but I do know I need to fix it so it doesn’t fail in the middle of the night – I’m not getting enough sleep ;o)
So, my thought is to attack this from two angles.
1) Configure the inbound connection as Multi-Server so that the remote can reconnect even though CL thinks it’s still connected.
2) As a fallback, configure an alert to bounce the listener if it doesn’t receive any messages for N minutes.
Does that sound reasonable?
Any techniques that might work better?
Thanks!
Jeff Dinsmore
Chesapeake Regional Healthcare
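The fallback in option 2 amounts to an idle watchdog. A minimal sketch of that logic in Python follows, assuming nothing about Cloverleaf's own alert configuration: the class name, the method names, and the injectable clock are all hypothetical, and the actual bounce of the thread is left to whatever mechanism the site uses.

```python
import time


class IdleWatchdog:
    """Flags a listener as stale when no message has arrived for too long."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        # The clock is injectable (an assumption for testability only).
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock
        self._last_message = clock()

    def message_received(self):
        # Call this every time the inbound side delivers a message.
        self._last_message = self._clock()

    def should_restart(self):
        # True once the quiet period exceeds the configured limit.
        return self._clock() - self._last_message > self.max_idle_seconds
```

A fixed limit only works when traffic is steady. If the gap between messages varies a lot overnight, the limit has to be longer than the longest legitimate quiet period, which delays detection by the same amount.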
Normally we see this if there’s a firewall or something in between the endpoints that drops or times out the connection without notifying either side.
The Multi-server configuration should fix it up; make sure if you have a reply proc that it’s configured properly for multi-server.
Rob Abbott
Cloverleaf Emeritus
No firewall involved – all devices are on the same LAN.
I’m a relative CL newbie. How should a reply proc be properly configured for Multi-Server?
Jeff Dinsmore
Chesapeake Regional Healthcare
Jeff,
This issue is not necessarily restricted to firewalls. It can be any switch or router on your LAN through which the connection passes. They all have time-out parameters that can cause you grief.
We discovered that this is the result of a daily snapshot backup of our Cloverleaf virtual machine.
Right at the end of the backup, the sending client senses a disconnect and goes into a reconnection mode. Cloverleaf shows the connection as still “up”, and so refuses connection.
The solution for now is to not do the backup, but that’s not a good solution either.
Have any of you seen similar behavior?
Others out there running Cloverleaf on VMWare?
Jeff Dinsmore
Chesapeake Regional Healthcare
Jeff,
One work-around to this might be to define your connection as “Multi-Server” to allow multiple connections. We’ve done this on some threads where the vendor doesn’t always cleanly break and then wants to re-connect.
Hope this helps.
Jim Cobane
Henry Ford Health
We are having a similar issue. Win2003/CL5.3rev3.
We send orders to and receive results from one lab that uses a virtual IP address system. We have had issues where, for one reason or another, they flip between the two NICs that are attached to the virtual address. We are set up with both addresses on our VPN tunnel, but if there is any disruption in the connection, we lose connectivity and have to manually change from one NIC address to the other.
Would the multi-server configuration help with this issue?
We are running 5.7 in a virtual environment and had issues losing connectivity with threads haphazardly.
We pinpointed the problem to VMware ‘vMotioning’ the Cloverleaf server to another host. Once we anchored the Cloverleaf server to one VM host the issue went away. That doesn’t help with HA, but it stabilized the environment.
Any other solutions are welcome.
It seems that the problem for us was the Avamar software we’re using to back up our VMs. We were able to reproduce the error when we ran the backup in the middle of the day.
When the backup completed, it would disconnect the interface and it would not reconnect. Odd that it didn’t do that to other interfaces.
We’re currently investigating if this is happening on our Horizon Clinicals (CareLink) interface as well.
Jeff Dinsmore
Chesapeake Regional Healthcare
Just wanted to let everyone know that we experienced the exact same Up-but-sporadically-not-receiving-messages issue as the OP. We worked with Support who advised us to change our thread to multi server. Since then, 20 hours ago, we have not had the issue. Probably still too early to say it’s fixed but wanted to share that with everyone.
I tried multi-server as well, but discovered that message transfer was then painfully slow – several seconds per message.
I have not had the opportunity to dig into why.
Jeff Dinsmore
Chesapeake Regional Healthcare
We also have a theory…Our latest server is a VM Red Hat 5.3 server.
Jeff – multiserver should not affect performance like you describe. Note that if you have any procs that generate acknowledgments or other outbound traffic to a multiserver connection you have to populate DRIVERCTL with a CONNID key – something like this:
msgmetaset $ackMh DRIVERCTL [msgmetaget $mh DRIVERCTL]
If you don’t have this logic in place then you wouldn’t be sending an ACK and the other end may be timing out.
Hope this helps.
Rob Abbott
Cloverleaf Emeritus
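To illustrate why the missing DRIVERCTL copy matters, here is a toy model in Python of reply routing on a listener that holds several connections at once. It is not Cloverleaf code: the dictionary layout, the function names, and the per-connection outbox lists are assumptions made only for this sketch. The one idea taken from the post above is that the reply must carry the connection ID of the message it answers.

```python
def build_ack(inbound_meta, ack_body, copy_driverctl=True):
    """Build a reply, optionally carrying over the inbound routing metadata."""
    ack_meta = {}
    if copy_driverctl:
        # Mirrors the idea of copying DRIVERCTL from the inbound message.
        ack_meta["DRIVERCTL"] = dict(inbound_meta.get("DRIVERCTL", {}))
    return {"meta": ack_meta, "body": ack_body}


def route_reply(connections, reply):
    """Deliver a reply to the connection named in its metadata, if any."""
    conn_id = reply["meta"].get("DRIVERCTL", {}).get("CONNID")
    if conn_id not in connections:
        return False  # No target connection: the reply is never sent.
    connections[conn_id].append(reply["body"])
    return True
```

With the metadata copied, the ACK lands in the outbox of the connection that sent the message. Without it, the router has no way to choose among the open connections, so the sender waits for an ACK that never arrives.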
I have had the exact same issue happening. We use Win 2008 with CL5.7 rev 2. Latency alerts are not an option because the time between messages varies during the night.
Did you try configuring your thread as multiserver?
I’d like to revisit this topic.
Since I originally posted this I’ve gained a better understanding of how Cloverleaf works, and it would appear that this “showing up, but not communicating” state is caused by an abrupt severing of the connection between Cloverleaf and a given connection partner.
Whether the disconnect is caused by the network or the other end of the connection is of no real concern. The primary issue is that Cloverleaf, for whatever reason, doesn’t sense the disconnect.
We primarily see these disconnects on outbound clients – when an outbound queue builds up – so multi-server doesn’t help with that.
We’re currently running CL5.6. Do more recent versions handle these disconnect events better?
Do any of you use other techniques, besides setting an alert to auto-restart or tweaking network protocols, to sense/recover from this type of failure?
Jeff Dinsmore
Chesapeake Regional Healthcare
That just about covers the options right there. The problem is with the OSI model for communication and the fact that the disconnect happens at a lower level (physical, data link, network-IP or transport-TCP) and that Cloverleaf, running at the application layer, is not informed of this. The beauty of this model and keeping things separate, so that you don’t have to write code for TCP, etc., is also one of its problems, i.e., how to detect this.
Hope this helps,
Robert Milfajt
Northwestern Medicine
Chicago, IL
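One common way to have the transport layer report a silently dead peer to the application is TCP keepalive, which falls under the "tweaking network protocols" option mentioned earlier. The sketch below is generic Python socket code, not a Cloverleaf setting; whether and how Cloverleaf exposes keepalive is not covered in this thread. The function name and the default timings are assumptions, and the three tuning options are platform-specific, so the code sets only those the platform provides.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the OS to probe an idle TCP connection and fail it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific, so set only those present.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With these example values on a platform that supports all three options, an unresponsive peer would be detected after roughly 60 + 10 × 5 = 110 seconds of silence, at which point the next read or write on the socket fails instead of hanging.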
Hi, I opened an Infor ticket; I have several inbounds that need the multi-server setup.
Can someone provide the exact steps for setting this up in a new hl7_raw_ack proc?
mike
We are on Redhat 5.3, on ESX 4.1. We have our NICs set to “Flexible”. We don’t have any issues with failover.
This issue lives in my environment. Clover 5.8.5 running on Windows 2008.
Using Wireshark, I have seen the other system send a RST packet, which is a request to tear down the connection. This request is never honored. The stack should honor the request and pass it up to Clover, and the Clover thread should recycle and go into a Listen state.
At this time I do not know if the issue is the Windows stack or Clover.
I believe running in MultiServer is a hack to address a flaw. It also introduces a small security risk: it leaves an interface with a permanent Listen pending, so anyone can connect and inject something into the interface.
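For comparison, the following Python sketch shows what an application normally sees when the stack does pass a RST up. It opens a loopback connection, aborts it from the client side, and reports the result on the accepting side. The function name and return strings are hypothetical, and the `SO_LINGER` packing used here assumes a Linux or other Unix-like system; it says nothing about how the Windows stack or Cloverleaf behaves in the case described above.

```python
import socket
import struct


def observe_reset(timeout=2.0):
    """Open a loopback connection, abort it with RST, and report what the peer sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # Port 0 lets the OS pick a free port.
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    conn, _ = listener.accept()
    conn.settimeout(timeout)
    try:
        # SO_LINGER with a zero timeout makes close() send RST instead of FIN.
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.pack("ii", 1, 0))
        client.close()
        try:
            data = conn.recv(1024)
        except ConnectionResetError:
            return "reset"    # The stack passed the RST up to the application.
        except socket.timeout:
            return "timeout"  # Nothing was reported within the timeout.
        return "closed" if data == b"" else "data"
    finally:
        conn.close()
        listener.close()
```

A receiver that gets the reset error can close its socket and go back to listening. A receiver that never gets it stays in the "up but not communicating" state discussed in this thread.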
We started on Windows in 2003, and had TCP issues before we went live in 2004. We ditched Windows and deployed on Red Hat. You have a different set of things to look at, and a more specialized Sys Admin environment, but I’ve never regretted that decision. If you deploy on Linux, it’s important that you make the system tweaks listed in the installation instructions.
David,
I believe the issue is at a lower level within the OSI model than the application layer (where Cloverleaf is). Essentially, the OS communications layer is not informing Cloverleaf of the disconnect, so it believes it is still connected.
I think the multi-server option to address this is more of a “work-around” than a “hack” 😉
Jim Cobane
Henry Ford Health
Workaround or hack, either way it should not have to be done.