Ack help needed

This topic has 11 replies, 5 voices, and was last updated 9 years, 6 months ago by David Barr.

December 16, 2015 at 6:53 pm #54917
Mike Strout
Participant
We have a system that we have been consistently sending results to for a while, but in the past week, it is constantly getting backed up. Restarting the thread clears the backup, but the backup starts again pretty quickly.

I started saving the replies to SMAT and I don’t know what to make of the results. When the thread turns red, I pull up the OB SMAT for the thread and the SMAT for the reply. I searched the reply SMAT for the last message that is backed up, which is easy to identify because there are several copies of it, and could not find the matching reply.

Then I restart the thread, get the reply SMAT again and this time the reply is in there. The time it is written to SMAT coincides with the time I bumped the thread.

My config is pretty straightforward…

On Inbound tab, Outbound Only is checked.

On the Outbound tab, Await replies is checked with a timeout of 60. Timeout handling is Resend. TPS Inbound Reply is check_ack. TrxID is HL7.

Any thoughts?

Replies
- December 16, 2015 at 8:56 pm #83441
  David Barr
  Participant
  It sounds like the remote side is getting stuck and not sending an ack back. Could it be that the ack they are eventually sending back is because they get another connection and the original message is resent? There could also be a firewall or something interfering with the communication.
  
  Sometimes I run tcpdump to look at exactly what’s being sent and received for these connections. You can use Wireshark to view and analyze the packet capture files.
- December 23, 2015 at 2:54 pm #83442
  Mike Strout
  Participant
  I am still wrestling with this lack of ack issue and haven’t had a chance to dig in with a protocol analyzer yet. As this is an AIX box, things are a bit more complicated than they would be on a Windows box.
  
  I do have a fundamental question about the ack process. My expectation is that the following should happen.
  
  1. Message sent, Cloverleaf puts message into Recovery DB and waits for reply.
  
  2. No ack is received within the timeout period, so resend the message, restart the timer, delete the original message from the recovery DB, and wait for a reply.
  
  3. Continue in this loop until ack is received or thread is restarted.
  
  4. If ack is received, expire timer, write message to SMAT, and remove message from recovery db
  
  5. If thread is restarted, all timers are expired. Upon restart, all messages in the recovery DB are sent using the same ack processing above.
  
  Assuming this is close to right, what I don’t understand is why I am getting multiple copies of messages written to the OB SMAT. My impression was that only messages that are successfully ack’ed are written to SMAT. If this is correct, then the problem must be in step 2 above where the previous copy of the now resent message is deleted from the recovery db.
  
  I heard a while back that the standard check_ack proc had some issues and that there was a new version that works better with Cloverleaf 6+. Can anyone confirm that and point me to it?
- December 23, 2015 at 3:03 pm #83443
  Jim Kosloskey
  Participant
  Messages are written to the OB SMAT on an outbound thread when they are successfully sent, not when an ack is received.
  
  So if you reach the timeout and resend, another OB SMAT message will appear, and so on until an ack is received.
  
  Unless you have some code to match the ack to the original sent message, ANY reply will be treated as the correct reply to the message sent.
  
  My guess is you are reaching timeout and doing a resend.
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
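The matching Jim describes can be sketched as follows. This is a minimal Python illustration, not Cloverleaf code and not the stock check_ack proc. It assumes standard pipe-delimited HL7 v2 messages with segments separated by carriage returns, and it compares the control ID of the sent message (MSH-10) with the control ID echoed in the reply (MSA-2). The function names `get_field` and `ack_matches` are hypothetical.

```python
def get_field(message, segment_id, index):
    """Return the pipe-delimited field at `index` of the first segment
    whose ID is `segment_id`, or None if it is not present."""
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == segment_id:
            return fields[index] if index < len(fields) else None
    return None


def ack_matches(sent, reply):
    """True only if the reply acknowledges this specific sent message."""
    # MSH-1 is the field separator itself, so MSH-10 sits at split index 9.
    sent_id = get_field(sent, "MSH", 9)
    # MSA-2 carries the control ID of the message being acknowledged.
    acked_id = get_field(reply, "MSA", 2)
    return bool(sent_id) and sent_id == acked_id
```

Without a check along these lines, a late ack for an earlier resend would be indistinguishable from the ack for the message currently waiting.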
- December 23, 2015 at 3:05 pm #83444
  Mike Strout
  Participant
  Yes, it is pretty clear that I am reaching the timeout. Shouldn’t the previous version of the message be removed from the recovery database as part of the resend process?
- December 23, 2015 at 7:39 pm #83445
  Jim Kosloskey
  Participant
  Well, I am not sure how you are configured, but if you are controlling everything yourself with Tcl rather than using the inherent resend, then you should be making a copy, and that copy lingers until you get rid of it (probably upon receipt of an ack or on resend, where the copy is sent and another copy is made).
  
  I suspect the embedded process does the same thing. So the copy will be in the recovery DB until an ack is received. And that is the way you want it, I suspect.
  
- December 23, 2015 at 8:58 pm #83446
  Mike Strout
  Participant
  Yes, I am using the standard processing and the check_ack script. It is just strange to me that I would be seeing multiple copies of the message sent outbound in the OB SMAT.
- December 23, 2015 at 9:49 pm #83447
  Jim Kosloskey
  Participant
  Let’s say you have exceeded the timeout 5 times and you have resend configured, wouldn’t you want SMAT to reflect exactly how many messages Cloverleaf sent?
  
  I do. Then I can easily determine when I am missing the timeout and perhaps either reset the timeout (if the receiving system just takes longer to respond) or get the receiving system to fix its acknowledgment process.
  
  Many receiving systems cannot even tell they have received many attempts at the same message (they only see the one they acknowledged) and thus deny there is an issue. By having all of the resent messages in the OB SMAT, I can show the receiving system the number of resends it took and the time interval between messages. Usually that then convinces them to take a closer look.
  
  I find it very helpful when getting initial connectivity settled out with a receiving system.
  
  It is also an accurate representation of the number of messages Cloverleaf has sent to the OB destination, giving more accurate load data.
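The kind of evidence Jim describes (number of sends per message and the interval between them) could be tallied with something like the sketch below. It is a Python illustration under stated assumptions: the input is a list of `(send_time_in_seconds, control_id)` pairs that you have already pulled out of the OB SMAT by whatever means you normally use, and the function name `summarize_resends` is hypothetical.

```python
from collections import defaultdict


def summarize_resends(sends):
    """Group send events by control ID and report, for each ID, how many
    times it was sent and the gaps (in seconds) between consecutive sends."""
    times_by_id = defaultdict(list)
    for sent_at, control_id in sends:
        times_by_id[control_id].append(sent_at)

    summary = {}
    for control_id, stamps in times_by_id.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        summary[control_id] = {"sends": len(stamps), "gaps": gaps}
    return summary
```

Gaps that sit right at the configured timeout suggest that no ack arrived at all, rather than one arriving slightly late.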
  
- December 24, 2015 at 1:37 am #83448
  Elisha Gould
  Participant
  Assuming the system is in the same network and not over a VPN:
  
  I’ve had this issue with some applications before.
  
  The likely issue is that they are not reading the socket correctly: they assume the whole message arrives in one packet rather than split across multiple packets. The result is that the message is only half received and they get into a dodgy state. It’s likely that if the system is written like this, they don’t log properly either, so it just sits there pretending all is good.
  
  There are two options for this:
  
  1. Get them to fix their code so that it handles the reading and erroring correctly.
  
  2. Write up a proc that closes the connection when the timeout expires.
  
  To do the second, check “Use DRIVERCTRL control” in the protocol properties.
  
  In the Timeout Handling Reply Generation, add a proc that creates a close message and uses the PROTO disposition on it, PROTO on the OB message id, and KILL on the message id.
  
  Code:
  
      proc gen_code_ob {args} {
          keylget args MODE aMode
          keylget args MSGID aMsgId
          keylget args CONTEXT aContext
          keylget args ARGS aArgs
          global HciConnName
  
          if {[string equal $aMode start]} {
              return {}
          }
  
          if {[string equal $aMode run]} {
              set myDispList {}
              echo "$aMode $aMsgId $aContext"
              switch $aContext {
                  reply_gen {
                      # Build a driver-control message that closes the connection
                      set myMsgId [msgcreate]
                      set myDriverCtl {}
                      keylset myDriverCtl CLOSE 1
                      keylset myDriverCtl WRITEZERO 0
                      msgmetaset $myMsgId DRIVERCTL $myDriverCtl
                      lappend myDispList "PROTO $myMsgId"
                      lappend myDispList "KILL $aMsgId"
                      # PROTO the original outbound message too, if one was passed in
                      if {[keylget args OBMSGID myObMsgId]} {
                          lappend myDispList "PROTO $myObMsgId"
                      }
                  }
              }
              return $myDispList
          }
          return {}
      }
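For the first option, the receiving side has to buffer bytes until a whole message has arrived instead of treating each read as one message. The sketch below shows that idea in Python. It assumes the connection uses MLLP framing (start byte 0x0B, end bytes 0x1C 0x0D), which this thread has not confirmed for the system in question, and the class name `MllpReader` is hypothetical.

```python
START_BLOCK = b"\x0b"      # MLLP start-of-message byte
END_BLOCK = b"\x1c\x0d"    # MLLP end-of-message bytes


class MllpReader:
    """Accumulates raw socket reads and yields only complete messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add the bytes from one read; return a list of complete payloads."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            start = self._buffer.find(START_BLOCK)
            if start == -1:
                # No frame start at all: nothing useful is buffered.
                self._buffer.clear()
                break
            end = self._buffer.find(END_BLOCK, start + 1)
            if end == -1:
                # Frame is incomplete: keep it and wait for more bytes.
                del self._buffer[:start]
                break
            messages.append(bytes(self._buffer[start + 1:end]))
            del self._buffer[:end + len(END_BLOCK)]
        return messages
```

A receiver would call `feed()` with whatever each socket read returns and send an ack only for the payloads that come back, so a message split across several packets is still handled as one message.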
- December 24, 2015 at 1:54 am #83449
  Charlie Bursell
  Participant
  It seems to me your major problem is *NOT* SMAT but why you are re-sending.
  
  First, check your timeout. I have had to set it as high as 300 seconds because of systems that want to do a database post prior to sending an ACK.
  
  Most important, open lines of communication with the vendor on the other side. What is he seeing?
  
  Get a SNIFFER! It will always be a he said they said until you can get definitive proof of what is really happening.
  
  It is easy enough to write a script to remove duplicates from SMAT. As Jim said, some people find them useful.
- December 28, 2015 at 8:49 pm #83450
  David Barr
  Participant
  Doing a packet capture on AIX shouldn’t be too difficult. I tried this on Linux, and I think it works the same on AIX.
  
  You need to su or sudo to root from the command line, then run a command like this:
  
  tcpdump -s 0 -w myfilename port XXXX
  
  Replace XXXX with the port number that your interface is using. Let the command start running, then send a message through the interface. After a few seconds you can type ^C to interrupt the tcpdump command.
  
  Copy the file that you captured over to a Windows PC and open it up with Wireshark. You should be able to right click on one of the packets in the display and select “follow TCP stream”. This will open a new window that will show all of the bytes that were sent and received.
- December 28, 2015 at 8:52 pm #83451
  David Barr
  Participant
  Another simple thing to check (if you haven’t done this already) is your communication settings. We usually use pdl-tcpip protocol, and on the protocol properties page we use “mlp_tcp.pdl” as our PDL.

The forum ‘Cloverleaf’ is closed to new topics and replies.