missing messages on interface


  • #110509
    David Barr
    Participant

      We have a receiving interface that frequently drops its connection to Cloverleaf. Sometimes when this happens, we’ve sent a message but we haven’t received an ACK for the message. We have a reply timeout of 60 seconds. It appears that when we reconnect to the system, Cloverleaf is still waiting for the ACK and doesn’t automatically resend the message. Is there a way to get Cloverleaf to automatically resend the last message immediately when it connects if the message hasn’t been acknowledged?

      (edit) I should clarify that the reason I titled this post “missing messages” is that the receiving system appears to have implemented a workaround on their own of sending an “AA” ACK on startup to avoid the 60 second wait for a message, and then they complain to us that they never received the message.

      • This topic was modified 5 years, 5 months ago by David Barr.
      • #110513
        Jim Kosloskey
        Participant

          David,

          I want to make sure I understand the sequence.

          You send a message and wait up to 60 seconds for a reply before timing out. (What do you normally do on timeout?)

          The receiving system stops the connection. Is the message still in the Recovery DB?

          The receiving system restarts their connection and sends an Ack message. Do you see the Ack message at all?

          What release of Cloverleaf?

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

        • #110521
          David Barr
          Participant

            Yes, our inbound reply timeout is 60 seconds. After a timeout we resend the OB message.

            I think that the receiving system is responsible for the connection dropping. We get a “read returned error 0” error in the log. I don’t know what the recovery database looks like when this happens. It happens maybe 15 times a day, and I don’t see the problem quickly enough to check the recovery database.

            After we reconnect, the receiving system sends an ACK (after a 30 second timeout on their end). The ACK shows up in our SMAT file, and we immediately send the following message. Their ACK is a preemptive ACK that’s not in response to a message, and it has a blank MSA-2 (control ID) value, so it’s easy for us to tell which of them are fake and which ones are real ACKs.
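The blank MSA-2 check described above can be sketched outside Cloverleaf. This is a minimal illustration in Python, not Cloverleaf Tcl and not the `check_ack` proc. The function names `parse_msa` and `is_preemptive_ack` are hypothetical, the sample messages are made up, and the sketch assumes the default `|` field separator.

```python
from typing import Optional, Tuple


def parse_msa(ack: str) -> Optional[Tuple[str, str]]:
    """Return (MSA-1 ack code, MSA-2 control ID), or None if there is no MSA segment."""
    # HL7 v2 segments end with a carriage return; tolerate newlines as well.
    for segment in ack.replace("\n", "\r").split("\r"):
        fields = segment.split("|")  # assumption: default '|' field separator
        if fields[0] == "MSA":
            code = fields[1].strip() if len(fields) > 1 else ""
            control_id = fields[2].strip() if len(fields) > 2 else ""
            return code, control_id
    return None


def is_preemptive_ack(ack: str) -> bool:
    """True when the ACK has an MSA segment whose control ID (MSA-2) is blank."""
    msa = parse_msa(ack)
    return msa is not None and msa[1] == ""


# Made-up sample messages for illustration only.
real_ack = "MSH|^~\\&|RCV|FAC|CL|FAC|20200101||ACK|1|P|2.3\rMSA|AA|MSG00042\r"
fake_ack = "MSH|^~\\&|RCV|FAC|CL|FAC|20200101||ACK|2|P|2.3\rMSA|AA|\r"

print(is_preemptive_ack(real_ack))  # False
print(is_preemptive_ack(fake_ack))  # True
```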

            We’re on Cloverleaf 6.2.1.0 on Linux.

          • #110522
            Jim Kosloskey
            Participant

              I believe you are correct that it is the receiving system which is doing the disconnect.

              I am assuming you have the thread set up as a Client with auto-reconnect.

              I suspect the message sent is received at the TCP/IP level, but the receiving system has not processed it when they disconnect, so the message is lost. Cloverleaf thinks it was sent, but the receiving system does not think it received it.

              So the handle assigned to the message in Cloverleaf is removed (Cloverleaf thinks it has been delivered). Depending on how the thread is configured there could be a copy of the handle in case a resend is needed.

              Are you using the default timeout (resend OB Message) or a Tcl proc for timeout?

              Are you using your own proc for acknowledgment handling or using the Cloverleaf default?

              Do you know if the contrived ACK is received as an IB message or is it treated as a reply?

              Typically when they reconnect is there a next message ready to be sent and the contrived ACK is treated as an acknowledgment for that message?


            • #110563
              David Barr
              Participant

                I’m using the default timeout, nothing custom. I’m using check_ack from recover_56.tcl. I’m pretty sure that the message is a reply. I have “outbound only” set on the thread, and the message is logged in SMAT. I think that Cloverleaf is treating the fake ACK as an ack for the message sent before the disconnect. We don’t send another message after reconnect until the fake ack is received.

              • #110573
                Jim Kosloskey
                Participant

                  Here is my take based on what has been discussed (correct me wherever I am incorrect):

                  Message is sent by Cloverleaf and is received by the Receiving system at the TCP/IP level (Cloverleaf considers that message delivered).

                  Cloverleaf waits for an acknowledgment (60 sec timeout).

                  The Receiving System drops the connection BEFORE the timeout expires.

                  Cloverleaf continues to try to connect.

                  The receiving system reconnects and sends a bogus ACK, but Cloverleaf treats it as a positive ACK, so the previous message is not resent.

                  Next message from Cloverleaf is sent.

                  I think what needs to be done is to modify the acknowledgment handling proc to identify this bogus ACK (you indicated that could be done) and treat it as a negative ACK, which should cause the previous message, via its saved handle, to be resent.
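The decision being suggested here can be sketched as a small function. This is Python for illustration only, not a Cloverleaf reply proc. The name `classify_reply` and the `"proceed"`/`"resend"` return values are hypothetical, and treating every non-AA code as a resend is an assumption made for the sketch, not documented `check_ack` behavior.

```python
def classify_reply(ack_code: str, ack_control_id: str) -> str:
    """Decide what the sender should do with a reply.

    ack_code       -- MSA-1 from the reply (for example "AA", "AE", "AR")
    ack_control_id -- MSA-2 from the reply; blank for the vendor's preemptive ACK
    """
    if not ack_control_id.strip():
        # Blank MSA-2: the ACK was not sent in response to any message,
        # so treat it as a negative acknowledgment and resend.
        return "resend"
    if ack_code == "AA":
        # Genuine positive acknowledgment: move on to the next message.
        return "proceed"
    # Assumption for this sketch: any other code leads to a resend.
    return "resend"


print(classify_reply("AA", ""))          # resend
print(classify_reply("AA", "MSG00042"))  # proceed
print(classify_reply("AR", "MSG00042"))  # resend
```

The same shape would apply if the vendor changes the preemptive ACK from AA to AR instead: the reply would then fall through to the last branch rather than the first.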

                • #110574
                  Jim Kosloskey
                  Participant

                    Oh and just because the Bogus ACK is in SMAT does not mean it was treated as a Reply by Cloverleaf.

                    With the setup you have, I think an unsolicited inbound message on that outbound thread would be treated as Data. That should be reflected in the SMAT message Metadata I think.

                    However, I doubt you have any routing for any IB messages on the OB thread so the Bogus ACKs would likely end up in the Error DB with TrxId errors.

                    Thus it is most likely that the Bogus ACK is indeed being treated as a Reply by Cloverleaf, satisfying the Wait Replies and thus triggering your reply handling proc.

                  • #110577
                    David Barr
                    Participant

                      Yeah, so far I’ve tried changing my timeout from 60 seconds to 25 seconds to try to make sure we resend before we get a fake ACK, but that didn’t work. I’m running some packet captures now to try to get more details on why it didn’t work.

                      Your suggestion of treating the fake ack as a failure is one other option I’ve been considering. The vendor is also willing to change the code in the fake ACK from AA to AR, and that would probably also fix it.

                      The main reason I asked for advice here is because this is the first time I’ve seen a Cloverleaf thread reconnect and still be waiting for a message. That doesn’t really seem like correct behavior to me. Is there a way to have a reconnect cancel the message timeout and automatically resend the prior message immediately?

                    • #110583
                      Jim Kosloskey
                      Participant

                        Not that I am aware of.

                        Generally everything is constructed with the understanding that systems work properly, which it appears is not the case with this receiving system.

                        If the vendor is willing to work some more on this issue, perhaps the real solution is to find out why the receiving system is disconnecting after receiving a message and solve that.

                        Well, if a normal system were in play, I think you would want to wait through a disconnect for the proper acknowledgment. If you are waiting for a reply and a disconnect happens before the reply arrives or the timeout is reached, I would expect a properly written receiving system to send the appropriate acknowledgment for the message received prior to the disconnect as soon as reconnection occurs, relying on the sending system to still be waiting for it.

                        At least, when I wrote Integrations decades ago at the protocol and application level that was my expectation.

                        Just my .02.


                      • #113202
                        Jeff Dinsmore
                        Participant

                          Jim – You’re saying, if Cloverleaf sends a message and is still in the process of waiting for an ACK when the socket is disconnected, that it continues to wait for that ACK when it reconnects?

                          My understanding has always been that communications are reset by a disconnect and that the un-ACK’d message it was trying to send is resent when the connection is reestablished.

                          If it’s true that it waits through a disconnect/reconnect, we’d never be able to get past a message that Cloverleaf sent to a system that was out-to-lunch.  It’s possible at the network level that the message is delivered, but the receiving application doesn’t process it at all.  So, when that app restarts, it knows nothing of the message that CL is waiting on an ACK for.

                          Perhaps I’m missing something…


                          Jeff Dinsmore
                          Chesapeake Regional Healthcare
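The two behaviors being contrasted in this exchange (keep waiting through a reconnect versus resend the un-ACK'd message on reconnect) can be compared with a toy model. This is a Python sketch for illustration only. `OutboundSender`, `simulate`, and the `resend_on_reconnect` flag are hypothetical names and do not correspond to any Cloverleaf setting; the scenario is the one from the thread, where the receiver drops the connection after message A is sent and then sends one preemptive ACK after reconnecting.

```python
from collections import deque
from typing import List


class OutboundSender:
    """Toy model of an outbound thread that sends one message at a time."""

    def __init__(self, messages: List[str], resend_on_reconnect: bool) -> None:
        self.queue = deque(messages)
        self.resend_on_reconnect = resend_on_reconnect
        self.pending = None          # message sent but not yet acknowledged
        self.sent_log: List[str] = []  # every transmission, in order

    def send_next(self) -> None:
        # Only send when nothing is awaiting an acknowledgment.
        if self.pending is None and self.queue:
            self.pending = self.queue.popleft()
            self.sent_log.append(self.pending)

    def on_reconnect(self) -> None:
        # Policy under discussion: retransmit the un-ACK'd message, or keep waiting.
        if self.pending is not None and self.resend_on_reconnect:
            self.sent_log.append(self.pending)

    def on_ack(self) -> None:
        if self.pending is None:
            return  # nothing outstanding: ignore the unsolicited message
        self.pending = None
        self.send_next()


def simulate(resend_on_reconnect: bool) -> List[str]:
    sender = OutboundSender(["A", "B"], resend_on_reconnect)
    sender.send_next()     # A goes out; receiver drops before processing it
    sender.on_reconnect()  # connection comes back
    sender.on_ack()        # receiver's preemptive ACK arrives
    return sender.sent_log


print(simulate(False))  # ['A', 'B']
print(simulate(True))   # ['A', 'A', 'B']
```

In the first run A is transmitted only once, which matches the missing messages David describes; in the second the receiver gets another copy of A before the preemptive ACK clears it.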

                        • #113216
                          Charlie Bursell
                          Participant

                            Make sure you have OB Only checked. That way any message that comes in when not waiting for an ACK will be ignored.
