Avoid resending message that we got ACK for


  • #54638
    Sergey Sevastyanov
    Participant

      CL 5.7 on Windows 2003R2

      Await replies checked

      Timeout = 60

      on timeout Resend OB message selected

      TPS IB Reply = hcitpsmsgkill

      We occasionally have a problem with one of our outbound threads. The thread starts resending the same message over and over again. I can see that we received one ACK for it. It seems that the receiving system won’t send another ACK if it has already sent one for the same message.

      This happens only on one particular outbound thread and not all the time. Some days are better, some worse.

      When it happens we have to stop the sending thread and delete that message (state 16) from the recovery database. After that, a few messages usually go through and then the same thing happens again.

      We are getting desperate, and today two of us (interface programmers) spent all day doing just that: stopping the thread, deleting the message, starting again. Needless to say, this caused our recovery database to grow and the engine to panic.

      I am thinking about writing a procedure that will count the number of resends (similar to what used to be in recover_33) and then check the ACKs file (we save inbound replies in a file); if the last ACK received is for the message that we keep sending, it would kill the message.

      The problem is I can’t figure out how to do it.

      Any ideas? Or any other suggestions?

      Thanks,

      Sergey
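The check described in the post above can be sketched roughly as follows. This is only an illustration of the logic in Python, not a Cloverleaf TPS proc, and it assumes HL7 v2 messages where the sent message carries its control ID in MSH-10 and the ACK echoes it in MSA-2. The function names, the `max_resends` threshold, and the way the last saved ACK is obtained are all hypothetical.

```python
from typing import Optional


def hl7_field(message: str, segment: str, index: int) -> Optional[str]:
    """Return field `index` of the first `segment` in a pipe-delimited HL7 v2 message."""
    for line in message.replace("\n", "\r").split("\r"):
        fields = line.split("|")
        if fields[0] == segment:
            # In MSH the field separator itself counts as MSH-1,
            # so MSH-n sits at list position n - 1.
            pos = index - 1 if segment == "MSH" else index
            return fields[pos] if pos < len(fields) else None
    return None


def should_kill_resend(outbound: str, last_ack: str,
                       resend_count: int, max_resends: int = 3) -> bool:
    """Hypothetical rule: stop resending once the message has been resent
    `max_resends` times AND the last saved ACK positively acknowledges it."""
    if resend_count < max_resends:
        return False
    sent_id = hl7_field(outbound, "MSH", 10)   # message control ID we sent
    acked_id = hl7_field(last_ack, "MSA", 2)   # control ID echoed in the ACK
    ack_code = hl7_field(last_ack, "MSA", 1)   # AA / CA mean accepted
    return sent_id is not None and sent_id == acked_id and ack_code in ("AA", "CA")
```

In a real interface the same comparison would have to live in a proc on the resend path, with the resend counter kept per message; how to wire that into the engine is not covered by this sketch.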

      • #82350
        Jim Kosloskey
        Participant

          Sergey,

          Could it be you need to lengthen the timeout?

          Maybe what happens is this: you send, wait 60 seconds, the timeout happens, you send again, and almost immediately you receive the ACK for the initial send. Now you are in a deadly embrace.

          Have you tried to lengthen the timeout to something like 120 or higher (we have some set as high as 180 for similar reasons)?

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

        • #82351
          James Cobane
          Participant

            Sergey,

            It would seem that the receiving system is actually checking the ControlID value of the messages being sent and will only ACK a message the first time it receives it (although you may have had to resend because you didn’t get an ACK). You might want to consider simply increasing the timeout to allow more time between resends, giving them more time to ACK back. Or change the value to -1 and force the thread to wait forever, thus ensuring you don’t resend unless you bounce the thread. Otherwise, you will need to code for this situation, similar to the recover_33 procs.

            I guess I would first try to determine the root of the problem, i.e. why you are not getting ACKs within the timeout, which is what causes the resends.

            Jim Cobane

            Henry Ford Health

          • #82352
            Sergey Sevastyanov
            Participant

              Jim K and Jim C:

              We actually started with a 10-second timeout, then increased it to 20, then 45, and an hour ago to 60.

              It still happens no matter what. I just increased it to 120 but my hopes are low.

              I guess you are right and setting it to -1 won’t make matters any worse. The effect will be pretty much the same.

              Thank you

              Sergey

            • #82353
              Jim Kosloskey
              Participant

                Sergey,

                Now that you have it bumped up, you should be able to analyze the IB and OB SMAT and determine how long the first ACK takes.

                The first ACK is the important one because that starts the chain of events.
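The measurement suggested here can be sketched as below. The sketch assumes the control IDs and timestamps have already been pulled out of the outbound and inbound SMAT into plain `(control_id, timestamp)` pairs; how to export them from SMAT is not shown, and the function name is hypothetical.

```python
from datetime import datetime


def first_ack_latency(sends, acks):
    """Seconds from the FIRST send of each control ID to its FIRST ACK.

    `sends` and `acks` are iterables of (control_id, datetime) pairs,
    assumed to be extracted from the OB and IB SMAT respectively.
    Control IDs that never got an ACK are left out of the result.
    """
    first_send = {}
    for cid, ts in sends:
        if cid not in first_send or ts < first_send[cid]:
            first_send[cid] = ts

    first_ack = {}
    for cid, ts in acks:
        if cid not in first_ack or ts < first_ack[cid]:
            first_ack[cid] = ts

    return {
        cid: (first_ack[cid] - first_send[cid]).total_seconds()
        for cid in first_send
        if cid in first_ack
    }
```

If the first-ACK latencies cluster just above the configured timeout, a longer timeout is likely to help; if ACKs are missing altogether, the timeout is probably not the cause.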


              • #82354
                Sergey Sevastyanov
                Participant

                  Jim,

                  Last night my interface colleague looked through the logs on the receiving system and found that it did send replies. For some reason, though, Cloverleaf didn’t receive them: we can’t find them in the logs or the error database. Weird.

                  We probably need to put Wireshark on that.

                  Thanks,

                  Sergey

                • #82355
                  Jim Kosloskey
                  Participant

                    Is your Outbound Thread specified as ‘Outbound only’?

                    Does your inbound SMAT on the outbound thread show all of the ACKs or just one?


                  • #82356
                    Sergey Sevastyanov
                    Participant

                      No, it doesn’t have the Outbound Only flag checked.

                      SMAT shows only one ACK. But the transaction log on the receiving system shows multiple ACKs.

                      I checked one particular message on both systems and found some disturbing things. Cloverleaf SMAT shows that we sent that message 3 times. We got one ACK back. When I looked in the transaction log of the receiving system, I could see that they received only two messages and sent two ACKs.

                      We decided to table it, though, because things got hectic yesterday and at some point we deleted a lot of messages from the recovery database by mistake, so it’s now difficult to track everything that was happening.

                      Next time it happens we will look at the transaction logs right away.

                    • #82357
                      Jim Kosloskey
                      Participant

                        ‘Outbound only’ probably should be on unless you are also expecting Data Messages from the receiving system.


                      • #82358
                        Sergey Sevastyanov
                        Participant

                          Yes, you’re right, Jim! Somehow we missed that.

                          Although I don’t see how that would make a difference in this case.

                          We’ll change it during the next scheduled downtime.

                        • #82359
                          Jim Kosloskey
                          Participant

                            I think this is what happened:

                            Send message…

                            Wait for reply…

                            Reply (ACK) comes but is not treated as a REPLY but as DATA because ‘Outbound only’ is not set…

                            Thread times out waiting for reply (does not matter the timeout length)…

                            Thread resends because of timeout (over and over again until the thread is stopped etc.)

                            I suspect you may have some messages in the Error DB with TrxID errors, and those are probably the ACKs, because I am guessing you don’t have a route for the ACKs (nor should you).


                          • #82360
                            Sergey Sevastyanov
                            Participant

                              Jim,

                              I thought that if we have the Waiting for Reply flag checked, the engine switches to waiting mode and treats any message that comes in on that thread as a reply, at which point it resets the flag and sends the next message. Do I understand it all wrong?

                              Anyway, I did check the error database. I also thought that a TrxID error would occur. Nothing was there…

                              But the recovery DB was not in good shape: a lot of messages couldn’t be retrieved because of a broken chain (I don’t remember the exact wording).

                            • #82361
                              Jim Kosloskey
                              Participant

                                Sergey,

                                My understanding is that ‘Await Replies’ flips the thread to a listening state (waiting for something to come back), and ‘Outbound only’ is what causes the next message coming inbound on the outbound thread to be labeled as a REPLY rather than DATA.

                                So both need to be set to do what you want I think.

                                This would allow you to have multiple types of connections:

                                   Send with no expectation of a response but data could be received on the same connection (single thread non-paced asynchronous exchange).

                                   Send with an expectation of only a reply for the message sent using message pacing.

                                   Send with an expectation of both a reply for the message sent (paced) as well as asynchronous inbound data.

                                   etc.


                              • #82362
                                Charlie Bursell
                                Participant

                                  Wrong Jim.

                                  When Await Replies is set and you send, a flag is set to treat any IB message as a reply. No more messages can be sent until the flag is cleared. You can only do that with KILLREPLY.

                                  Outbound Only means the engine will not poll the port for IB messages once the Await Reply flag is cleared. It saves a lot of CPU time.

                                  If you were to set Outbound Only on an IB thread, it would never receive.
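The behaviour described in this post can be pictured with a toy model. It is a sketch of the description only, not of the engine's internals: the class, its method names, and the `kill_reply` parameter (standing in for an IB reply proc that returns KILLREPLY) are all hypothetical.

```python
class OutboundThread:
    """Toy model of an outbound thread with Await Replies enabled."""

    def __init__(self, outbound_only: bool = False):
        self.outbound_only = outbound_only
        self.awaiting_reply = False   # the flag set on every send
        self.sent = []                # everything put on the wire, resends included
        self.replies = []
        self.data = []

    def send(self, msg) -> bool:
        # No new message can go out while the await-reply flag is set.
        if self.awaiting_reply:
            return False
        self.sent.append(msg)
        self.awaiting_reply = True
        return True

    def on_timeout(self) -> bool:
        # Models "on timeout: resend OB message".
        if not self.awaiting_reply:
            return False
        self.sent.append(self.sent[-1])
        return True

    def receive(self, msg, kill_reply: bool = True) -> str:
        if self.awaiting_reply:
            # While the flag is set, ANY inbound message counts as a reply.
            self.replies.append(msg)
            if kill_reply:
                self.awaiting_reply = False   # only this clears the flag
            return "reply"
        if self.outbound_only:
            # Port is not polled once the flag is cleared.
            return "ignored"
        self.data.append(msg)
        return "data"
```

In this model the Outbound Only setting has no effect on whether an ACK is recognized as a reply; it only decides what happens to traffic that arrives when no reply is pending.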

                                • #82363
                                  Jim Kosloskey
                                  Participant

                                    Charlie,

                                    Thanks for the correction.


                                  • #82364
                                    Sergey Sevastyanov
                                    Participant

                                      Thank you, Charlie, for the explanation. I didn’t know it would save CPU time. I will go through all outbound threads in our configuration and make sure we have Outbound Only set for all of them. We don’t have a single interface that sends and receives on the same thread.

                                      Jim, thank you for your help.
