Messages routed to wrong thread on different site


  • Creator
    Topic
  • #49655
    Bill Bertera
    Participant

      We encountered something very weird over the weekend with two of our Cloverleaf sites (5.2.1 on Solaris). Here’s a description of the interfaces involved and what happened:

      Site A has processes A1 & A2.

      Process A1 has thread 1 that sends messages to thread 2 in Process A2.

      Site B has processes B1 & B2.

      Process B1 has thread 3 that sends messages to thread 4 in Process B2.

      1. Saturday morning: Process A2 panicked and shut down – log file shows no reason. A2 was not started back up until Monday morning.

      2. For the next 24-30 hours about 80 messages queued up in recoveryDB in process A1 waiting for A2 to come back up.

      3. Sunday afternoon: thread 4 in Process B2 received those 80 messages. Those messages never went through Process B1 or thread 3. The messages were Xlated with the route in process A1, but went through the OBTPS of thread 4.

      4. Those 80 messages stayed in Site A’s recovery DB, until Monday morning when Process A2 was restarted, and A1 cycled.

      Basically, it looks like the messages went through A1’s Xlate, but were sent to the wrong OB Pre-TPS queue – of the interface in a completely different site.

      The log files do not have any “resend” commands, and there was no one working who would have resent with Smat, or dumped from RDB and sent to the other thread.

      The log file of Process B2 shows the messages being sent out, and their metadata looks like they came directly from Process A1. The source & destination threads are 1 & 2, NOT 3 or 4.

      Process B2 logs this error for each message, I assume because it’s trying to delete from RDB a message that was never there:

      11/25/2007 17:46:36

      [dbi :dbi :ERR /0:23788_ob_23res] [0.0.27222925] dbiWriteLogMsg: mid doesn’t exist

      11/25/2007 17:46:36

      [dbi :dbi :ERR /0:23788_ob_23res] [0.0.27222925] dbiWriteLogMsg: mid doesn’t exist

      11/25/2007 17:46:36

      [dbi :dbi :WARN/0:23788_ob_23res] [0.0.27222925] Requested to delete non-existent mid
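One way to spot affected messages after the fact is to scan the process log for the warning shown above and collect the message IDs. The following is only a minimal sketch: it assumes the log line layout matches the excerpt in this post exactly, and the function name `mids_with_failed_delete` is hypothetical, not part of any Cloverleaf tooling.

```python
import re

# Matches the bracketed mid followed by the warning text, as seen in the
# excerpt above. The layout is assumed from that excerpt only.
MID_RE = re.compile(
    r"\[(\d+\.\d+\.\d+)\]\s+Requested to delete non-existent mid"
)


def mids_with_failed_delete(log_lines):
    """Return the distinct mids, in first-seen order, for which the engine
    logged a 'Requested to delete non-existent mid' warning."""
    seen = []
    for line in log_lines:
        match = MID_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "[dbi :dbi :ERR /0:23788_ob_23res] [0.0.27222925] dbiWriteLogMsg: mid doesn't exist",
    "[dbi :dbi :WARN/0:23788_ob_23res] [0.0.27222925] Requested to delete non-existent mid",
]
print(mids_with_failed_delete(sample))
```

The resulting list could then be compared against the mid ranges each site is expected to use.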

      Here’s where it gets interesting: the message ID is from the mid number wheel of Site B, but the OriginalMID is from the number wheel range of Site A.

      Has anyone ever seen anything like this? All signs point to the ICL thread as the culprit, but there’s no real way to diagnose that. Any other suggestions of where to look?

      EDIT: discovered the panic on Saturday was caused by a different thread in the process – unrelated to any of these routes or threads.

    Viewing 6 reply threads
    • Author
      Replies
      • #62887
        Tom Rioux
        Participant

          This may be way too simple, but it sounds like you have the same port number on all the threads.  If that is the case and you have it set to localhost, then the scenario you mentioned sounds like something that can happen under those circumstances.   Can we get some more information about your set up?

          Thanks…

          Tom Rioux

        • #62888
          Bill Bertera
          Participant

            Thomas Rioux wrote:

            This may be way too simple, but it sounds like you have the same port number on all the threads.  If that is the case and you have it set to localhost, then the scenario you mentioned sounds like something that can happen under those circumstances.   Can we get some more information about your set up?

            Thanks…

            Tom Rioux

            It wouldn’t be a TCP confusion, that would have shown up in the smat files.

            Thread 1 sends to thread 2 through a route, and the same for threads 3 & 4.

          • #62889
            Terence Gucwa
            Participant

              Did anyone find an answer to this?  We’ve had the same problem – messages getting routed to outbound threads in the wrong site.  But this is in Cloverleaf 5.7.2, AIX 5.3.  The messages sit in the wrong recovery database because the outbound threads don’t exist in that site.

            • #62890
              Bob Richardson
              Participant

                Greetings,

                If I read your post correctly you are running an old Cloverleaf version 5.2.1?  I seem to recall a behavioral trait of the Integrator that if threads have names greater than 15 characters and the first 15 characters are the same then the engine can route messages to the wrong thread or not at all – they get lost in a bit bucket.  There will be no process log entries to indicate that problem.

                Whether or not this remains true with the latest Integrator (we are running 5.8.6.0 on AIX 6.1 TL7SP4) is uncertain.  We just make sure that long thread names are unique for the first 15 characters.

                Otherwise I would suggest logging a support case with Infor.

                But be prepared:  they may just tell you to upgrade first and then see if the problem continues.
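The uniqueness check described in this reply could be sketched as below. This is an illustration only: the 15-character limit comes from the recollection in this post and is not verified against Infor documentation, the function name `find_prefix_collisions` and the sample thread names are hypothetical, and the list of thread names is assumed to have been gathered by hand from each site's configuration.

```python
from collections import defaultdict

PREFIX_LEN = 15  # assumed from the post above, not from vendor documentation


def find_prefix_collisions(thread_names, prefix_len=PREFIX_LEN):
    """Group thread names by their first `prefix_len` characters and return
    only the groups that contain more than one distinct name."""
    groups = defaultdict(set)
    for name in thread_names:
        groups[name[:prefix_len]].add(name)
    return {
        prefix: sorted(names)
        for prefix, names in groups.items()
        if len(names) > 1
    }


# Hypothetical thread names, for illustration only.
names = ["adt_to_lab_system_a", "adt_to_lab_system_b", "orders_out"]
for prefix, clashing in find_prefix_collisions(names).items():
    print(f"{prefix!r} is shared by: {', '.join(clashing)}")
```

The sketch is deliberately conservative: it also flags a name of exactly 15 characters that matches the start of a longer one, since it is unclear from the description whether that case is affected.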

              • #62891
                Terence Gucwa
                Participant

                  Regarding my post, we’re on 5.7.  Yes, we have a couple of thread names longer than 15 chars, but they don’t seem involved in this issue, and are unique with the first 15.  Still, thank you Bob, for pointing that out and that’s really something we should get rid of.  

                  I have talked to Infor and yes, they would want us to upgrade before they would pursue this as a bug.  I’d be happy though, just to know how to avoid the problem, rather than having a fix.  It’s pretty rare, happens only in the wee hours of Sunday mornings, but when it happens, it’s a mess.

                • #62892
                  Bob Richardson
                  Participant

                    Greetings,

                    Is your equivalent of a Security department going TCP port probes?

                    I seem to recall that the 5.7 engine can panic/crash when that happens. There may be a 5.7 patch (Revision 3) that fixes that problem (if memory serves me here).

                    You could see if it is still available; review the Release notes about that problem and apply the patch.  Avoids a major upgrade for now.

                    Good hunting!

                  • #62893
                    Bob Richardson
                    Participant

                      Errata:  read “doing” for “going”.  BobR

                  • The forum ‘Cloverleaf’ is closed to new topics and replies.