Losing messages – state 14 MIA


  • Creator
    Topic
  • #53211
    Richard Hart
    Participant

      AIX 5.3

      CLOVERLEAF(R) Integration Services 5.8.5.3P

      The details are below. Has anyone seen this before?

      First, we may have issues that very few other sites have!

      Our production server runs at 100% at peak, with a load average that occasionally exceeds 20 (not our choice!).

      When this issue happened, the server was heavily loaded and both the Cloverleaf thread and the receiving connection were being repeatedly bounced.

      I have run the unit test script for our resend code with various modifications and have been unable to re-create the issue.

      We have production sites running Cloverleaf versions 5.4, 5.6 and 5.8. All of them use the same ‘resend’ code, which has been in use for many years.

      In the ‘resend’ code, when a message has been sent, no ACK has been received, and the Cloverleaf thread has been stopped and restarted, we ‘trap’ the message in the ‘reply_gen’ context and log

          State 14 (Outbound delivered OK) message exists. Resending message …

      and then, in the ‘send_data_ok’ context, log the message delivery

          SEND CTRLID [004]

      In the production logs, the message still exists on thread shutdown as expected

      i.e.

         [cmd :cmd :INFO/0:snk_prod_ih_cmd:07/16/2012 12:00:36] Doing ‘pstop’ command on thread ‘pas_prod_ih_adt_snd’

         12:00:36 pas_prod_ih_adt_snd gen_code_resend:INFO : Thread is shutting down

         12:00:36 pas_prod_ih_adt_snd gen_code_msginfo:INFO : Thread is shutting down

         12:00:36 pas_prod_ih_adt_snd gen_code_resend:INFO : Thread is shutting down

         12:00:36 pas_prod_ih_adt_snd gen_code_resend:INFO : Thread is shutting down

         12:00:36 pas_prod_ih_adt_snd gen_code_resend:INFO : Thread is shutting down

         12:00:36 pas_prod_ih_adt_snd gen_code_printmsg:INFO : Thread is shutting down

         12:00:36 pas_prod_ih_adt_snd gen_code_resend:INFO : Thread is shutting down

         [prod:prod:INFO/0:pas_prod_ih_adt_snd:07/16/2012 12:00:36] Checking for leaked handles in the General interpreter…

         

         [prod:prod:INFO/0:pas_prod_ih_adt_snd:07/16/2012 12:00:36] Checking for leaked handles in the TPS interpreter…

         Handle     Allocated by

         ======     ============

         message0  

         WARNING: Message [0.0.647402] is in the RDB and was left bound into Tcl

      But on thread start up, the ‘state 14’ is not trapped.

         [cmd :cmd :INFO/0:snk_prod_ih_cmd:07/16/2012 12:00:37] Doing ‘pstart’ command on ‘pas_prod_ih_adt_snd’

         [prod:prod:INFO/0:pas_prod_ih_adt_snd:07/16/2012 12:00:37] Starting protocol thread pas_prod_ih_adt_snd as tid 4.

         [prod:prod:INFO/0:pas_prod_ih_adt_snd:07/16/2012 12:00:42] Applying EO config: ''

         12:00:45 pas_prod_ih_adt_snd gen_code_resend:INFO : RCS Info $Id: gen_code_resend.tcl,v 1.27 2012/05/10 09:05:29 he00387 Exp $

         …

         12:00:46 pas_prod_ih_adt_snd gen_code_resend:INFO : SEND CTRLID [PJ@OS0703251287]

      Compared to the unit test on thread startup …

         [cmd :cmd :INFO/0:he00387_snd_cmd:07/23/2012 13:37:02] Doing ‘pstart’ command on ‘gen_code_ai_xxx_snd’

         [prod:prod:INFO/0:gen_code_ai_xxx_snd:07/23/2012 13:37:03] Applying EO config: ''

         13:37:03 gen_code_ai_xxx_snd gen_code_resend:INFO : RCS Info $Id: gen_code_resend.tcl,v 1.27 2012/05/10 09:05:29 he00387 Exp $

         …

         13:37:03 gen_code_ai_xxx_snd gen_code_resend:INFO : State 14 (Outbound delivered OK) message exists. Resending message …

         13:37:03 gen_code_ai_xxx_snd gen_code_resend:INFO : SEND CTRLID [004]
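The production and unit test startups above differ only in whether the ‘State 14’ trap line appears between the pstart and the first SEND CTRLID. A minimal shell sketch of checking a log for that is shown here. It is not part of the resend code: the function name check_state14_trap is hypothetical, and the patterns are simply taken from the log lines quoted above.

```shell
# Sketch only: report, for each pstart found in the given log files (or on
# stdin), whether the resend proc logged its State 14 trap before the first
# SEND CTRLID that follows. The function name is hypothetical.
check_state14_trap() {
    awk '
        # A thread start resets the search.
        /Doing .*pstart.* command/ { started = 1; trapped = 0; next }

        # The trap line written by the resend code in the reply_gen context.
        started && /State 14 \(Outbound delivered OK\) message exists/ { trapped = 1 }

        # The first send after the start decides the outcome for this start.
        started && /SEND CTRLID/ {
            status = trapped ? "trapped" : "NOT trapped"
            print status ": " $0
            started = 0
        }
    ' "$@"
}
```

Run against a thread's log file, e.g. `check_state14_trap logfile`, it prints one line per thread start. A ‘NOT trapped’ line marks a start where the next message went out without the state 14 message being picked up first.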

    Viewing 3 reply threads
    • Author
      Replies
      • #76934
        Charlie Bursell
        Participant

          We changed the way replies are handled in 5.6. If you are using the same procs in 5.4, 5.6 and 5.8, you have a problem.

          If you do not want to use the new built-in mechanisms for resend, you still have to handle the State 16 message, which is in OBMSGID.

          Take a look at recover.tcl in $HCIROOT/tclprocs.

        • #76935
          Richard Hart
          Participant

            Hi Charlie.

            Thanks for the update.

            The issue in this case appears to be that a state 14 message is detected in a normally loaded environment but not in a heavily loaded one. As a result, with the many connection bounces, messages appear to have been ‘lost’: they were sent, but the log files don’t show an ACK coming back before the connection was bounced again and the next message in the queue was ‘sent’.

            We’ve had no issue with this in the past year with 5.8!

            With respect to the code and various revisions …

            I noted the OBMSGID in 5.6. Perhaps we don’t see the OBMSGID because we have custom code in the various contexts!

            I’ve added an echo to our code (below), which uses the various contexts, and ran the unit test code in Cloverleaf 5.8, which creates tagged log files for the various scenarios tested.

            proc gen_code_resend {args} {

               echo "gen_code_resend: $args"

            A log file line is

             he00387_snd.a-1A:gen_code_resend: {MSGID message0} {CONTEXT sms_ob_data} {ARGS {}} {MODE run} {VERSION 3.0}

            When I search through the logs for the ‘echo’ output and strip off the file name, I get only a few distinct lines, and none of them supplies the OBMSGID,

            i.e.

            grep ^gen_code he00387_snd* | cut "-d:" -f2,3,4 | sort -u

             gen_code_resend: {CONTEXT prewrite} {ARGS {}} {MODE shutdown} {VERSION 3.0}

             gen_code_resend: {CONTEXT prewrite} {ARGS {}} {MODE start} {VERSION 3.0}

             gen_code_resend: {CONTEXT reply_gen} {ARGS {}} {MODE shutdown} {VERSION 3.0}

             gen_code_resend: {CONTEXT reply_gen} {ARGS {}} {MODE start} {VERSION 3.0}

             gen_code_resend: {CONTEXT send_data_ok} {ARGS {}} {MODE shutdown} {VERSION 3.0}

             gen_code_resend: {CONTEXT send_data_ok} {ARGS {}} {MODE start} {VERSION 3.0}

             gen_code_resend: {CONTEXT sms_ib_reply} {ARGS {}} {MODE shutdown} {VERSION 3.0}

             gen_code_resend: {CONTEXT sms_ib_reply} {ARGS {}} {MODE start} {VERSION 3.0}

             gen_code_resend: {CONTEXT sms_ob_data} {ARGS {}} {MODE shutdown} {VERSION 3.0}

             gen_code_resend: {CONTEXT sms_ob_data} {ARGS {}} {MODE start} {VERSION 3.0}

             gen_code_resend: {MSGID message0} {CONTEXT prewrite} {ARGS {}} {MODE run} {VERSION 3.0}

             gen_code_resend: {MSGID message0} {CONTEXT reply_gen} {ARGS {}} {MODE run} {VERSION 3.0}

             gen_code_resend: {MSGID message0} {CONTEXT send_data_ok} {ARGS {}} {MODE run} {VERSION 3.0}

             gen_code_resend: {MSGID message0} {CONTEXT sms_ib_reply} {ARGS {}} {MODE run} {VERSION 3.0}

             gen_code_resend: {MSGID message0} {CONTEXT sms_ob_data} {ARGS {}} {MODE run} {VERSION 3.0}

             gen_code_resend: {MSGID message1} {CONTEXT sms_ib_reply} {ARGS {}} {MODE run} {VERSION 3.0}
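The same check can be made on the key names alone, without reading the sorted lines by eye. The sketch below assumes the echoed args always have the `{KEY value}` shape shown in the listing above; the function name list_arg_keys is hypothetical.

```shell
# Sketch only: list every distinct key name that appears in the echoed
# "gen_code_resend: {KEY value} ..." lines of the given files (or stdin).
# The function name is hypothetical.
list_arg_keys() {
    awk '
        /gen_code_resend: / {
            line = $0
            # Each key is an upper-case word directly after an opening brace.
            while (match(line, /\{[A-Z]+ /)) {
                print substr(line, RSTART + 1, RLENGTH - 2)
                line = substr(line, RSTART + RLENGTH)
            }
        }
    ' "$@" | sort -u
}
```

With the tagged log files as arguments, e.g. `list_arg_keys he00387_snd*`, an OBMSGID key would appear in the output if any context had been passed one.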

          • #76936
            Richard Hart
            Participant

              As an FYI.

              Our resend code verifies that the HL7 ACK contains the Message Control Id from the sent message. We also have switches that kill a message after a few send attempts based on the ACK being AE or AR.

              The unit
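The ACK check described above can be sketched outside Cloverleaf as follows. This is an illustration only, not the resend code itself: the function names, the MAX_ATTEMPTS limit of 3, and the choice to resend on a control ID mismatch are all assumptions. It compares MSH-10 of the sent message with MSA-2 of the ACK and reads the acknowledgment code from MSA-1.

```shell
# Sketch only; all names and the retry limit are hypothetical.

# Print pipe-delimited element N of the first SEG segment of an HL7 message
# read from stdin. Element 1 is the segment name, so MSH-10 is element 10,
# while MSA-1 and MSA-2 are elements 2 and 3.
hl7_field() {
    tr '\r' '\n' | awk -F'|' -v seg="$1" -v n="$2" '$1 == seg { print $n; exit }'
}

# Decide what to do with a sent message given the ACK that came back.
# $1 = sent message, $2 = ACK, $3 = send attempts so far (default 1).
# Prints one of: done, resend, kill.
ack_decision() {
    sent_id=$(printf '%s' "$1" | hl7_field MSH 10)
    ack_code=$(printf '%s' "$2" | hl7_field MSA 2)
    ack_id=$(printf '%s' "$2" | hl7_field MSA 3)
    attempts=${3:-1}
    max_attempts=${MAX_ATTEMPTS:-3}

    if [ -z "$sent_id" ] || [ "$ack_id" != "$sent_id" ]; then
        echo "resend"    # assumption: an ACK for another message is ignored
    elif [ "$ack_code" = "AA" ]; then
        echo "done"      # positive ACK for this message
    elif [ "$attempts" -ge "$max_attempts" ]; then
        echo "kill"      # AE/AR (or any non-AA code) after too many attempts
    else
        echo "resend"
    fi
}
```

For example, `ack_decision "$sent" "$ack" 2` prints the decision for a message that has already been sent twice.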

            • #76937
              Charlie Bursell
              Participant

                If you feel there is a problem, contact Support and they will send it to R&D.
