Duplicate State 14 Messages


  • Creator
    Topic
  • #49864
    Mary Kobis
    Participant

    We just upgraded the last of our sites to Version 5.6 (AIX 5.3). We did hit a fixable snag: two duplicate state 14 messages were in the recovery DB at the same time. We never worried about this in past versions, because there were never any problems that required checking the recovery DB for state 14 messages. I’m pretty confident that we have recover_33 in place correctly, and I wouldn’t think it is normal to have two duplicate state 14 messages for a thread in the recovery DB. Or am I wrong?

    I found this situation when I tried to cycle a thread that was connected but had multiple pending messages that were not processing. The engine died when the thread cycled back up. The panic error was:

    PANIC: Thread panic--engine going down

    PANIC: assertion '(ptd)->ptd_msg_delivered_ok == ((Message *) 0)' failed at protocolThread.cpp/825

    I sent the panic off to Support (thanks, Dave) and they confirmed what I had done: remove the state 14 messages from the DB and restart. We are doing OK today.

    Thanks in advance,

    Mary.

Viewing 18 reply threads
  • Author
    Replies
    • #63909
      Todd Lundstedt
      Participant

      Aaaah.. the old multi-state-14 problem.  If things are working correctly, there should only be one state 14 message in the recovery DB for a particular thread at a time, and not for very dang long, either.

      We fought that at our site for YEARS… had procedures to check for it prior to process startup and everything.

    • #63910
      Mary Kobis
      Participant

      I’m only seeing this during the day when message throughput is at its highest. It calms down after hours and you don’t see the duplicates. These are outbound threads being fed by an SNA connection from the mainframe. Thanks for the info, Mary… oh.. we are on AIX5.2, not 5.3.

    • #63911
      Tom Rioux
      Participant

      We are having a similar issue with a tcp/ip connection to Ichart.  There are multiple state 14 messages that are lingering around in the RDB.  It doesn’t create a problem unless we have to bounce the process and it panics on restart.  We do have the recovery procs in place and so far this is the only thread that is having this issue.

      Another issue we are having on several threads is that even with the recover procs in place, somehow ACK messages are getting through and are treated as inbound data and not reply.  Yes, we do have await replies on and the recovery procs in place.  Not sure what else to check.  It’s not causing a problem other than filling up the error database.

      Any suggestions for things to check?

    • #63912
      Todd Lundstedt
      Participant

      Mary and Thomas,

      When we upgraded to 5.5, we had some discussions with a few folks in Quovadx support and they delivered a newer set of SNA procs than we had.  It’s been long enough now that I don’t recall the names of the old procs, but the new set of procs is all contained in one .tcl file, uses parms to help identify connections, etc.

      I am reluctant to post the entire proc, because I don’t know if it is a charged-for item or not.  But here are the upper comments from the proc.  If you don’t have this proc set, you might want to contact your rep and/or support to see if you can get them.

      # RTIF – a series of procedures to send to mainframe using RTIF protocol
      #       Normally SMS
      #
      # Required procedures:
      #       StartSMS        Protocol Startup Procedure
      #       checkSMS        Inbound Reply Procedure
      #       resendSMS       Reply Generation Procedure
      #       sendokSMS       Send DATA OK Procedure
      #       writeSMS        Pre-Write Procedure
      #
      #
      # Optional procedures:
      #       deallocSMS      TPS OB Procedure

      The comments go on for a LONG time.

      Good luck!

    • #63913

      Thomas Rioux wrote:

      Another issue we are having on several threads is that even with the recover procs in place, somehow ACK messages are getting through and are treated as inbound data and not reply.  Yes, we do have await replies on and the recovery procs in place.  Not sure what else to check.  It’s not causing a problem other than filling up the error database.

      Any suggestions for things to check?

      I’m seeing this as well.

      -- Max Drown (Infor)

    • #63914
      Jim Kosloskey
      Participant

      Tom and Max,

      The few times I experienced this the issue was with the receiving system.

      In the scenario I experienced, the receiving system was sometimes sending 2 acks in a row. The first ack is treated as expected, since the ‘Await Reply’ switch was thrown by Cloverleaf(R). When the ‘Await Reply’ switch is on and a message is received (the first ack), that message is labelled as ‘REPLY’ and the switch is turned off. That is how Cloverleaf(R) knows a message is a reply: the ‘Await Reply’ switch is on. When the second ack arrives the switch is off, so any message inbound on this outbound thread is now labelled as ‘DATA’.

      Since the second ack is ‘DATA’, Cloverleaf(R) attempts to route it and I am betting you are getting routing errors – at least that is what I recall seeing.

      Since this does not happen all the time it is difficult to prove to the receiving system what they are doing. SMAT for the inbound messages can assist in troubleshooting.

      Obviously getting the receiving system to fix their problem (if it is their problem) is the best way to address this.

      However, I suspect you could eliminate the errors if they are routing errors by routing all inbound messages on the outbound thread back to the outbound thread and killing them. But that is attempting to cure the symptom not the disease and would not be my preferred way of proceeding.

      Jim Kosloskey

      email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

    • #63915
      Michael Hertel
      Participant

      Isn’t checking “outbound only” supposed to treat all incoming data on an outbound thread as reply data?

    • #63916
      Jim Kosloskey
      Participant

      Michael,

      In the scenario I described I had the ‘Outbound only’ box checked.

      It has been a while since I experienced the scenario I described but as I recall the ‘Await Reply’ switch being thrown was the determining factor based on what I saw in the log with the noise level all the way up.

      Of course, there could have been a Cloverleaf(R) bug in the release I was on when I experienced the double acks. The double acks were the receiving system’s problem, and once the receiving system corrected them all was well.

      I think the key is if it is a routing error one gets on the ack – that is a pretty good indication the ack was treated as ‘DATA’.

      I think that might be shown on the full-length display of the message from the Error DB which includes the metadata.

      Jim Kosloskey

      email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

    • #63917
      Rob Abbott
      Keymaster

      Michael Hertel wrote:

      Isn’t checking “outbound only” supposed to treat all incoming data on an outbound thread as reply data?

      Only if the engine is in “await reply” state.  If the engine is not waiting for a reply, any messages coming in will be discarded if “outbound only” is checked.
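
      The rules described in this exchange can be put into a small toy model. This is only a sketch for reasoning about the behavior, not Cloverleaf code: the class and method names are hypothetical, and it assumes the thread is configured to await replies, so that sending a message turns the ‘Await Reply’ switch on.

```python
from enum import Enum


class Disposition(Enum):
    REPLY = "REPLY"
    DATA = "DATA"
    DISCARDED = "DISCARDED"


class OutboundThreadModel:
    """Toy model of an outbound thread (hypothetical, not a Cloverleaf API)."""

    def __init__(self, outbound_only: bool) -> None:
        self.outbound_only = outbound_only
        self.awaiting_reply = False

    def send(self) -> None:
        # Assumption: sending a message turns the await-reply switch on.
        self.awaiting_reply = True

    def receive(self) -> Disposition:
        """Classify one inbound message arriving on the outbound thread."""
        if self.awaiting_reply:
            # The first inbound message satisfies the wait and clears it.
            self.awaiting_reply = False
            return Disposition.REPLY
        if self.outbound_only:
            # Not waiting for a reply and "outbound only" is checked.
            return Disposition.DISCARDED
        # Not waiting and not outbound-only: treated as data and routed.
        return Disposition.DATA


# A receiving system that sends two acks for one message:
thread = OutboundThreadModel(outbound_only=False)
thread.send()
first_ack = thread.receive()
second_ack = thread.receive()
```

      In this model the first ack comes back as REPLY and the second as DATA, which is the case that would end up with a routing error. With outbound_only=True the second ack is DISCARDED instead.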

      Rob Abbott
      Cloverleaf Emeritus

    • #63918
      Jim Kosloskey
      Participant

      Rob,

      Thanks – that jogged my memory.

      The situation I experienced was when there was a bug in Cloverleaf(R) wherein the timing of the ‘Await Reply’ switch being reset in relation to an inbound message left a sufficient window of opportunity that, if the receiving system sent back-to-back acks, sometimes the second one got through and was treated as ‘DATA’.

      I recall now that bug was fixed a long time ago.

      So if the acks are getting errored for routing issues, I would make sure the ‘Outbound only’ box was checked on the thread definition.

      Of course, if it is not checked, you at least can see that the receiving system may have what I would consider a problem if indeed it is sending 2 acks for one message periodically.

      Jim Kosloskey

      email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

    • #63919

      Workaround for the duplicate state 14 messages: http://clovertech.infor.com/viewtopic.php?t=2640

      -- Max Drown (Infor)

    • #63920
      Mary Kobis
      Participant

      We have it in place and working… Mary.

    • #63921

      Any thoughts on how to test the fix? In my case, the receiving app has already fixed their problem, so I’d have to find a way to recreate the duplicate state 14 on my own.

      -- Max Drown (Infor)

    • #63922
      Rob Abbott
      Keymaster

      Create a thread with whatever recovery configuration you want to test (built-in, recover_56, recover_33).  Have the thread send a message to a test port (hcitcptest listener, maybe).

      Shut the thread down while it’s waiting for a reply.

      If things are configured correctly, you will have 1 message in state 14 for that thread.  If you’re using recover_33, you will see 2 messages in state 14 for the thread.

      Hope this helps.

      Rob Abbott
      Cloverleaf Emeritus

    • #63923
      Bob Richardson
      Participant

      Greetings,

      Just for planning purposes and future compatibility: if we deploy the new recover_56 procs to replace our current recover_33 implementation and then apply the future REV1 patch to the CIS5.6 engine, will we need to de-install the recover_56 procs?  It appears that we should not have to do this, but I am interested in avoiding unnecessary work, as we have about 150+ outbound threads with recover_33 now in our existing 5.3 engine.

      [We have planned to move to 5.6 this year].

      Please confirm and thanks!

    • #63924

      Rob Abbott wrote:

      Create a thread with whatever recovery configuration you want to test (built-in, recover_56, recover_33).  Have the thread send a message to a test port (hcitcptest listener, maybe).

      Shut the thread down while it’s waiting for a reply.

      If things are configured correctly, you will have 1 message in state 14 for that thread.  If you’re using recover_33, you will see 2 messages in state 14 for the thread.

      Hope this helps.

      Here’s how I simulated the test.

      01) I created 4 mlp tcp/ip threads, test_send --> [test_in-raw->test_out] --> test_recv

      http://www.planetdrown.com/images/cloverleaf_dup14_01.jpg

      02) I configured test_out for Resend OB Data and check_ack from recover_56. Other than check_ack, I didn’t use any other recover_56 proc.

      03) I configured test_recv to not send any ACKs.

      04) I sent 87 HL7 messages to ob_pre_tps on test_send. Here is a snapshot of the database.

      Code:

      Total messages pending in the site’s Queue: 89

      Messages   Status                      Source                   Target
      =========  ==========================  =======================  =======================
             1  16-PR2229 unbacked queue    test_in                  test_out
             1  14-OB delivered             test_in                  test_out
            87  11-OB post-SMS              test_in                  test_out

      Total messages in the error database: 3

      Messages   Status                                Source                   Target
      =========  ====================================  =======================  =======================
             3  101-Unsupported Trxid                  test_recv

      05) I brought down test_out. There was no change in the database snapshot.

      06) I brought up test_out. As expected, the process did not panic (as there were no duplicate state 14 messages). There was no change in the database snapshot.

      07) I then configured test_out for sendOK_save and resend_ob_date from recover_56 and conducted the same test. Here is the database snapshot.

      Code:

      Messages   Status                      Source                   Target
      =========  ==========================  =======================  =======================
             1  14-OB delivered             test_out                 test_out
            87  11-OB post-SMS              test_in                  test_out

      Total messages in the error database: 2

      Messages   Status                                Source                   Target
      =========  ====================================  =======================  =======================
             2  101-Unsupported Trxid                  test_recv

      08) I observed the same results with recover_56 scripts. No panic. No duplicate state 14 messages.
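
      A check like the one in these steps can be automated. The following is a minimal sketch, assuming the summary has already been captured as text in the column layout shown in the snapshots above (the thread does not name the command that produces it). The function names are hypothetical; it simply counts state 14 messages per target thread and lists any thread holding more than one.

```python
import re
from collections import Counter
from typing import List

# Columns: message count, numeric state, status label, source, target.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)-(.+?)\s{2,}(\S+)\s+(\S+)\s*$")


def state14_counts(snapshot: str) -> Counter:
    """Return the number of state 14 messages per target thread."""
    counts: Counter = Counter()
    for line in snapshot.splitlines():
        match = ROW.match(line)
        if match and int(match.group(2)) == 14:
            counts[match.group(5)] += int(match.group(1))
    return counts


def threads_with_duplicates(snapshot: str) -> List[str]:
    """List target threads holding more than one state 14 message."""
    return sorted(t for t, n in state14_counts(snapshot).items() if n > 1)


# Rows copied from the first snapshot in this post.
SAMPLE = """
Messages   Status                      Source                   Target
=========  ==========================  =======================  =======================
       1  16-PR2229 unbacked queue    test_in                  test_out
       1  14-OB delivered             test_in                  test_out
      87  11-OB post-SMS              test_in                  test_out
"""

duplicates = threads_with_duplicates(SAMPLE)
```

      For the snapshot above, duplicates is an empty list, which matches the "no duplicate state 14 messages" observation. Header, separator, and error database rows (which have no target column) are ignored by the pattern.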

      Did I conduct the test properly? Is there a better way to do it?

      -- Max Drown (Infor)

    • #63925
      Rob Abbott
      Keymaster

      Robert H Richardson wrote:

      Greetings,

      Just for planning purposes and future compatibility: if we deploy the new recover_56 procs to replace our current recover_33 implementation and then apply the future REV1 patch to the CIS5.6 engine, will we need to de-install the recover_56 procs?  It appears that we should not have to do this, but I am interested in avoiding unnecessary work, as we have about 150+ outbound threads with recover_33 now in our existing 5.3 engine.

      [We have planned to move to 5.6 this year].

      Please confirm and thanks!

      We will do our very best to make any fix compatible with the workarounds in the technical bulletin (including recover_56).

      But your best option for migration is to change your outbound threads to use the automatic resend feature and remove any sort of recover procs.  This would involve changing thread configuration and any IB REPLY “check ack” procedures you have.

      Rob Abbott
      Cloverleaf Emeritus

    • #63926

      Rob Abbott wrote:

      This would involve changing thread configuration and any IB REPLY “check ack” procedures you have.

      check_ack is not needed?

      -- Max Drown (Infor)

    • #63927
      Rob Abbott
      Keymaster

      “check_ack” type procedures are only needed if you are validating the reply and want to do things like resend the original OB message based on AR or AE.

      If you are simply killing the reply, you can use hcitpsmsgkill in IB TPS.  No check_ack type procedure necessary.

      This applies to 5.6.  If you are on an earlier release, you need a kill procedure that does two things: kill the reply and clean up the saved OB message.
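
      To illustrate the kind of decision a “check_ack” type procedure makes, here is a minimal sketch in Python. It is not the Cloverleaf check_ack proc (those are Tcl, and their code is not shown in this thread). It assumes the reply is an HL7 v2 message whose MSA-1 field carries the acknowledgment code; the function names and the returned action labels are hypothetical.

```python
def ack_code(reply: str) -> str:
    """Return MSA-1 (the acknowledgment code) from an HL7 v2 reply, or ''."""
    # HL7 v2 segments are separated by carriage returns; tolerate newlines too.
    segments = [s for s in reply.replace("\n", "\r").split("\r") if s]
    if not segments or not segments[0].startswith("MSH") or len(segments[0]) < 4:
        return ""
    field_sep = segments[0][3]  # MSH-1 is the character right after "MSH"
    for segment in segments:
        fields = segment.split(field_sep)
        if fields[0] == "MSA" and len(fields) > 1:
            return fields[1]
    return ""


def reply_action(reply: str) -> str:
    """Map a reply to a hypothetical action label."""
    code = ack_code(reply)
    if code == "AA":
        return "KILL_REPLY"  # accepted: nothing more to do with the reply
    if code in ("AE", "AR"):
        return "RESEND"      # error or reject: resend the original OB message
    return "ERROR"           # not a recognizable acknowledgment


accepted = reply_action("MSH|^~\\&|RECV|FAC|SEND|FAC|200801010000||ACK|1|P|2.3\rMSA|AA|1\r")
rejected = reply_action("MSH|^~\\&|RECV|FAC|SEND|FAC|200801010000||ACK|2|P|2.3\rMSA|AR|2\r")
```

      If every reply would simply map to the kill action, this validation step adds nothing, which is the case where killing the reply directly is enough.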

      Rob Abbott
      Cloverleaf Emeritus
