Duplicate State 14 Messages
I found this situation when I tried to cycle a thread that was connected but had multiple pending messages that were not processing. The engine died when the thread cycled back up. The panic error was:
PANIC: Thread panic—engine going down
PANIC: assertion ‘(ptd)->ptd_msg_delivered_ok == ((Message *) 0)’ failed at protocolThread.cpp/825
I sent the panic off to Support (thanks, Dave) and they confirmed what I had done: remove the state 14 messages from the DB and restart. We are doing OK today.
Thanks in advance,
Mary.
We fought that at our site for YEARS… had procedures to check for it prior to process startup and everything.
Another issue we are having on several threads is that even with the recover procs in place, somehow ACK messages are getting through and are treated as inbound data and not reply. Yes, we do have await replies on and the recovery procs in place. Not sure what else to check. It’s not causing a problem other than filling up the error database.
Any suggestions for things to check?
When we upgraded to 5.5, we had some discussions with a few folks in Quovadx support and they delivered a newer set of SNA procs than we had. It’s been long enough now that I don’t recall the names of the old procs, but the new procs are all contained in one .tcl file, use parms to help identify connections, etc.
I am reluctant to post the entire proc, because I don’t know if it is a charged-for item or not. But here are the upper comments from the proc. If you don’t have this proc set, you might want to contact your rep and/or support to see if you can get them.
# RTIF – a series of procedures to send to mainframe using RTIF protocol
# Normally SMS
#
# Required procedures:
#   StartSMS     Protocol Startup Procedure
#   checkSMS     Inbound Reply Procedure
#   resendSMS    Reply Generation Procedure
#   sendokSMS    Send DATA OK Procedure
#   writeSMS     Pre-Write Procedure
#
#
# Optional procedures:
#   deallocSMS   TPS OB Procedure
The comments go on for a LONG time.
Good luck!
Another issue we are having on several threads is that even with the recover procs in place, somehow ACK messages are getting through and are treated as inbound data and not reply. Yes, we do have await replies on and the recovery procs in place. Not sure what else to check. It’s not causing a problem other than filling up the error database.
Any suggestions for things to check?
I’m seeing this as well.
-- Max Drown (Infor)
The few times I experienced this the issue was with the receiving system.
In the scenario I experienced, the receiving system was sometimes sending 2 acks in a row. The first ack is treated as expected, since Cloverleaf(R) had thrown the ‘Await Reply’ switch. That switch is how Cloverleaf(R) knows a message is a reply: when the switch is on and a message is received (the first ack), that message is labelled ‘REPLY’ and the switch is thrown off. When the second ack arrives the switch is off, so any message inbound on this outbound thread is labelled ‘DATA’.
Since the second ack is ‘DATA’, Cloverleaf(R) attempts to route it and I am betting you are getting routing errors – at least that is what I recall seeing.
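The double-ack behaviour described above can be illustrated with a small toy model. This is only a sketch in Python, not Cloverleaf code: the class name `OutboundThread` and its methods are made up for illustration, and the only behaviour modelled is the ‘Await Reply’ switch as described in this post.

```python
from dataclasses import dataclass


@dataclass
class OutboundThread:
    """Toy model of the 'Await Reply' switch (hypothetical, not a Cloverleaf API)."""

    awaiting_reply: bool = False

    def send(self, msg: str) -> None:
        # Sending a message throws the 'Await Reply' switch on.
        self.awaiting_reply = True

    def receive(self, msg: str) -> str:
        # While the switch is on, the next inbound message is the REPLY,
        # and receiving it throws the switch off again.
        if self.awaiting_reply:
            self.awaiting_reply = False
            return "REPLY"
        # With the switch off, anything inbound is labelled DATA.
        return "DATA"


thread = OutboundThread()
thread.send("outbound message")
first = thread.receive("ack 1")   # switch is on
second = thread.receive("ack 2")  # switch is already off
print(first, second)
```

In this model the second ack comes back as ‘DATA’, which is the message the engine would then try to route.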
Since this does not happen all the time it is difficult to prove to the receiving system what they are doing. SMAT for the inbound messages can assist in troubleshooting.
Obviously getting the receiving system to fix their problem (if it is their problem) is the best way to address this.
However, I suspect you could eliminate the errors if they are routing errors by routing all inbound messages on the outbound thread back to the outbound thread and killing them. But that is attempting to cure the symptom not the disease and would not be my preferred way of proceeding.
Jim Kosloskey
email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.
In the scenario I described I had the ‘Outbound only’ box checked.
It has been a while since I experienced the scenario I described but as I recall the ‘Await Reply’ switch being thrown was the determining factor based on what I saw in the log with the noise level all the way up.
Of course, there could have been a Cloverleaf(R) bug in the release I was on when I experienced the double acks. The double acks were the receiving system’s problem, and once the receiving system corrected it all was well.
I think the key is if it is a routing error one gets on the ack – that is a pretty good indication the ack was treated as ‘DATA’.
I think that might be shown on the full-length display of the message from the Error DB which includes the metadata.
Jim Kosloskey
email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.
Isn’t checking “outbound only” supposed to treat all incoming data on an outbound thread as reply data?
Only if the engine is in “await reply” state. If the engine is not waiting for a reply, any messages coming in will be discarded if “outbound only” is checked.
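As a rough summary of the rule, here is a sketch in Python of how an inbound message on an outbound thread would be labelled. It is an illustration of the behaviour described in this thread, not engine code; the function name `classify_inbound` and the label "DISCARD" are assumptions made for the example.

```python
def classify_inbound(awaiting_reply: bool, outbound_only: bool) -> str:
    """Label an inbound message on an outbound thread (hypothetical helper)."""
    if awaiting_reply:
        # The engine is waiting for a reply, so the message is the reply.
        return "REPLY"
    if outbound_only:
        # Not waiting and 'Outbound only' is checked: the message is discarded.
        return "DISCARD"
    # Not waiting and 'Outbound only' is unchecked: treated as inbound data,
    # which the engine will then try to route.
    return "DATA"


# Print the full decision table for the two settings.
for awaiting in (True, False):
    for ob_only in (True, False):
        print(awaiting, ob_only, classify_inbound(awaiting, ob_only))
```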
Rob Abbott
Cloverleaf Emeritus
Thanks – that jogged my memory.
The situation I experienced was when there was a bug in Cloverleaf(R) wherein the timing of the ‘Await Reply’ switch being reset in relation to an inbound message left a sufficient window of opportunity that, if the receiving system sent back-to-back acks, sometimes the second one got through and was treated as ‘DATA’.
I recall now that bug was fixed a long time ago.
So if the acks are getting errored for routing issues, I would make sure the ‘Outbound only’ box was checked on the thread definition.
Of course, if it is not checked, you at least can see that the receiving system may have what I would consider a problem if indeed it is sending 2 acks for one message periodically.
Jim Kosloskey
email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.
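Create a thread with whatever recovery configuration you want to test (built-in, recover_56, recover_33). Have the thread send a message to a test port (hcitcptest listener, maybe).
Shut the thread down while it’s waiting for a reply.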
If things are configured correctly, you will have 1 message in state 14 for that thread. If you’re using recover_33, you will see 2 messages in state 14 for the thread.
Hope this helps.
Rob Abbott
Cloverleaf Emeritus
Just for planning purposes and future compatibility: if we deploy the new recover_56 procs to replace our current recover_33 implementation and then apply the future REV1 patch to the CIS5.6 engine, will we need to de-install the recover_56 procs? It appears that we should not have to, but I am interested in avoiding unnecessary work, as we have 150+ outbound threads with recover_33 now in our existing 5.3 engine.
[We have planned to move to 5.6 this year].
Please confirm and thanks!
Create a thread with whatever recovery configuration you want to test (built-in, recover_56, recover_33). Have the thread send a message to a test port (hcitcptest listener, maybe).
Shut the thread down while it’s waiting for a reply.
If things are configured correctly, you will have 1 message in state 14 for that thread. If you’re using recover_33, you will see 2 messages in state 14 for the thread.
Hope this helps.
Here’s how I simulated the test.
01) I created 4 mlp tcp/ip threads, test_send –> [test_in-raw->test_out] –> test_recv
02) I configured test_out for Resend OB Data and check_ack from recover_56. Other than check_ack, I didn’t use any other recover_56 proc.
03) I configured test_recv to not
04) I sent 87 HL7 messages to ob_pre_tps on test_send. Here is a snapshot of the database.
Total messages pending in the site’s Queue: 89
Messages  Status                     Source                  Target
========= ========================== ======================= =======================
1         16-PR2229 unbacked queue   test_in                 test_out
1         14-OB delivered            test_in                 test_out
87        11-OB post-SMS             test_in                 test_out
Total messages in the error database: 3
Messages  Status                               Source                  Target
========= ==================================== ======================= =======================
3         101-Unsupported Trxid                test_recv
05) I brought down test_out. There was no change in the database snapshot.
06) I brought up test_out. As expected, the process did not panic (as there were no duplicate state 14 messages). There was no change in the database snapshot.
07) I then configured test_out for sendOK_save and resend_ob_date from recover_56 and conducted the same test. Here is the database snapshot.
Messages  Status                     Source                  Target
========= ========================== ======================= =======================
1         14-OB delivered            test_out                test_out
87        11-OB post-SMS             test_in                 test_out
Total messages in the error database: 2
Messages  Status                               Source                  Target
========= ==================================== ======================= =======================
2         101-Unsupported Trxid                test_recv
08) I observed the same results with recover_56 scripts. No panic. No duplicate state 14 messages.
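For anyone who wants to script the "is there more than one state 14 message?" check against a summary like the snapshots above, here is a minimal sketch in Python. It assumes the summary text looks like the rows shown in this post (count, then `state-description`, then source and target thread) and it counts per target thread; both the layout and the per-target grouping are assumptions, and the function names are made up for the example.

```python
import re
from collections import Counter

# count, state number, free-text description, source thread, target thread
ROW = re.compile(r"^\s*(\d+)\s+(\d+)-(.+?)\s+(\S+)\s+(\S+)\s*$")

SNAPSHOT = """\
Total messages pending in the site's Queue: 89
Messages Status Source Target
1 16-PR2229 unbacked queue test_in test_out
1 14-OB delivered test_in test_out
87 11-OB post-SMS test_in test_out
Total messages in the error database: 3
3 101-Unsupported Trxid test_recv
"""


def state14_counts(summary: str) -> Counter:
    """Count state 14 messages per target thread in the pending-queue rows."""
    counts = Counter()
    for line in summary.splitlines():
        # Stop at the error database section; its rows have a different shape.
        if line.strip().startswith("Total messages in the error database"):
            break
        m = ROW.match(line)
        if m and m.group(2) == "14":
            counts[m.group(5)] += int(m.group(1))
    return counts


def duplicate_state14(summary: str) -> list:
    """Return the threads that hold more than one state 14 message."""
    return sorted(t for t, n in state14_counts(summary).items() if n > 1)


print(dict(state14_counts(SNAPSHOT)))
print(duplicate_state14(SNAPSHOT))
```

A thread showing up in the second list would be the one to look at before restarting the process.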
Did I conduct the test properly? Is there a better way to do it?
-- Max Drown (Infor)
Greetings,
Just for planning purposes and future compatibility: if we deploy the new recover_56 procs to replace our current recover_33 implementation and then apply the future REV1 patch to the CIS5.6 engine, will we need to de-install the recover_56 procs? It appears that we should not have to, but I am interested in avoiding unnecessary work, as we have 150+ outbound threads with recover_33 now in our existing 5.3 engine.
[We have planned to move to 5.6 this year].
Please confirm and thanks!
We will do our very best to make any fix compatible with the workarounds in the technical bulletin (including recover_56).
But your best option for migration is to change your outbound threads to use the automatic resend feature and remove any sort of recover procs. This would involve changing thread configuration and any IB REPLY “check ack” procedures you have.
Rob Abbott
Cloverleaf Emeritus
This would involve changing thread configuration and any IB REPLY “check ack” procedures you have.
check_ack is not needed?
-- Max Drown (Infor)
If you are simply killing the reply, you can use hcitpsmsgkill in IB TPS. No check_ack type procedure necessary.
This applies to 5.6. If you are on an earlier release you need a kill procedure that does two things: kill the reply and clean up the saved OB message.
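To make the "two things" concrete, here is a toy sketch in Python of what such a pre-5.6 kill procedure has to accomplish. It is not a Cloverleaf proc: the `saved_ob` dictionary stands in for wherever a recover proc keeps its saved copy of the outbound message, and the function name and the `("KILL", message)` pairs are hypothetical.

```python
# Hypothetical stand-in for the recover proc's saved copy of the outbound
# message, keyed by thread name.
saved_ob = {"test_out": "saved OB message"}


def kill_reply(thread: str, reply: str, saved: dict) -> list:
    """Return (disposition, message) pairs for one inbound reply."""
    # 1) Kill the reply itself.
    dispositions = [("KILL", reply)]
    # 2) Clean up the saved OB message, if one is still being held.
    stale = saved.pop(thread, None)
    if stale is not None:
        dispositions.append(("KILL", stale))
    return dispositions


result = kill_reply("test_out", "ACK", saved_ob)
print(result)
```

Skipping the second step in this model would leave the saved copy behind after every reply.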
Rob Abbott
Cloverleaf Emeritus