Repeating MDM Message

  • Creator
    Topic
  • #54128
    Gena Gill
    Participant

      I have an issue where messages are duplicating within the engine.  Here’s the scenario:

      I’m getting an MDM message from the sending system.  The SMAT file shows that I receive it just one time, but in the process of sending it to the receiving system, before the message leaves the interface engine, the message will replicate itself ad infinitum until I kill all messages in the recovery database, and do an hcimsiutil -zt command on the receiving thread.

      The translation is fairly simple, with nothing special about it.

      Does anyone have any ideas why this might happen?  I’ve been working on this with the theory that there is an issue with the ACK, but I’m not sure that’s the way to go.

    Viewing 8 reply threads
    • Author
      Replies
      • #80262
        Jim Kosloskey
        Participant

          Gena,

          Are you awaiting replies on your outbound thread?

          If so, are you seeing any acknowledgments for the message being sent (your inbound SMAT on the outbound thread should tell you that, or you can turn the EO on and look in the log)?

          If you are awaiting replies, not receiving an acknowledgment, and doing a resend upon timeout expiration, that could explain what you are seeing.

          With the engine noise level up, looking at the log and knowing how your reply handling is configured, you should be able to determine if it resends.

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

        • #80263
          Gena Gill
          Participant

            Yes, I was waiting for an Ack, and that’s where I was going with this.  I’ve adjusted my wait time downward as it was originally set pretty high, and now I’m waiting for some messages to come through so that I can see what’s going on.

          • #80264
            Jim Kosloskey
            Participant

              Gena,

              If they are not replying I think reducing the wait time will only make matters worse.

              For example, if you now resend after waiting 60 seconds, you will get one extra message generated per minute, but if you change the timeout to 30 seconds you will get 2 per minute – assuming they are not responding for 60+ seconds.

              If they are slow responding, you may need to increase the wait time; if they are not responding at all, you need to get them to respond.

              email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

            • #80265
              Gena Gill
              Participant

                I may have found the issue.  Here’s the end of the message that keeps repeating:  |||||||D

                Here’s the end of the same message in the Test interface, which does not repeat:  ||||||R|Dx0d’

                I’m going back to the sender to find out why I’m not getting that hex character (x0d) to end the message in PROD when I got it in TEST.

              • #80266
                Gena Gill
                Participant

                  This has become more perplexing.  When I look at the message with a HEX editor, the x0d is at the end of the message.  However, when I watch the output as the message is sent, the message is not encapsulated like similar messages from other systems.  For example, our Radiology system sends an MDM message that is like this ‘MSH……..OBX…|||Dx0d’, but this system is sending the message that looks like this:  MSH….OBX…||D in the output.  I’m not sure why the single quotes and the x0d are missing.

                  When I set up the thread, I used the protocol pdl-tcpip like I always do, and then used mlp_tcp.pdl.  Do I need to use a different pdl?

                • #80267
                  Tim Pancost
                  Participant

                    Hey, Gena,

                    I don’t think that encapsulation is something you need to be concerned about.  If the messages did not conform to the MLLP standard, they’d never make it into the engine in the first place, let alone be able to duplicate themselves.  Your process log would just be filled with the dreaded “No match: no more phrases to try” message.
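
                    For what it’s worth, mlp_tcp.pdl is generally understood to implement the standard MLLP framing, where the message is wrapped in non-printable control characters rather than anything you would see in the text itself.  A minimal Tcl sketch of that framing (the message content here is just an illustrative fragment, not your actual data):

                      # Standard MLLP framing: <VT> (0x0b) + HL7 message + <FS> (0x1c) + <CR> (0x0d)
                      set msg    "MSH|^~\\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|20140331093207||MDM^T02|MSG00001|P|2.3\rEVN|T02|20140331093207\r"
                      set framed "\x0b${msg}\x1c\x0d"

                    The PDL strips that envelope before the message reaches the engine, which is why you normally don’t see those bytes in SMAT; the x0d you do see at the end of a segment is the segment terminator.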

                    You said the recovery database fills up with the same message repeated “ad infinitum”.  What state are these messages in?  State 11?  State 14?  Knowing that will help narrow the search for the culprit.

                    Also, what is the actual message in the recovery database?  The original from the sending system? The translated message?  The ack from the receiving system?  This too will help narrow the search.

                    A couple of possibilities immediately jump to my mind.  First, that somewhere in the routing/filtering/translating process, you are making a copy of the message (via a msgcopy), but then are not giving it a disposition, so it just hangs out there (leaked, as we say).
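
                    If you do use msgcopy anywhere, the copy needs its own disposition.  A rough sketch of the run-mode section of a TPS proc, assuming the standard Tcl TPS conventions (variable names here are just illustrative):

                      # Inside the run mode of a TPS proc: if you copy a message, dispose of BOTH handles
                      keylget args MSGID mh
                      set newmh [msgcopy $mh]
                      # ... modify $newmh as needed ...
                      lappend dispList "CONTINUE $mh"       ;# the original keeps flowing
                      lappend dispList "CONTINUE $newmh"    ;# the copy needs its own disposition too, or it leaks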

                    The other thing that immediately jumps to mind is that these are acks from the receiving system that aren’t getting disposed correctly in the IB Reply TPS.  That’s why it’s important to know exactly what messages you’re seeing piling up in the recovery database.
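
                    On the reply side, every ack that comes back has to get a disposition in the IB Reply TPS as well.  The stock hcitpsmsgkill proc is commonly used when the reply just needs to be accepted and thrown away; a minimal sketch of that kind of proc, assuming the standard TPS calling convention (the proc name is illustrative):

                      proc sample_ib_reply { args } {
                          keylget args MODE mode
                          set dispList {}
                          switch -exact -- $mode {
                              start    { }
                              run {
                                  keylget args MSGID mh
                                  # Optionally inspect [msgget $mh] to validate MSA-1 here; either way,
                                  # give the reply a disposition so it does not pile up in the recovery database.
                                  lappend dispList "KILL $mh"
                              }
                              time     { }
                              shutdown { }
                          }
                          return $dispList
                      }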

                    Just some thoughts that sprang to mind.

                    HTH,

                    TIM

                    Tim Pancost

                    CHE Trinity Health


                  • #80268
                    Gena Gill
                    Participant

                      Here are the details…

                      When this happens, in the recovery database, the message is in state 14 sending from the Receiving thread to the Receiving thread.  Here’s a screen shot

                      Created  Message Id      s e d Prio State Length Source       Dest
                      -------- --------------- - - - ---- ----- ------ ------------ ------------
                      12:55:16 [0.0.213859943] P D N 8193    14   3063 to_logi_appt to_logi_appt

                      To get this infinite loop stopped, I have to delete this message, and bounce the to_logi_appt thread.  I’ve excluded the translation as the culprit as this happens with the RAW message as well.  to_logi_appt has received messages for years with the same configuration without problem.  I have the sending thread set up the same way I have the other threads, so I’m not sure why this infinite loop is happening.

                      This is what I see in the logs from the end of the message to when it starts sending it again:

                      exercise, weight loss. Follow up in clinic every 3 to 6 months and p.r.n.~~~Skaggs Medical West~~Voice Job ID#:  456053/344360~D: 03/28/2014 18:48:55~T: 03/29/2014 04:43:44~ cl||||||R|Dx0d’

                      [msg :Msg :INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Updating the recovery database

                      [dbi :rlog:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Update msg in recovery db to state OB post-SMS

                      [icl :tcpi:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Message receive complete

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:03] Thread has 0 ready events left.

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has 1 msgs

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has SOME work

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:03] Thread has 1 ready events.

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has 1 msgs

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has SOME work

                      [pd  :thrd:INFO/0: to_logi_appt:04/02/2014 08:52:03] Processing OB-Data message queue

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Writing message to Protocol Driver pdl-tcpip

                      [pdl :wrte:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Write msg to connection

                      [pd  :pdtd:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Writing message succeeded

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:03] NOW AWAITING A REPLY!

                      [pd  :pdtd:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Latencies: Outbound 0.006, Que 0.010, Total 0.019 secs

                      [dbi :rlog:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Update msg in recovery db to state OB delivered OK

                      [pd  :pdtd:INFO/2: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Writing msg to outbound save file to_logi_appt

                      [pd  :thrd:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Invoking SENDOK

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Executing send_data_ok TPS

                      [tps :tps :INFO/1: to_logi_appt:04/02/2014 08:52:03] tds.string = sendOK_save {MSGID message0} {CONTEXT send_data_ok} {ARGS {}} {MODE run} {VERSION 3.0}

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:03] Thread has 0 ready events left.

                      [pti :sche:INFO/2:to_logi_oru_url:04/02/2014 08:52:03] Performing apply callback for thread 2

                      [pti :sche:INFO/2:  fr_muse_url:04/02/2014 08:52:03] Performing apply callback for thread 3

                      [pti :sche:INFO/2: to_logi_appt:04/02/2014 08:52:03] Performing apply callback for thread 4

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:13] Thread has 1 ready events.

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:13] [0.0.213810640] Executing reply_gen TPS

                      [tps :tps :INFO/1: to_logi_appt:04/02/2014 08:52:13] tds.string = resend_ob_msg {MSGID message1} {CONTEXT reply_gen} {ARGS {}} {MODE run} {VERSION 3.0}

                      [pd  :thrd:INFO/0: to_logi_appt:04/02/2014 08:52:13] [0.0.213810637] Placing msg on OB post-SMS DATA queue

                      [pd  :pdtd:INFO/3: to_logi_appt:04/02/2014 08:52:13] [0.0.213810637] OB post-SMS message details

                      msg: 0x30002e28

                         msgType           : DATA

                         msgClass          : PROTOCOL

                         msgState          : OB delivered OK (14)

                         msgPriority       : 8193

                         msgRecoveryDbState: 3

                         msgFlags          : 0x8206

                         msgMid            : [0.0.213810637]

                         msgSrcMid         : [0.0.213810634]

                         msgSrcMidGroup    : midNULL

                         msgOrigSrcThread  : fr_escription

                         msgOrigDestThread : to_logi_appt

                         msgSrcThread      : to_logi_appt

                         msgDestThread     : to_logi_appt

                         msgXlateThread    :

                         msgSkipXlate      : 0

                         msgSepChars       :

                         msgNumRetries     : 0

                         msgGroupId        : 0

                         msgDriverControl  :

                         msgRecordFormat   :

                         msgRoutes         :

                         msgUserData       :

                         msgStaticIsDirty  : 1

                         msgVariableIsDirty: 0

                         msgTimeStartIb    : 1396446733.782

                         msgTimeStartOb    : 1396446723.775

                         msgTimeCurQueStart: 0.000

                         msgTimeTotalQue   : 0.010

                         msgTimeRecovery   : 1396446723.773

                         msgEoConfig       : 0x0

                         msgData (BO)      : 0x30002ed4

                         message           : ‘MSH|^~&|DICTAPHONE|COX|HNA|COX|20140331093207||MDM^T02|20140331093207|P|2.3x0dEVN|T02|20140331093207x0dPID||01455

                    • #80269
                      Tim Pancost
                      Participant

                        All righty, now we’re getting somewhere.  When you said the message was duplicating, I was envisioning a recovery database filled with many copies of the same message.

                        Based on what you show here, and the process log, it pretty much looks like you’re not getting acknowledgements from the receiving system (to_logi_appt).  You can see the timeout in the process log entry, and where it resends using the saved message; it looks like you have the timeout set at 10 seconds, based on the timestamps.  I should imagine if you looked at the thread statistics, you’d see that the last OB write time updates at the frequency of your reply timeout (every 10 seconds, it appears), and the last IB read time is not updating.  This is indicative of the receiver not acking.  The only way to resolve that is to have the receiver “fix” whatever’s preventing them from sending the ack back.

                        HTH,

                        TIM

                        Tim Pancost
                        Trinity Health

                      • #80270
                        Gena Gill
                        Participant

                          I solved the issue… finally.

                          The problem was with the ACK/NAK: one was not being sent back by the receiving system.  I finally tracked it down to the sending system not sending a value in MSH 15, which specifies how the ACK/NAK is supposed to be handled.

                          The system we’re replacing always sent “AL” in that field, to specify that there should always be an ACK or NAK, so I mapped AL to MSH 15 in my translation, and now I have messages flowing correctly between the two systems.
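
                          I did it in the Xlate, but for anyone who would rather default it in a pre-proc instead, a rough Tcl sketch of forcing MSH-15 to AL might look something like this (the proc name is illustrative, and it assumes the usual | field separator and x0d segment terminator):

                            proc default_msh15_al { args } {
                                keylget args MODE mode
                                set dispList {}
                                switch -exact -- $mode {
                                    start    { }
                                    run {
                                        keylget args MSGID mh
                                        set segs [split [msgget $mh] \r]
                                        set msh  [split [lindex $segs 0] |]
                                        # After splitting MSH on the field separator, MSH-15 (Accept Ack Type) is at list index 14
                                        while {[llength $msh] < 15} { lappend msh "" }
                                        if {[lindex $msh 14] eq ""} {
                                            set msh  [lreplace $msh 14 14 "AL"]
                                            set segs [lreplace $segs 0 0 [join $msh |]]
                                            msgset $mh [join $segs \r]
                                        }
                                        lappend dispList "CONTINUE $mh"
                                    }
                                    time     { }
                                    shutdown { }
                                }
                                return $dispList
                            }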

                          Thanks everyone, especially Jim Kosloskey, for your help!

                      • The forum ‘Cloverleaf’ is closed to new topics and replies.