Repeating MDM Message

  • Creator
    Topic
  • #54128
    Gena Gill
    Participant

      I have an issue where messages are duplicating within the engine.  Here’s the scenario:

      I’m getting an MDM message from the sending system.  The SMAT file shows that I receive it just one time, but in the process of sending it to the receiving system, before the message leaves the interface engine, the message will replicate itself ad infinitum until I kill all messages in the recovery database, and do an hcimsiutil -zt command on the receiving thread.

      The translation is fairly simple, with nothing special about it.

      Does anyone have any ideas why this might happen?  I’ve been working on this with the theory that there is an issue with the ACK, but I’m not sure that’s the way to go.

    Viewing 8 reply threads
    • Author
      Replies
      • #80262
        Jim Kosloskey
        Participant

          Gena,

          Are you awaiting replies on your outbound thread?

          If so, are you seeing any acknowledgments for the message being sent (your inbound SMAT on the outbound thread should tell you that, or you can turn the EO on and look in the log)?

          If you are awaiting replies, not receiving an acknowledgment, and doing a resend upon timeout expiration, that could explain what you are seeing.

          With the engine noise level up, looking at the log and knowing how your reply handling is configured, you should be able to determine if it resends.

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

        • #80263
          Gena Gill
          Participant

            Yes, I was waiting for an Ack, and that’s where I was going with this.  I’ve adjusted my wait time downward as it was originally set pretty high, and now I’m waiting for some messages to come through so that I can see what’s going on.

          • #80264
            Jim Kosloskey
            Participant

              Gena,

              If they are not replying I think reducing the wait time will only make matters worse.

              For example, if you now resend after waiting 60 seconds, you will get one extra message generated per minute, but if you change the timeout to 30 seconds you will get 2 per minute – assuming they are not responding for 60+ seconds.

              If they are slow responding, you may need to increase the wait time; if they are not responding at all, you need to get them to respond.

              email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

            • #80265
              Gena Gill
              Participant

                I may have found the issue.  Here’s the end of the message that keeps repeating:  |||||||D

                Here’s the end of the same message in the Test interface, which does not repeat:  ||||||R|Dx0d’

                I’m going back to the sender to find out why I’m not getting that hex character (x0d) to end the message in PROD when I got it in TEST.

              • #80266
                Gena Gill
                Participant

                  This has become more perplexing.  When I look at the message with a HEX editor, the x0d is at the end of the message.  However, when I watch the output as the message is sent, the message is not encapsulated like similar messages from other systems.  For example, our Radiology system sends an MDM message that is like this ‘MSH……..OBX…|||Dx0d’, but this system is sending the message that looks like this:  MSH….OBX…||D in the output.  I’m not sure why the single quotes and the x0d are missing.

                  When I set up the thread, I used the protocol pdl-tcpip like I always do, and then used mlp_tcp.pdl.  Do I need to use a different pdl?

                • #80267
                  Tim Pancost
                  Participant

                    Hey, Gena,

                    I don’t think that encapsulation is something you need to be concerned about.  If the messages did not conform to the MLLP standard, they’d never make it into the engine in the first place, let alone be able to duplicate themselves.  Your process log would just be filled with the dreaded “No match: no more phrases to try” message.
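
                    For what it’s worth, mlp_tcp.pdl is generally understood to implement the standard MLLP framing, where the message is wrapped in non-printable control characters rather than anything you would see in the text itself.  A minimal Tcl sketch of that framing (the message content here is just an illustrative fragment, not your actual data):

                      # Standard MLLP framing: <VT> (0x0b) + HL7 message + <FS> (0x1c) + <CR> (0x0d)
                      set msg    "MSH|^~\\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|20140331093207||MDM^T02|MSG00001|P|2.3\rEVN|T02|20140331093207\r"
                      set framed "\x0b${msg}\x1c\x0d"

                    The PDL strips that envelope before the message reaches the engine, which is why you normally don’t see those bytes in SMAT; the x0d you do see at the end of a segment is the segment terminator.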

                    You said the recovery database fills up with the same message repeated “ad infinitum”.  What state are these messages in?  State 11?  State 14?  Knowing that will help narrow the search for the culprit.

                    Also, what is the actual message in the recovery database?  The original from the sending system? The translated message?  The ack from the receiving system?  This too will help narrow the search.

                    A couple of possibilities immediately jump to my mind.  First, that somewhere in the routing/filtering/translating process, you are making a copy of the message (via a msgcopy), but then are not giving it a disposition, so it just hangs out there (leaked, as we say).
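
                    If you do use msgcopy anywhere, the copy needs its own disposition.  A rough sketch of the run-mode section of a TPS proc, assuming the standard Tcl TPS conventions (variable names here are just illustrative):

                      # Inside the run mode of a TPS proc: if you copy a message, dispose of BOTH handles
                      keylget args MSGID mh
                      set newmh [msgcopy $mh]
                      # ... modify $newmh as needed ...
                      lappend dispList "CONTINUE $mh"       ;# the original keeps flowing
                      lappend dispList "CONTINUE $newmh"    ;# the copy needs its own disposition too, or it leaks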

                    The other thing that immediately jumps to mind is that these are acks from the receiving system that aren’t getting disposed correctly in the IB Reply TPS.  That’s why it’s important to know exactly what messages you’re seeing piling up in the recovery database.
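
                    On the reply side, every ack that comes back has to get a disposition in the IB Reply TPS as well.  The stock hcitpsmsgkill proc is commonly used when the reply just needs to be accepted and thrown away; a minimal sketch of that kind of proc, assuming the standard TPS calling convention (the proc name is illustrative):

                      proc sample_ib_reply { args } {
                          keylget args MODE mode
                          set dispList {}
                          switch -exact -- $mode {
                              start    { }
                              run {
                                  keylget args MSGID mh
                                  # Optionally inspect [msgget $mh] to validate MSA-1 here; either way,
                                  # give the reply a disposition so it does not pile up in the recovery database.
                                  lappend dispList "KILL $mh"
                              }
                              time     { }
                              shutdown { }
                          }
                          return $dispList
                      }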

                    Just some thoughts that sprang to mind.

                    HTH,

                    TIM

                    Tim Pancost

                    CHE Trinity Health


                  • #80268
                    Gena Gill
                    Participant

                      Here are the details…

                      When this happens, in the recovery database, the message is in state 14 sending from the Receiving thread to the Receiving thread.  Here’s a screen shot

                      Created  Message Id      s e d Prio State Length Source       Dest
                      -------- --------------- - - - ---- ----- ------ ------------ ------------
                      12:55:16 [0.0.213859943] P D N 8193    14   3063 to_logi_appt to_logi_appt

                      To get this infinite loop stopped, I have to delete this message, and bounce the to_logi_appt thread.  I’ve excluded the translation as the culprit as this happens with the RAW message as well.  to_logi_appt has received messages for years with the same configuration without problem.  I have the sending thread set up the same way I have the other threads, so I’m not sure why this infinite loop is happening.

                      This is what I see in the logs from the end of the message to when it starts sending it again:

                      exercise, weight loss. Follow up in clinic every 3 to 6 months and p.r.n.~~~Skaggs Medical West~~Voice Job ID#:  456053/344360~D: 03/28/2014 18:48:55~T: 03/29/2014 04:43:44~ cl||||||R|Dx0d’

                      [msg :Msg :INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Updating the recovery database

                      [dbi :rlog:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Update msg in recovery db to state OB post-SMS

                      [icl :tcpi:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Message receive complete

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:03] Thread has 0 ready events left.

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has 1 msgs

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has SOME work

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:03] Thread has 1 ready events.

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has 1 msgs

                      [pd  :thrd:INFO/1: to_logi_appt:04/02/2014 08:52:03] OB-Data queue has SOME work

                      [pd  :thrd:INFO/0: to_logi_appt:04/02/2014 08:52:03] Processing OB-Data message queue

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Writing message to Protocol Driver pdl-tcpip

                      [pdl :wrte:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Write msg to connection

                      [pd  :pdtd:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Writing message succeeded

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:03] NOW AWAITING A REPLY!

                      [pd  :pdtd:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Latencies: Outbound 0.006, Que 0.010, Total 0.019 secs

                      [dbi :rlog:INFO/1: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Update msg in recovery db to state OB delivered OK

                      [pd  :pdtd:INFO/2: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Writing msg to outbound save file to_logi_appt

                      [pd  :thrd:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Invoking SENDOK

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:03] [0.0.213810637] Executing send_data_ok TPS

                      [tps :tps :INFO/1: to_logi_appt:04/02/2014 08:52:03] tds.string = sendOK_save {MSGID message0} {CONTEXT send_data_ok} {ARGS {}} {MODE run} {VERSION 3.0}

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:03] Thread has 0 ready events left.

                      [pti :sche:INFO/2:to_logi_oru_url:04/02/2014 08:52:03] Performing apply callback for thread 2

                      [pti :sche:INFO/2:  fr_muse_url:04/02/2014 08:52:03] Performing apply callback for thread 3

                      [pti :sche:INFO/2: to_logi_appt:04/02/2014 08:52:03] Performing apply callback for thread 4

                      [pti :sche:INFO/1: to_logi_appt:04/02/2014 08:52:13] Thread has 1 ready events.

                      [pd  :pdtd:INFO/0: to_logi_appt:04/02/2014 08:52:13] [0.0.213810640] Executing reply_gen TPS

                      [tps :tps :INFO/1: to_logi_appt:04/02/2014 08:52:13] tds.string = resend_ob_msg {MSGID message1} {CONTEXT reply_gen} {ARGS {}} {MODE run} {VERSION 3.0}

                      [pd  :thrd:INFO/0: to_logi_appt:04/02/2014 08:52:13] [0.0.213810637] Placing msg on OB post-SMS DATA queue

                      [pd  :pdtd:INFO/3: to_logi_appt:04/02/2014 08:52:13] [0.0.213810637] OB post-SMS message details

                      msg: 0x30002e28

                         msgType           : DATA

                         msgClass          : PROTOCOL

                         msgState          : OB delivered OK (14)

                         msgPriority       : 8193

                         msgRecoveryDbState: 3

                         msgFlags          : 0x8206

                         msgMid            : [0.0.213810637]

                         msgSrcMid         : [0.0.213810634]

                         msgSrcMidGroup    : midNULL

                         msgOrigSrcThread  : fr_escription

                         msgOrigDestThread : to_logi_appt

                         msgSrcThread      : to_logi_appt

                         msgDestThread     : to_logi_appt

                         msgXlateThread    :

                         msgSkipXlate      : 0

                         msgSepChars       :

                         msgNumRetries     : 0

                         msgGroupId        : 0

                         msgDriverControl  :

                         msgRecordFormat   :

                         msgRoutes         :

                         msgUserData       :

                         msgStaticIsDirty  : 1

                         msgVariableIsDirty: 0

                         msgTimeStartIb    : 1396446733.782

                         msgTimeStartOb    : 1396446723.775

                         msgTimeCurQueStart: 0.000

                         msgTimeTotalQue   : 0.010

                         msgTimeRecovery   : 1396446723.773

                         msgEoConfig       : 0x0

                         msgData (BO)      : 0x30002ed4

                         message           : ‘MSH|^~&|DICTAPHONE|COX|HNA|COX|20140331093207||MDM^T02|20140331093207|P|2.3x0dEVN|T02|20140331093207x0dPID||01455

                    • #80269
                      Tim Pancost
                      Participant

                        All righty, now we’re getting somewhere.  When you said the message was duplicating, I was envisioning a recovery database filled with many copies of the same message.

                        Based on what you show here, and the process log, it pretty much looks like you’re not getting acknowledgements from the receiving system (to_logi_appt).  You can see the timeout in the process log entry, and where it resends using the saved message; it looks like you have the timeout set at 10 seconds, based on the timestamps.  I should imagine if you looked at the thread statistics, you’d see that the last OB write time updates at the frequency of your reply timeout (every 10 seconds, it appears), and the last IB read time is not updating.  This is indicative of the receiver not acking.  The only way to resolve that is to have the receiver “fix” whatever’s preventing them from sending the ack back.

                        HTH,

                        TIM

                        Tim Pancost
                        Trinity Health

                      • #80270
                        Gena Gill
                        Participant

                          I solved the issue… finally.

                          The problem was with the ACK/NAK: one was not being sent back by the receiving system.  I finally tracked it down to the sending system not sending a value in MSH 15, which specifies how the ACK/NAK is supposed to be handled.

                          The system we’re replacing always sent “AL” in that field, to specify that there should always be an ACK or NAK, so I mapped AL to MSH 15 in my translation, and now I have messages flowing correctly between the two systems.
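
                          I did it in the Xlate, but for anyone who would rather default it in a pre-proc instead, a rough Tcl sketch of forcing MSH-15 to AL might look something like this (the proc name is illustrative, and it assumes the usual | field separator and x0d segment terminator):

                            proc default_msh15_al { args } {
                                keylget args MODE mode
                                set dispList {}
                                switch -exact -- $mode {
                                    start    { }
                                    run {
                                        keylget args MSGID mh
                                        set segs [split [msgget $mh] \r]
                                        set msh  [split [lindex $segs 0] |]
                                        # After splitting MSH on the field separator, MSH-15 (Accept Ack Type) is at list index 14
                                        while {[llength $msh] < 15} { lappend msh "" }
                                        if {[lindex $msh 14] eq ""} {
                                            set msh  [lreplace $msh 14 14 "AL"]
                                            set segs [lreplace $segs 0 0 [join $msh |]]
                                            msgset $mh [join $segs \r]
                                        }
                                        lappend dispList "CONTINUE $mh"
                                    }
                                    time     { }
                                    shutdown { }
                                }
                                return $dispList
                            }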

                          Thanks everyone, especially Jim Kosloskey, for your help!

                      • The forum ‘Cloverleaf’ is closed to new topics and replies.