PDL Error

  • Creator
    Topic
  • #48947
    garry r fisher
    Participant

      Hi,

      We have a very basic PDL (attached – hopefully) and just recently we have started getting the following errors.

      [pdl :PDL :ERR /0:CLINI_MLAB_OUT] read failed: Unknown error

      12/12/2006 15:30:19

      [pdl :PDL :ERR /0:CLINI_MLAB_OUT] read returned error 0 (No error)

      12/12/2006 15:30:19

      [pdl :PDL :ERR /0:CLINI_MLAB_OUT] PDL signaled exception: code 0, msg Connection Closed
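When there are many of these lines to go through, it can help to split them into fields and filter by thread. Below is a minimal sketch in Python that parses lines shaped like the three above. The field names (`module`, `submodule`, `severity`, `level`, `thread`) are my own labels for illustration and are not taken from any Cloverleaf documentation.

```python
import re

# Pattern for lines such as:
#   [pdl :PDL :ERR /0:CLINI_MLAB_OUT] read failed: Unknown error
# The group names are hypothetical labels, chosen only for this example.
LOG_LINE = re.compile(
    r"\[(?P<module>\w+)\s*:(?P<submodule>\w+)\s*:(?P<severity>\w+)\s*"
    r"/(?P<level>\d+):(?P<thread>\w+)\]\s*(?P<message>.*)"
)

def parse_log_line(line):
    """Return a dict of fields for a bracketed log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def errors_for_thread(lines, thread):
    """Collect the messages of all ERR lines that belong to the given thread."""
    parsed = (parse_log_line(line) for line in lines)
    return [p["message"] for p in parsed
            if p and p["thread"] == thread and p["severity"] == "ERR"]

sample = [
    "[pdl :PDL :ERR /0:CLINI_MLAB_OUT] read failed: Unknown error",
    "12/12/2006 15:30:19",
    "[pdl :PDL :ERR /0:CLINI_MLAB_OUT] read returned error 0 (No error)",
]
for message in errors_for_thread(sample, "CLINI_MLAB_OUT"):
    print(message)
```

Timestamp lines do not match the pattern and are simply skipped.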

      The interface this is used on has been live for over 2 years without issue, but this problem has occurred 4 times now since last Thursday. When it happens, data backlogs on the outbound thread, although the thread still shows as up, and every so often it sends a single message – approximately one every 20 minutes when I was watching last night.

      Two questions:

      1. Can anyone tell me what this means?

      2. Is there a way of enhancing the PDL to give a more meaningful error?

      Any help on this would be much appreciated as the festive season approaches and there will be 4 days off work.

      Regards

      Garry

    Viewing 5 reply threads
    • Author
      Replies
      • #60206
        Jim Kosloskey
        Participant

          Garry,

          Do you have recover33 (or some other acknowledgment handling) activated on this thread?

          Do you have Await Replies turned on, and what is your timeout?

          In the Status display, do you see long time lags (during the problem period) between message sent and message received?

          Does the connection stay up when this occurs?

          Do you have Auto restart set and if so, how long an interval between retries?

          I think the PDL error being returned indicates the connection has been severed by the other side.
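As general TCP background (not specific to the PDL driver): when the peer closes a connection in an orderly way, a read on the socket returns zero bytes and no error code is set. That would be consistent with a "read returned error 0 (No error)" line followed by "Connection Closed". Here is a small Python sketch of that behavior, using a local socket pair as a stand-in for the engine and the remote system; the function name `read_status` is made up for the example.

```python
import socket

def read_status(sock, nbytes=1024):
    """Classify one blocking read as 'data', 'closed' (peer shut down), or 'error: ...'."""
    try:
        chunk = sock.recv(nbytes)
    except OSError as exc:
        return f"error: {exc}"
    # A zero-length read is not an error: it signals that the peer closed its end.
    return "closed" if chunk == b"" else "data"

# Local socket pair standing in for the engine side and the remote side.
engine_side, remote_side = socket.socketpair()

remote_side.sendall(b"ACK")
print(read_status(engine_side))  # data

remote_side.close()              # the remote end closes the connection
print(read_status(engine_side))  # closed

engine_side.close()
```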

          Could they have started doing some periodic maintenance on the receiving system (such as backups) which interferes with the communication capability?

          Jim Kosloskey

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

        • #60207
          garry r fisher
          Participant

            Hi Jim,

            Sorry for the delay in getting back to you – took a couple of days leave.

            No recovery or acknowledgment management on the thread – just hcitpsmsgkill on inbound replies.

            Await replies is set and timeout is 600.

            No delay on send and ack, but it looks as though once the error occurs the 600-second timeout applies, suggesting a problem with message handling from that point on.

            Connection shows as up

            Auto restart with default settings is used.

            No comments from third party regarding any maintenance.

            Thanks

            Garry

          • #60208
            Jim Kosloskey
            Participant

              Garry,

              OK this is what I understand:

              Messages are flowing…

              Virtually no delay between sending of message and receipt of reply…

              The thread is configured to time out at 10 minutes (600 seconds) while waiting for a reply and when the timeout does occur, just send the next message (do NOT resend the message for which a reply is pending). The thread is also configured to try to reconnect every 5 seconds if it becomes disconnected.

              Suddenly the error expressed earlier occurs…

              It now appears a reply does not arrive within the 10 minute wait period (or more)…

              With the above in mind,

              Does the log indicate the engine has attempted to reconnect (or do you not have the ‘noise’ level up that high)?

              Do you have SMAT for both outbound and inbound on the outbound thread? If yes, do you have a way to match up the replies with the messages sent (hopefully a unique Control ID in the MSH or something like that)?

              Can you verify that every message you have sent actually was logged on the receiving system?

              What I am thinking might be happening:

              Messages flowing…

              Connection severed by receiving system…

              Thread goes down…

              After 5 seconds, engine attempts reconnect…

              Reconnection occurs – but – the receiving application is not alive…

              Since waiting for a reply, wait for up to 10 minutes…

              Since receiving application is not alive, no reply for 10 minutes (meanwhile more messages arriving and getting queued)…

              Next message sent (means potentially the previous message is really ‘lost’)…

              Wait another potential 10 minutes for reply (this could be the 20 minutes total observed)…

              Time out occurs and next message is sent…

              and so on until receiving application revives.
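The effect of that sequence on throughput can be sketched with a toy simulation in Python. This is only an illustration of the reasoning above, not a model of the engine itself: the 600-second timeout comes from the thread settings described earlier, while the 30-second arrival interval and the 1-second reply time for a healthy receiver are assumptions picked for the example.

```python
def simulate(duration_s, arrival_interval_s, reply_timeout_s, remote_alive):
    """Return (sent, queued) after duration_s seconds for a send-then-await-reply thread.

    The thread sends one message, then waits for a reply before sending the next.
    If the remote application is not alive, each wait lasts the full timeout.
    """
    sent = 0
    queued = 0
    next_send_allowed = 0  # earliest second at which the thread may send again
    for t in range(duration_s):
        if t % arrival_interval_s == 0:
            queued += 1  # a new message arrives for the outbound thread
        if queued and t >= next_send_allowed:
            queued -= 1
            sent += 1
            # Assumed: a live receiver replies within 1 second.
            next_send_allowed = t + (1 if remote_alive else reply_timeout_s)
    return sent, queued

# One hour, one new message every 30 seconds, 600-second reply timeout.
print(simulate(3600, 30, 600, remote_alive=True))   # (120, 0)
print(simulate(3600, 30, 600, remote_alive=False))  # (6, 114)
```

With the receiver unresponsive, only one message per timeout period goes out and the rest pile up in the queue, which is the backlog pattern described in the original post.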

              Of course that would mean that you should see the engine attempting to reconnect every 5 seconds after the error has occurred, provided the engine noise level is set high enough.

              If you happen to be physically monitoring when the error occurs, you should also see the thread temporarily cycle down and then eventually back up. However, you have indicated the thread just stays up.

              I would also expect to see the count of messages out exceed the count of messages in. Again, with the SMAT for both messages and replies and a mechanism for tying the messages together, the pattern could be analyzed further.
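As a rough idea of what that analysis could look like once sent messages and replies have been exported with some unique identifier, here is a short Python sketch. The message IDs and timestamps are invented for illustration, and the export step itself (getting the data out of SMAT or any other log) is not shown.

```python
from datetime import datetime

def match_replies(sent, replies):
    """Pair sent messages with replies by ID.

    sent and replies map a message ID to the datetime it was logged.
    Returns (latencies in seconds for answered messages, sorted IDs with no reply).
    """
    latencies = {
        msg_id: (replies[msg_id] - sent_at).total_seconds()
        for msg_id, sent_at in sent.items()
        if msg_id in replies
    }
    unanswered = sorted(msg_id for msg_id in sent if msg_id not in replies)
    return latencies, unanswered

# Hypothetical data: one message answered within a second, two never answered.
sent = {
    "A001": datetime(2006, 12, 12, 15, 30, 10),
    "A002": datetime(2006, 12, 12, 15, 30, 19),
    "A003": datetime(2006, 12, 12, 15, 40, 19),
}
replies = {"A001": datetime(2006, 12, 12, 15, 30, 11)}

latencies, unanswered = match_replies(sent, replies)
print(latencies)   # {'A001': 1.0}
print(unanswered)  # ['A002', 'A003']
```

A run of unanswered IDs starting at the time of the error, spaced roughly one timeout apart, would support the sequence outlined above.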

              If you want to email me directly, maybe we can get more specific.

              Thanks,

              Jim Kosloskey


            • #60209
              garry r fisher
              Participant

                Jim,

                Thanks for your update. Let me digest that and I’ll come back to you. A few things to ponder:

                This isn’t HL7 but fixed length records (Old SMS style if you remember those).

                No SMAT

                No ‘noise’

                Although this has happened a number of times the client have stopped and started the connections themselves to get the interface going again so I have only actually seen it the once so far.

                Regards

                Garry

              • #60210
                Jim Kosloskey
                Participant

                  Garry,

                  Ah… an SNA LU6.2 connection?

                  Jim Kosloskey


                • #60211
                  garry r fisher
                  Participant

                    Hi,

                    No – TCP-PDL hence the PDL attached above:-)

                    During normal operation the interface is very fast until this error occurs. I can’t see any reconnects following the one error but eo is not enabled. I’m on leave from tomorrow until 2nd January so what I’ll do is on my return enable eo and monitor it.

                    Have a good Christmas and a Happy New Year.

                    Garry
