PDL Error

This topic has 6 replies, 2 voices, and was last updated 18 years, 6 months ago by garry r fisher.

December 13, 2006 at 8:31 am #48947
garry r fisher
Participant
Hi,

We have a very basic pdl (attached – hopefully) and just recently we have started getting the following error.

[pdl :PDL :ERR /0:CLINI_MLAB_OUT] read failed: Unknown error

12/12/2006 15:30:19

[pdl :PDL :ERR /0:CLINI_MLAB_OUT] read returned error 0 (No error)

12/12/2006 15:30:19

[pdl :PDL :ERR /0:CLINI_MLAB_OUT] PDL signaled exception: code 0, msg Connection Closed

The interface this is used on has been live for over 2 years without issue, but this problem has occurred 4 times now since last Thursday. When it happens, data backlogs on the outbound thread, but the thread still shows as up, and every so often it sends a single message – approx 1 every 20 minutes when I was watching last night.

Two questions:

1. Can anyone tell me what this means?

2. Is there a way of enhancing the PDL to give a more meaningful error?

Any help on this would be much appreciated as the festive season approaches and there will be 4 days off work.

Regards

Garry
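
For readers trying to interpret the log lines in the post above: a read that returns 0 with no error code, followed by "Connection Closed", is consistent with how a stream socket reports that the far end has closed the connection. The Python sketch below shows only that general socket behaviour, using a local socket pair. It is not Cloverleaf or PDL code, the function name `read_after_peer_close` is made up for illustration, and the idea that the PDL driver surfaces a zero-byte read in this way is an assumption rather than something confirmed in this thread.

```python
import socket


def read_after_peer_close() -> bytes:
    """Return what recv() yields once the peer has closed its end."""
    # socketpair() gives two connected stream sockets without needing a network.
    local, remote = socket.socketpair()
    try:
        remote.close()           # the far side hangs up cleanly
        return local.recv(1024)  # returns straight away with zero bytes
    finally:
        local.close()


if __name__ == "__main__":
    data = read_after_peer_close()
    if data == b"":
        print("read returned 0 bytes: peer closed the connection")
    else:
        print("data received:", data)
```

A zero-byte read is not an error at the socket level, which may be why the log reports "error 0 (No error)" alongside the closed connection.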

- December 13, 2006 at 2:29 pm #60206
  Jim Kosloskey
  Participant
  Garry,
  
  Do you have recover33 (or some other acknowledgment handling) activated on this thread?
  
  Do you have Await Replies turned on and what is your timeout?
  
  In the Status display, do you see long time lags (during the problem period) between message sent and message received?
  
  Does the connection stay up when this occurs?
  
  Do you have Auto restart set and if so, how long an interval between retries?
  
  I think the Pdl error being returned indicates the connection has been severed by the other side.
  
  Could they have started doing some periodic maintenance on the receiving system (such as backups) which interfere with the communication capability?
  
  Jim Kosloskey
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
- December 18, 2006 at 12:47 pm #60207
  garry r fisher
  Participant
  Hi Jim,
  
  Sorry for the delay in getting back to you – took a couple of days leave.
  
  No recovery or acknowledgment management on the thread – just hcitpsmsgkill on inbound replies.
  
  Await replies is set and timeout is 600.
  
  No delay between send and ack, but it looks as though the 600 timeout applies once the error occurs, suggesting a problem with message handling after the error.
  
  Connection shows as up
  
  Auto restart with default settings is used.
  
  No comments from third party regarding any maintenance.
  
  Thanks
  
  Garry
- December 18, 2006 at 3:56 pm #60208
  Jim Kosloskey
  Participant
  Garry,
  
  OK this is what I understand:
  
  Messages are flowing…
  
  Virtually no delay between sending of message and receipt of reply…
  
  The thread is configured to time out at 10 minutes (600 seconds) while waiting for a reply and when the timeout does occur, just send the next message (do NOT resend the message for which a reply is pending). The thread is also configured to try to reconnect every 5 seconds if it becomes disconnected.
  
  Suddenly the error expressed earlier occurs…
  
  It now appears a reply does not arrive within the 10 minute wait period (or more)…
  
  With the above in mind,
  
  Does the log indicate the engine has attempted to reconnect (or do you not have the ‘noise’ level up that high)?
  
  Do you have SMAT for both outbound and inbound on the outbound thread? If yes, do you have a way to match up the replies with the messages sent (hopefully a unique Control ID in the MSH or something like that)?
  
  Can you verify that every message you have sent actually was logged on the receiving system?
  
  What I am thinking might be happening:
  
  Messages flowing…
  
  Connection severed by receiving system…
  
  Thread goes down…
  
  After 5 seconds, engine attempts reconnect…
  
  Reconnection occurs – but – the receiving application is not alive…
  
  Since waiting for a reply, wait for up to 10 minutes…
  
  Since receiving application is not alive, no reply for 10 minutes (meanwhile more messages arriving and getting queued)…
  
  Next message sent (means potentially the previous message is really ‘lost’)…
  
  Wait another potential 10 minutes for reply (this could be the 20 minutes total observed)…
  
  Time out occurs and next message is sent…
  
  and so on until receiving application revives.
  
  Of course that would mean that you should see the engine attempting to reconnect every 5 seconds after the error has occurred, provided the engine noise level is set high enough.
  
  If you happen to be physically monitoring when the error occurs, you should also see the thread temporarily cycle down and then eventually back up. However you have indicated the thread just stays up.
  
  I would also expect to see the count of messages out exceed the count of messages in. Again, with the SMAT for both messages and replies and a mechanism for tying the messages together, the pattern could be analyzed further.
  
  If you want to email me directly, maybe we can get more specific.
  
  Thanks,
  
  Jim Kosloskey
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
- December 19, 2006 at 8:38 am #60209
  garry r fisher
  Participant
  Jim,
  
  Thanks for your update. Let me digest that and I’ll come back to you. A few things to ponder:
  
  This isn’t HL7 but fixed length records (Old SMS style if you remember those).
  
  No SMAT
  
  No ‘noise’
  
  Although this has happened a number of times the client have stopped and started the connections themselves to get the interface going again so I have only actually seen it the once so far.
  
  Regards
  
  Garry
- December 19, 2006 at 1:59 pm #60210
  Jim Kosloskey
  Participant
  Garry,
  
  Ah… an SNA LU6.2 connection?
  
  Jim Kosloskey
  
  email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.
- December 21, 2006 at 8:21 am #60211
  garry r fisher
  Participant
  Hi,
  
  No – TCP-PDL hence the PDL attached above:-)
  
  During normal operation the interface is very fast until this error occurs. I can’t see any reconnects following the one error but eo is not enabled. I’m on leave from tomorrow until 2nd January so what I’ll do is on my return enable eo and monitor it.
  
  Have a good Christmas and a Happy New Year.
  
  Garry
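
The sequence Jim suggests earlier in the thread (send a message, wait up to 600 seconds for a reply that never comes, then send the next one) can be put into numbers with a small Python sketch. This is only an illustration of the arithmetic, not a model of the Cloverleaf engine. The function `send_times` and its parameters are hypothetical names, and it assumes a strict one-message-at-a-time thread that moves on without resending when the timeout expires.

```python
from typing import List, Optional


def send_times(n_messages: int, reply_timeout_s: float,
               reply_delay_s: Optional[float] = None) -> List[float]:
    """Return the time (in seconds) at which each queued message is sent.

    reply_delay_s=None models a receiving application that never replies.
    """
    clock = 0
    times = []
    for _ in range(n_messages):
        times.append(clock)
        if reply_delay_s is not None and reply_delay_s <= reply_timeout_s:
            clock += reply_delay_s    # reply arrived, send the next message
        else:
            clock += reply_timeout_s  # timed out, move on without resending
    return times


if __name__ == "__main__":
    print(send_times(3, 600))       # no replies: [0, 600, 1200]
    print(send_times(3, 600, 0.5))  # healthy link: [0, 0.5, 1.0]
```

Under these assumptions a silent receiver limits the thread to one message per timeout period, so a 600 second timeout gives one message every 10 minutes. The roughly 20 minute gap reported in the original post would correspond to two such timeouts per observed message, which is the possibility Jim raises.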

The forum ‘Cloverleaf’ is closed to new topics and replies.