PDL: read failed: Unknown error

This topic has 3 replies, 3 voices, and was last updated 4 years, 8 months ago by Brian.

October 27, 2020 at 11:46 am #118139
Brian
Participant
I have Cloverleaf 6.2 running on a Windows 2012 server, and the other night the customer reported that ADT went down for some time.
The engine logging is turned down, so I don’t get a lot of info, but I see something interesting and am hoping someone can explain it for me.
Here is a section right after I got the last valid ACK from an ADT connection.

[pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read failed: Unknown error
[pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read returned error 0 (No error)
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read returned error 0 (No error)
[pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
[pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read returned error 0 (No error)
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read returned error 0 (No error)
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read returned error 0 (No error)
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read returned error 0 (No error)
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read returned error 0 (No error)

In this example, ea_adt_fwd_nwh and cw_ib_nwh are both outbound threads that connect to two different servers (both configured as protocol type “Client”).
The thread his_ib_nwh is an inbound thread that listens on a port (configured as “Server”) where the EMR connects.
What I find odd is that I would expect the PDL error on the outbound threads, but not on the inbound thread, which just listens.
The cw_ib_nwh thread logs this error every 5 seconds, because that is what the reopen time is configured for.
I have to assume that the other 2 threads made the connection right away and that is why I am not seeing those errors repeat.
What I don’t understand is the PDL :ERR. What is the error from?
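
To check the 5-second pattern across a larger log, the repeat interval per thread can be computed with a short script. This is a minimal sketch: the regular expression is inferred only from the log lines quoted above, not from any documented Cloverleaf log format, and the names `LINE_RE` and `repeat_intervals` are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern inferred from the excerpt above (an assumption, not a documented format):
# [module :submodule:LEVEL/n: thread:MM/DD/YYYY HH:MM:SS] message
LINE_RE = re.compile(
    r"\[(?P<module>\w+)\s*:(?P<sub>\w+)\s*:(?P<level>\w+)\s*/(?P<num>\d+):"
    r"\s*(?P<thread>\w+):(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*(?P<msg>.*)"
)

def repeat_intervals(log_text, message="read failed"):
    """Return {thread: [seconds between consecutive ERR lines starting with message]}."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m["level"] == "ERR" and m["msg"].startswith(message):
            times[m["thread"]].append(datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S"))
    return {
        thread: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
        for thread, ts in times.items()
    }

# Lines copied from the excerpt above.
SAMPLE = """\
[pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
[pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read failed: Unknown error
"""

if __name__ == "__main__":
    for thread, gaps in repeat_intervals(SAMPLE).items():
        print(thread, gaps)
```

A thread that logged the error only once gets an empty list, so threads that recovered immediately are easy to tell apart from threads stuck in a reopen loop.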

I tried to recreate on my system using full engine logging. If my thread cannot connect, it retries every 5 seconds, but I never see a PDL :ERR.
Just INFO or DEBUG.

The only time I can get the PDL :ERR is when I shut down the listening port, but notice the “read returned error 0” is PDL :DEBUG, and not ERR like in my log file above.
[pdl :read:DBUG/2: cw_ibdef:10/27/2020 09:58:00] Events: E 32, R 8, W 0
[pdl :PDL :ERR /0: cw_ibdef:10/27/2020 09:58:00] read failed: Unknown error
[pdl :PDL :DBUG/0: cw_ibdef:10/27/2020 09:58:00] read returned error 0 (No error)
[pdl :PDL :INFO/0: cw_ibdef:10/27/2020 09:58:00] Unrecoverable read error: 0, No error
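
One plausible reading of "read failed" followed by "error 0 (No error)" is that the read call returned zero bytes, which on a stream socket means the peer closed the connection, so no OS error code is set. That is an assumption about how the PDL driver reports things, not something confirmed here. The underlying socket behavior itself can be shown with a minimal sketch (`read_after_peer_close` is a made-up name):

```python
import socket

def read_after_peer_close():
    """Close one end of a connected socket pair and read from the other end."""
    a, b = socket.socketpair()
    try:
        b.close()            # the peer goes away cleanly
        return a.recv(1024)  # returns immediately with zero bytes, no exception
    finally:
        a.close()

if __name__ == "__main__":
    data = read_after_peer_close()
    print(repr(data), len(data))
```

The read does not raise an error in this case. It simply returns an empty result, which is how an application learns that the other side has closed the connection.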

These errors repeated for over a day, even after someone recycled the Cloverleaf process.
It wasn’t until someone restarted the ADT interface on the receiving system that the connection was restored and I haven’t seen these errors since then.

Any explanation on what is causing this would be appreciated.

- October 28, 2020 at 2:56 am #118141
  Charlie Bursell
  Participant
  This is why everyone should have a net sniffer. It looks like noise on the line, but how can you tell?
  
  Put a sniffer on it to see what is really happening.
- October 28, 2020 at 3:23 am #118142
  Ab Lugtenburg
  Participant
  Hi, I have seen this on both Cloverleaf 19.1 and 6.2.
  
  It seems to happen when there is an internal network scan and the port is not busy or connected at that moment. That is all I have found out about this. When I restart the thread, the problem is solved.
  
  I saw the same error on my 19.1 version when I started a new thread for the first time. It seems to have an issue with the smart sql the first time, so I start the new thread, stop it, then start it again ;->>
- October 28, 2020 at 2:22 pm #118144
  Brian
  Participant
  Thanks for the response.
  I was somewhat able to reproduce this in my test environment using 6.2 and 19.1.
  First I set up a connection to my receiving interface on a remote server using port 5001. Then, on that server I created a Firewall Inbound rule to block port 5001, but left it disabled.
  I connected the CL to my interface and the log file shows a connection.
  While the connection is still up, I turn on the firewall rule and the CL thread goes to opening and I get this in the error log.
  [pti :even:DBUG/1: test_out:10/28/2020 09:34:35] Calling cb 0x2cbdc9c0
  [pdl :read:DBUG/2: test_out:10/28/2020 09:34:35] Events: E 32, R 8, W 0
  [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read failed: Unknown error
  [pdl :PDL :DBUG/0: test_out:10/28/2020 09:34:35] input buffer accepted 0 bytes, now 0
  [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)
  
  My receiving interface shows it’s still up and if I run a netstat on port 5001 on the remote server, the port is established. My CL and interface never reconnect unless I recycle the remote interface. Alerts on the CL are doing no good because the remote interface thinks it’s still connected.
  
  I repeated the same test on my CL 19.1 server and the only difference is the log file for the read returned (ERR vs. DEBUG).
  6.2 = [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)
  19.1 = [pdl :PDL :DEBUG /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)
  
  Now, the only thing that is different at the 2 customer sites I’m dealing with (both on v6.2) is that their error repeats every 1 second until the remote interface is recycled.
  On both my 6.2 and 19.1, I see this error 1 time and that’s it.
  
  I’m not that familiar with networking, but is it possible a customer network is running some sort of scan that blocks ports temporarily?
  Also, in some of my connections the remote interface automatically recycles, and in others it does not.
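
For the question of whether something on the network is blocking a port temporarily, a small probe run from the Cloverleaf server can help tell the cases apart. This is a minimal sketch, and `probe_port` is a made-up helper name. A firewall that silently drops packets typically shows up as a timeout, while a host with nothing listening on the port typically refuses the connection. The probe only tests new connections; it says nothing about an existing session that the remote side still reports as established, and it opens a real connection, which the receiving interface may log.

```python
import socket

def probe_port(host, port, timeout=3.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"   # no answer at all, e.g. packets dropped by a firewall
    except ConnectionRefusedError:
        return "refused"   # host answered, but nothing accepts on that port
    except OSError:
        return "error"     # anything else, e.g. no route to host

if __name__ == "__main__":
    # Demonstration against a temporary local listener instead of a real interface.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    print(probe_port("127.0.0.1", port))
    srv.close()
    print(probe_port("127.0.0.1", port))
```

Running something like `probe_port("remote-host", 5001)` on a schedule (the host name is a placeholder, the port is the one from the test above) and recording the result with a timestamp would show whether the block lines up with the times the PDL errors start.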