PDL: read failed: Unknown error

  • #118139
    Brian
    Participant

    I have Cloverleaf 6.2 running on Windows Server 2012, and the other night the customer reported that ADT went down for some time.
    The engine logging is turned down, so I don’t get a lot of info, but I see something interesting and am hoping someone can explain it.
    Here is a section right after I got the last valid ACK from an ADT connection.

    [pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read failed: Unknown error
    [pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read returned error 0 (No error)
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read returned error 0 (No error)
    [pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
    [pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read returned error 0 (No error)
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read failed: Unknown error
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read returned error 0 (No error)
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read failed: Unknown error
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read returned error 0 (No error)
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read failed: Unknown error
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read returned error 0 (No error)
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read failed: Unknown error
    [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read returned error 0 (No error)

    In this example, ea_adt_fwd_nwh and cw_ib_nwh are both outbound threads that connect to two different servers (both configured as protocol type “Client”).
    The thread his_ib_nwh is an inbound thread that listens on a port (configured as “Server”) where the EMR connects.
    What I find odd is that I would expect the PDL error on the outbound threads, but not on the inbound thread, which just listens.
    The cw_ib_nwh thread logs this error every 5 seconds because that is its configured reopen time.
    I have to assume that the other 2 threads reconnected right away, which is why I am not seeing those errors repeat.
    What I don’t understand is the PDL :ERR. Where does the error come from?
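
The repeat interval described above can be checked mechanically instead of by eye. The following is a minimal Python sketch, assuming the log lines keep the bracketed layout shown in this post. The regular expression and the name `read_failure_intervals` are illustrative and are not part of Cloverleaf.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: [module:submodule:LEVEL/n: thread:MM/DD/YYYY HH:MM:SS] message
LINE_RE = re.compile(
    r"\[(?P<module>\w+)\s*:(?P<sub>\w+)\s*:(?P<level>\w+)\s*/\d+:\s*"
    r"(?P<thread>\w+):(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*(?P<msg>.*)"
)

# Sample taken from the log excerpt in this post.
SAMPLE_LOG = """\
[pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read failed: Unknown error
[pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read returned error 0 (No error)
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
[pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read failed: Unknown error
[pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read failed: Unknown error
"""


def read_failure_intervals(lines):
    """Return, per thread, the seconds between consecutive 'read failed' ERR lines."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        if match.group("level") == "ERR" and match.group("msg").startswith("read failed"):
            stamp = datetime.strptime(match.group("ts"), "%m/%d/%Y %H:%M:%S")
            times[match.group("thread")].append(stamp)
    return {
        thread: [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
        for thread, stamps in times.items()
    }


if __name__ == "__main__":
    for thread, gaps in read_failure_intervals(SAMPLE_LOG.splitlines()).items():
        print(thread, gaps)
```

For the excerpt above, cw_ib_nwh yields four gaps of 5 seconds each, which matches the reopen time, while the other two threads have a single failure and therefore no gaps.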

    I tried to recreate on my system using full engine logging. If my thread cannot connect, it retries every 5 seconds, but I never see a PDL :ERR.
    Just INFO or DEBUG.

    The only time I can get the PDL :ERR is when I shut down the listening port, but notice the “read returned error 0” is PDL :DEBUG, and not ERR like in my log file above.
    [pdl :read:DBUG/2: cw_ibdef:10/27/2020 09:58:00] Events: E 32, R 8, W 0
    [pdl :PDL :ERR /0: cw_ibdef:10/27/2020 09:58:00] read failed: Unknown error
    [pdl :PDL :DBUG/0: cw_ibdef:10/27/2020 09:58:00] read returned error 0 (No error)
    [pdl :PDL :INFO/0: cw_ibdef:10/27/2020 09:58:00] Unrecoverable read error: 0, No error
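
One possible reading of “read failed” followed by “error 0 (No error)” is that the read returned zero bytes because the peer closed the connection. That is not an operating system error, so the error code stays at 0. This is an assumption about how the PDL driver behaves, not something confirmed by Cloverleaf documentation. The sketch below reproduces only the generic TCP behavior with plain Python sockets on the loopback interface; the function name is made up for illustration.

```python
import socket


def read_after_peer_close():
    """Connect to a local listener, let the peer close, then read on the client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The peer performs an orderly close, comparable to shutting down the listener side.
    server_side.close()

    client.settimeout(5)
    data = client.recv(1024)  # returns b"" at end of stream, without raising
    pending_error = client.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

    client.close()
    listener.close()
    return data, pending_error


if __name__ == "__main__":
    data, pending_error = read_after_peer_close()
    print("bytes read:", len(data), "socket error code:", pending_error)
```

A zero-byte read with error code 0 means the stream ended cleanly from the operating system's point of view. An application that treats "no data" as a failure would then have no real error code to report.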

    These errors repeated for over a day, even after someone recycled the Cloverleaf process.
    It wasn’t until someone restarted the ADT interface on the receiving system that the connection was restored and I haven’t seen these errors since then.

    Any explanation of what is causing this would be appreciated.

    • #118141
      Charlie Bursell
      Participant

      This is why everyone should have a network sniffer. It looks like noise on the line, but how can you tell?

      Put a sniffer on it to see what is really happening.

    • #118142
      Ab Lugtenburg
      Participant

      Hi, I saw this on both Cloverleaf 19.1 and 6.2.

      It seems to happen when there is an internal network scan and the port is not busy or connected at that moment. That is all I have found out about this. When I restart the thread, the problem is solved.

      I see the same error on my 19.1 version when I start a new thread for the first time. It seems to have an issue with the smart sql the first time, so I start the new thread, stop it, and then start it again ;->>

       

    • #118144
      Brian
      Participant

      Thanks for the response.
      I was somewhat able to reproduce this in my test environment using 6.2 and 19.1.
      First, I set up a connection to my receiving interface on a remote server using port 5001. Then, on that server, I created an inbound firewall rule to block port 5001, but left it disabled.
      I connected the CL to my interface and the log file showed a connection.
      While the connection was still up, I turned on the firewall rule. The CL thread went to opening and I got this in the error log.
      [pti :even:DBUG/1: test_out:10/28/2020 09:34:35] Calling cb 0x2cbdc9c0
      [pdl :read:DBUG/2: test_out:10/28/2020 09:34:35] Events: E 32, R 8, W 0
      [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read failed: Unknown error
      [pdl :PDL :DBUG/0: test_out:10/28/2020 09:34:35] input buffer accepted 0 bytes, now 0
      [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)

      My receiving interface shows it’s still up and if I run a netstat on port 5001 on the remote server, the port is established. My CL and interface never reconnect unless I recycle the remote interface. Alerts on the CL are doing no good because the remote interface thinks it’s still connected.
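
A connection that one side still reports as established after the path has been blocked is commonly called a half-open connection. One generic mitigation is TCP keepalive on the side that never notices the break. The sketch below is a minimal Python example, assuming the receiving interface is software whose sockets you control, which may not be the case here. The function name and timing values are hypothetical, and whether probes actually expose the break depends on how the firewall treats the already established flow.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Ask the OS to probe an idle TCP connection so a dead peer is eventually detected."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
    else:
        # Linux and similar systems expose per-socket options when available.
        if hasattr(socket, "TCP_KEEPIDLE"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        if hasattr(socket, "TCP_KEEPINTVL"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        if hasattr(socket, "TCP_KEEPCNT"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With settings like these, an idle connection whose probes go unanswered is reported as failed by the operating system after roughly the idle time plus the probe interval times the probe count, instead of staying established indefinitely.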

      I repeated the same test on my CL 19.1 server and the only difference is the log level of the “read returned” line (ERR vs. DEBUG).
      6.2 = [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)
      19.1 = [pdl :PDL :DEBUG /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)

      Now, the only difference from the 2 customer sites I’m dealing with (both at v6.2) is that their error repeats every 1 second until the remote interface is recycled.
      On both my 6.2 and 19.1, I see this error 1 time and that’s it.

      I’m not that familiar with networking, but is it possible a customer network is running some sort of scan that blocks ports temporarily?
      Also, on some of my connections the remote interface recycles automatically, and on others it does not.
