PDL: read failed: Unknown error

  • Creator
    Topic
  • #118139
    Brian
    Participant

      I have Cloverleaf 6.2 running on a Windows 2012 server, and the other night the customer reported that ADT went down for some time.
      The engine logging is turned down, so I don’t get a lot of info, but I see something interesting and am hoping someone can explain it for me.
      Here is a section right after I got the last valid ACK from an ADT connection.

      [pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read failed: Unknown error
      [pdl :PDL :ERR /0:ea_adt_fwd_nwh:10/15/2020 20:07:12] read returned error 0 (No error)
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:15] read returned error 0 (No error)
      [pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read failed: Unknown error
      [pdl :PDL :ERR /0: his_ib_nwh:10/15/2020 20:07:15] read returned error 0 (No error)
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read failed: Unknown error
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:20] read returned error 0 (No error)
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read failed: Unknown error
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:25] read returned error 0 (No error)
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read failed: Unknown error
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:30] read returned error 0 (No error)
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read failed: Unknown error
      [pdl :PDL :ERR /0: cw_ib_nwh:10/15/2020 20:07:35] read returned error 0 (No error)

      In this example, ea_adt_fwd_nwh and cw_ib_nwh are both outbound threads that connect to two different servers (both configured as protocol type “Client”).
      The thread his_ib_nwh is an inbound thread that listens on a port (configured as “Server”) where the EMR connects.
      What I find odd is that I would expect the PDL error on the outbound threads, but not on the inbound thread, which just listens.
      The cw_ib_nwh thread logs this error every 5 seconds, because that is what the reopen time is configured for.
      I have to assume that the other 2 threads made the connection right away and that is why I am not seeing those errors repeat.
      What I don’t understand is the PDL :ERR. What is the error from?
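To make the 5-second reopen behavior concrete, here is a minimal Python sketch of a client that retries a TCP connection at a fixed interval. It is only an illustration of the general pattern, not how Cloverleaf implements its reopen timer; the function name `connect_with_reopen` and its parameters are hypothetical.

```python
import socket
import time


def connect_with_reopen(host, port, reopen_seconds=5.0, max_attempts=3, timeout=2.0):
    """Try to open a TCP client connection, pausing between failed attempts.

    Returns (socket, attempts_used) on success, or (None, attempts_used)
    when every attempt failed. All names here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, attempt
        except OSError as exc:
            # One log line per failed attempt, so a 5-second interval
            # produces one entry every 5 seconds while the peer is down.
            print(f"attempt {attempt}: connect to {host}:{port} failed: {exc}")
            if attempt < max_attempts:
                time.sleep(reopen_seconds)
    return None, max_attempts
```

A thread that connects on the first attempt would log nothing further, which would be consistent with only one of the three threads repeating its error.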

      I tried to recreate on my system using full engine logging. If my thread cannot connect, it retries every 5 seconds, but I never see a PDL :ERR.
      Just INFO or DEBUG.

      The only time I can get the PDL :ERR is when I shut down the listening port, but notice the “read returned error 0” is PDL :DEBUG, and not ERR like in my log file above.
      [pdl :read:DBUG/2: cw_ibdef:10/27/2020 09:58:00] Events: E 32, R 8, W 0
      [pdl :PDL :ERR /0: cw_ibdef:10/27/2020 09:58:00] read failed: Unknown error
      [pdl :PDL :DBUG/0: cw_ibdef:10/27/2020 09:58:00] read returned error 0 (No error)
      [pdl :PDL :INFO/0: cw_ibdef:10/27/2020 09:58:00] Unrecoverable read error: 0, No error
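One possible reading of "read returned error 0 (No error)", offered as an assumption and not confirmed from the Cloverleaf source: on a TCP socket, a read that returns zero bytes because the peer closed the connection is not an OS-level error, so the error code stays at 0 even though the read did not produce data. The Python sketch below shows that distinction; `read_once` is a hypothetical helper.

```python
import socket


def read_once(sock, bufsize=4096):
    """Classify one read as 'data', 'closed' (zero bytes from peer), or 'error'."""
    try:
        chunk = sock.recv(bufsize)
    except OSError as exc:
        # A real socket error: the OS reports a non-zero error code.
        return "error", exc
    if chunk == b"":
        # Peer closed the connection: no exception and no error code,
        # just a zero-length read.
        return "closed", None
    return "data", chunk


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"x")
    print(read_once(a))   # a normal read with data
    b.close()
    print(read_once(a))   # peer is gone: zero-byte read, no error raised
    a.close()
```

If the engine treats a zero-byte read as a failure but then looks up the OS error code, a message pair like "read failed" followed by "error 0 (No error)" would be the natural result.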

      These errors repeated for over a day, and continued even after someone recycled the Cloverleaf process.
      It wasn’t until someone restarted the ADT interface on the receiving system that the connection was restored and I haven’t seen these errors since then.

      Any explanation on what is causing this would be appreciated.

      • #118141
        Charlie Bursell
        Participant

          This is why everyone should have a network sniffer. It looks like noise on the line, but how can you tell?

          Put a sniffer on it to see what is really happening.

        • #118142
          Ab Lugtenburg
          Participant

            Hi, I saw this on both Cloverleaf 19.1 and 6.2.

            It seems to happen when there is an internal network scan and the port is not busy or connected at that moment. That is all I have found out about this. When I restart the thread, the problem is solved.

            I saw the same error on my 19.1 version when I started a new thread for the first time. It seems to have an issue with the smart sql the first time, so I start the new thread, stop it, then start it again ;->>

          • #118144
            Brian
            Participant

              Thanks for the response.
              I was able to somewhat reproduce this in my test environment using 6.2 and 19.1.
              First I set up a connection to my receiving interface on a remote server using port 5001. Then, on that server, I created a Firewall Inbound rule to block port 5001, but left it disabled.
              I connected the CL to my interface and the log file showed a connection.
              While the connection was still up, I turned on the firewall rule; the CL thread went to opening and I got this in the error log.
              [pti :even:DBUG/1: test_out:10/28/2020 09:34:35] Calling cb 0x2cbdc9c0
              [pdl :read:DBUG/2: test_out:10/28/2020 09:34:35] Events: E 32, R 8, W 0
              [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read failed: Unknown error
              [pdl :PDL :DBUG/0: test_out:10/28/2020 09:34:35] input buffer accepted 0 bytes, now 0
              [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)

              My receiving interface shows it’s still up and if I run a netstat on port 5001 on the remote server, the port is established. My CL and interface never reconnect unless I recycle the remote interface. Alerts on the CL are doing no good because the remote interface thinks it’s still connected.
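The state described here, where one side still shows the port as established while traffic is blocked, is commonly called a half-open connection. If the firewall silently drops packets, an idle endpoint may never learn that the peer is unreachable. TCP keepalive is one general OS-level mechanism for noticing this. The Python sketch below shows how an application could enable it on its own socket; whether Cloverleaf or the remote interface exposes such a setting is not known from this thread, and the function name and timing values are hypothetical.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Turn on TCP keepalive and tune probe timing where the platform allows.

    idle: seconds of silence before the first probe (assumed value)
    interval: seconds between probes (assumed value)
    count: failed probes before the connection is dropped (not settable on Windows here)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms)
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle * 1000, interval * 1000))
    else:
        # Linux and similar: set each option only if this build defines it.
        options = (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count))
        for name, value in options:
            if hasattr(socket, name):
                sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
    return sock
```

With keepalive enabled, an endpoint whose probes go unanswered would eventually see the connection fail and could reset it, which may be why some remote interfaces recycle on their own and others do not. That last point is a guess, not something verified here.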

              I repeated the same test on my CL 19.1 server and the only difference is the log level of the “read returned” line (ERR vs. DEBUG).
              6.2 = [pdl :PDL :ERR /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)
              19.1 = [pdl :PDL :DEBUG /0: test_out:10/28/2020 09:34:35] read returned error 0 (No error)

              Now, the only difference from the 2 customer sites I’m dealing with (both at v6.2) is that their error repeats every 1 second until the remote interface is recycled.
              On both my 6.2 and 19.1, I see this error once and that’s it.

              I’m not that familiar with networking, but is it possible a customer network is running some sort of scan that blocks ports temporarily?
              And in some of my connections, the remote interface automatically recycles and others do not.
