read returned error 78 – firewall sends only first packet

Clovertech Forums Read Only Archives Cloverleaf Cloverleaf read returned error 78 – firewall sends only first packet

  • Creator
    Topic
  • #55483
    Peter Heggie
    Participant

      Our network administrator can see an external ancillary system sending us two packets, but only the first packet is delivered.

      In the Cloverleaf log with debugging turned up, I can see the first message does not have the stop characters that the PDL is looking for. The protocol times out waiting for the last packet.

      The ancillary contact said they did not get an ACK. The error in the Cloverleaf process log is:

      read returned error 78 (Connection timed out)

      Has anyone seen this?

      – it started at 6:42am (that was the last time we got a message). Strange time for any changes to be made. No one knows about any changes.

      – it only happens for one ancillary (we have a VPN with them)

      – we get any message that is shorter than the max packet size

      – we temporarily fixed it by allowing all port numbers to be used, for this ancillary (IP address) only

      – no other ancillary going through the same firewall was affected

      I did see messages at the debug level concerning other port numbers being used for this thread (server thread, single connection) other than the one port number specified. Are these ephemeral ports? Do these port numbers have to be added to the firewall? why did it work up until that point?

      debug entries:

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Processing SOCKET (PDL server) event 0x321ea088

      [pti :even:DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Calling cb 0x300c7974

      [pdl :read:DBUG/2: fr_gems_rslt:08/30/2017 10:25:58] Events: E 0, R 8, W 0

      [pdl :PDL :DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Clearing PDL listen event

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] tiUnregistering SOCKET (PDL server) event 0x321ea088 for tid 3

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] evUnregister SOCKET (PDL server) event 0x321ea088 for tid 3

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] accepted fd = 69

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] tcp-client: 192.168.71.130:65052 connect to server

      [pdl :PDL :DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Creating PDL listen event on fd 69

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] tiRegistering SOCKET (PDL server) event 0x321ea088 for tid 3

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] evRegistering SOCKET (PDL server) event 0x321ea088 for tid 3

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Calling Tcl procedure: hci_pd.default.open-ok

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] with args: {{client “192.168.71.130”} {status ok}}

      What is that port 65052 used for? I assume the tcp-client is the ancillary?

      Here comes the message:

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Processing SOCKET (PDL server) event 0x321ea088

      [pti :even:DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Calling cb 0x300c7974

      [pdl :read:DBUG/2: fr_gems_rslt:08/30/2017 10:25:58] Events: E 0, R 8, W 0

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] read 1380 bytes

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] input buffer accepted 1380 bytes, now 1380

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58]  0b 4d 53 48  7c 5e 7e 5c  |.MSH|^~|

      etc.

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58]  74 2c 20 6c  65 66 74 20  |t, left |

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58]  65 79 65 2e               |eye.|

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] IDLE and 1380 bytes but no error: starting READ

      [pdl :PDL :DBUG/2: fr_gems_rslt:08/30/2017 10:25:58] PDL changed states: old 0, new 1

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Calling Tcl procedure: hci_pd.read

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] with args: {}

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Tcl procedure hci_pd.read returns ‘RECEIVE’

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] trying to match phrase: basic-msg

      [pdl :PDL :DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] PDL setting timeout in 15.00 seconds

      And then it times out.

      Anyway, why all of a sudden do we need to allow other port numbers besides the port number we set in the tcpip (server) configuration? If the firewall vendor did not make a change and we did not make a change, what other kind of change would either change the use of additional port numbers or somehow force the firewall to start blocking any packets after the first one?

      Thank you

      Peter Heggie
      PeterHeggie@crouse.org

    Viewing 10 reply threads
    • Author
      Replies
      • #85506
        Michael Hertel
        Participant

          Amazing!

          We had the same exact issue yesterday.

          I am trying to find out what the cause and resolution was.

          There was some issue with a Palo Alto firewall change.

          A sister hospital also had the issue. 1380 was the magic number.

          input buffer accepted 1380 bytes, now 1380

          I’ll let you know more when I find out.

          Then again, this could all be a coincidence but I’d rather start a conspiracy theory. lol

          😈

        • #85507
          Peter Heggie
          Participant

            wow! we have Palo Alto also!!

            our network admin said that 1380 was a normal packet size.

            I will let him know.

            Peter Heggie
            PeterHeggie@crouse.org

          • #85508
            Michael Hertel
            Participant

              And the conspiracy grows… Moo Hah hah

              😈

            • #85509
              Peter Heggie
              Participant

                got an update from our network admin. Palo Alto put in an upgrade that caused this, which has since been rolled back. So apparently they not only implement the rules we come up with, they also have a ‘baseline’ set of rules which do get changed from time to time.

                And another team member pointed out that not many of our external vendors have ‘large’ messages that come in over this kind of VPN through our firewall, so it may not have been just this one external source system, but it have been any external source system that sent something larger than 1380 bytes. So you were right – this is part of the key.

                We are going to carefully change our restriction back to where it was and make sure that Palo Alto is working correctly.

                Peter Heggie
                PeterHeggie@crouse.org

              • #85510
                Michael Hertel
                Participant

                  LOL! Conspiracy confirmed.

                  😛

                • #85511
                  Peter Heggie
                  Participant

                    yupper

                    Peter Heggie
                    PeterHeggie@crouse.org

                  • #85512
                    Robert Milfajt
                    Participant

                      OMG, we have this exact same issue.  Everyone was saying it was someone else’s problem, networking saying no problem here.  I’ve sent along my conspiracy theory to networking team.  This was a great catch, must have been some Palo Alto mandatory upgrade that everyone thought we be good to take at the end of August!

                      Thanks for this post, it really helped,

                      Robert Milfajt

                      Robert Milfajt
                      Northwestern Medicine
                      Chicago, IL

                    • #85513
                      Robert Milfajt
                      Participant

                        BTW, it would be nice to know what those parameters that changed were just in case the network folks ask.  Does anyone know which parameters Palo Alto mucked up with this upgrade?

                        Robert Milfajt
                        Northwestern Medicine
                        Chicago, IL

                      • #85514
                        Keith McLeod
                        Participant

                          I had a similar problem with a client where the MTU between source and destination was set to I believe 1380.

                        • #85515
                          Heidi Haafke
                          Participant

                            We had a similar issue the morning of 8/30 with several of our VPNs and also use Palo Alto with these VPNs.  We were troubleshooting with our network team and two of the downstream vendors for several hours with no luck. At 12 noon eastern, the connections mysteriously cleared up on the VPNs having issues.  I am curious as to whether that timing – 12 noon eastern – was the same time other folks saw their connections clear, potentially due to Palo Alto backing out their changes?

                            Senior Systems Integration Analyst
                            IT Imaging and Integration Services
                            UPMC Pinnacle
                            heidi.haafke@pinnaclehealth.org

                          • #85516
                            Robert Milfajt
                            Participant

                              Ours mysteriously cleared up today, didn’t get a timing on it, but clearly a conspiracy.     😕

                              Robert Milfajt
                              Northwestern Medicine
                              Chicago, IL

                          Viewing 10 reply threads
                          • The forum ‘Cloverleaf’ is closed to new topics and replies.