read returned error 78 – firewall sends only first packet


  • Creator
    Topic
  • #55483
    Peter Heggie
    Participant

      Our network administrator can see an external ancillary system sending us two packets, but only the first packet is delivered.

      In the Cloverleaf log with debugging turned up, I can see the first message does not have the stop characters that the PDL is looking for. The protocol times out waiting for the last packet.

      The ancillary contact said they did not get an ACK. The error in the Cloverleaf process log is:

      read returned error 78 (Connection timed out)

      Has anyone seen this?

      – it started at 6:42am (that was the last time we got a message). Strange time for any changes to be made. No one knows about any changes.

      – it only happens for one ancillary (we have a VPN with them)

      – we still receive any message that is shorter than the max packet size

      – we temporarily fixed it by allowing all port numbers to be used, for this ancillary (IP address) only

      – no other ancillary going through the same firewall was affected

      I did see debug-level messages showing port numbers other than the one configured being used for this thread (server thread, single connection). Are these ephemeral ports? Do these port numbers have to be added to the firewall? Why did it work up until that point?
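      In case it helps anyone reading along, here is a minimal sketch of what I think might be going on with those extra port numbers – plain Python sockets, nothing Cloverleaf-specific, and the port number is just a made-up stand-in: the server side listens on the one configured port, while each connecting client arrives from an OS-assigned ephemeral source port (like the 65052 in the log below).

      # Minimal illustration (plain Python, not Cloverleaf): the server listens on one
      # fixed port; each client connection arrives from an OS-assigned ephemeral port.
      import socket
      import threading

      LISTEN_PORT = 6001  # stand-in for the port configured on the tcpip (server) thread

      srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      srv.bind(("127.0.0.1", LISTEN_PORT))
      srv.listen(1)

      def accept_one():
          conn, peer = srv.accept()
          # peer[1] is the client's ephemeral source port (like the 65052 in the log)
          print("server: connection from %s:%d" % peer)
          conn.close()

      t = threading.Thread(target=accept_one)
      t.start()

      cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      cli.connect(("127.0.0.1", LISTEN_PORT))
      print("client: local (ephemeral) port is", cli.getsockname()[1])
      cli.close()
      t.join()
      srv.close()

      If that is what is happening here, only the configured listening port should need a firewall rule; the ephemeral number is just the remote end's source port on an established connection.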

      debug entries:

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Processing SOCKET (PDL server) event 0x321ea088

      [pti :even:DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Calling cb 0x300c7974

      [pdl :read:DBUG/2: fr_gems_rslt:08/30/2017 10:25:58] Events: E 0, R 8, W 0

      [pdl :PDL :DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Clearing PDL listen event

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] tiUnregistering SOCKET (PDL server) event 0x321ea088 for tid 3

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] evUnregister SOCKET (PDL server) event 0x321ea088 for tid 3

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] accepted fd = 69

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] tcp-client: 192.168.71.130:65052 connect to server

      [pdl :PDL :DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Creating PDL listen event on fd 69

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] tiRegistering SOCKET (PDL server) event 0x321ea088 for tid 3

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] evRegistering SOCKET (PDL server) event 0x321ea088 for tid 3

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Calling Tcl procedure: hci_pd.default.open-ok

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] with args: {{client "192.168.71.130"} {status ok}}

      What is that port 65052 used for? I assume the tcp-client is the ancillary?

      Here comes the message:

      [pti :even:DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Processing SOCKET (PDL server) event 0x321ea088

      [pti :even:DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] Calling cb 0x300c7974

      [pdl :read:DBUG/2: fr_gems_rslt:08/30/2017 10:25:58] Events: E 0, R 8, W 0

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] read 1380 bytes

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] input buffer accepted 1380 bytes, now 1380

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58]  0b 4d 53 48  7c 5e 7e 5c  |.MSH|^~|

      etc.

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58]  74 2c 20 6c  65 66 74 20  |t, left |

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58]  65 79 65 2e               |eye.|

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] IDLE and 1380 bytes but no error: starting READ

      [pdl :PDL :DBUG/2: fr_gems_rslt:08/30/2017 10:25:58] PDL changed states: old 0, new 1

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Calling Tcl procedure: hci_pd.read

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] with args: {}

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] Tcl procedure hci_pd.read returns 'RECEIVE'

      [pdl :PDL :DBUG/0: fr_gems_rslt:08/30/2017 10:25:58] trying to match phrase: basic-msg

      [pdl :PDL :DBUG/1: fr_gems_rslt:08/30/2017 10:25:58] PDL setting timeout in 15.00 seconds

      And then it times out.
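      To make the timeout easier to picture, the read side is essentially doing delimiter-based framing. The hex dump starts with 0x0b, so I assume the basic-msg phrase is matching the usual MLLP-style wrapper (0x0b … 0x1c 0x0d). A rough sketch of that kind of read loop in plain Python (not the actual PDL code):

      # Sketch of delimiter-based framing (plain Python, not the Cloverleaf PDL):
      # keep reading until the end-of-block bytes arrive, or give up on timeout.
      import socket

      START_BLOCK = b"\x0b"      # MLLP start-of-block
      END_BLOCK = b"\x1c\x0d"    # MLLP end-of-block + carriage return

      def read_framed_message(conn, timeout_seconds=15.0):
          """Return one framed message, or raise socket.timeout if the packet
          carrying END_BLOCK never shows up."""
          conn.settimeout(timeout_seconds)
          buffer = b""
          while END_BLOCK not in buffer:
              chunk = conn.recv(4096)   # first packet may carry only ~1380 bytes
              if not chunk:
                  raise ConnectionError("peer closed before end-of-block arrived")
              buffer += chunk
          start = buffer.index(START_BLOCK) + 1
          return buffer[start:buffer.index(END_BLOCK)]   # payload between delimiters

      So if the firewall silently drops every packet after the first, the receiver just sits in READ until the 15-second timeout fires, and because an ACK is only returned once a complete message has been framed, the ancillary never sees one – which matches what their contact reported.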

      Anyway, why all of a sudden do we need to allow port numbers other than the one we set in the tcpip (server) configuration? If the firewall vendor did not make a change and we did not make a change, what other kind of change could either alter the use of additional port numbers or somehow force the firewall to start blocking every packet after the first one?

      Thank you

      Peter Heggie

    • Author
      Replies
      • #85506
        Michael Hertel
        Participant

          Amazing!

          We had the same exact issue yesterday.

          I am trying to find out what the cause and resolution was.

          There was some issue with a Palo Alto firewall change.

          A sister hospital also had the issue. 1380 was the magic number.

          input buffer accepted 1380 bytes, now 1380

          I’ll let you know more when I find out.

          Then again, this could all be a coincidence but I’d rather start a conspiracy theory. lol

          😈

        • #85507
          Peter Heggie
          Participant

            wow! we have Palo Alto also!!

            our network admin said that 1380 was a normal packet size.

            I will let him know.

            Peter Heggie

          • #85508
            Michael Hertel
            Participant

              And the conspiracy grows… Moo Hah hah

              😈

            • #85509
              Peter Heggie
              Participant

                Got an update from our network admin: Palo Alto put in an upgrade that caused this, and it has since been rolled back. So apparently they not only implement the rules we come up with, they also have a ‘baseline’ set of rules that does get changed from time to time.

                And another team member pointed out that not many of our external vendors send ‘large’ messages over this kind of VPN through our firewall, so it may not have been just this one external source system – it could have been any external source system that sent something larger than 1380 bytes. So you were right – this is part of the key.

                We are going to carefully change our restriction back to where it was and make sure that Palo Alto is working correctly.
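                Once the rule is back where it was, a quick way to prove it end to end would be a throwaway test along these lines (plain Python, run from a host on the far side of the VPN or by having the ancillary resend a large result; the host and port are placeholders for the real endpoint): push one framed message comfortably larger than 1380 bytes and confirm the receiving side gets the whole thing and ACKs it, not just the first segment.

                # Throwaway end-to-end check (placeholder host/port): send one MLLP-framed
                # message larger than 1380 bytes and confirm an ACK comes back.
                import socket

                HOST, PORT = "192.0.2.10", 6001                 # stand-ins for the real endpoint
                payload = b"MSH|^~\\&|TEST|..." + b"X" * 2000   # well past the 1380-byte mark
                framed = b"\x0b" + payload + b"\x1c\x0d"

                with socket.create_connection((HOST, PORT), timeout=10) as s:
                    s.sendall(framed)
                    s.settimeout(20)
                    reply = s.recv(4096)                        # expect an ACK if framing completed
                    print("got %d bytes back" % len(reply))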

                Peter Heggie

              • #85510
                Michael Hertel
                Participant

                  LOL! Conspiracy confirmed.

                  😛

                • #85511
                  Peter Heggie
                  Participant

                    yupper

                    Peter Heggie

                  • #85512
                    Robert Milfajt
                    Participant

                      OMG, we have this exact same issue. Everyone was saying it was someone else’s problem, with networking saying no problem here. I’ve sent along my conspiracy theory to the networking team. This was a great catch – it must have been some mandatory Palo Alto upgrade that everyone thought would be good to take at the end of August!

                      Thanks for this post, it really helped,

                      Robert Milfajt

                      Robert Milfajt
                      Northwestern Medicine
                      Chicago, IL

                    • #85513
                      Robert Milfajt
                      Participant

                        BTW, it would be nice to know which parameters changed, just in case the network folks ask. Does anyone know which parameters Palo Alto mucked up with this upgrade?

                        Robert Milfajt
                        Northwestern Medicine
                        Chicago, IL

                      • #85514
                        Keith McLeod
                        Participant

                          I had a similar problem with a client where the MTU between source and destination was set to, I believe, 1380.
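                          If anyone wants to check the same thing from the engine host, here is a rough Linux-only sketch (plain Python; the host and port are placeholders, and the numeric constant is the Linux IP_MTU socket option from <linux/in.h>) that asks the kernel what path MTU it currently has toward a connected endpoint:

                          # Rough Linux-only sketch: ask the kernel for the path MTU it has
                          # cached toward the remote endpoint (placeholder host/port).
                          import socket

                          IP_MTU = 14          # Linux socket option number from <linux/in.h>
                          HOST, PORT = "192.0.2.10", 6001

                          s = socket.create_connection((HOST, PORT), timeout=10)
                          mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
                          print("kernel path MTU toward %s: %d" % (HOST, mtu))
                          s.close()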

                        • #85515
                          Heidi Haafke
                          Participant

                            We had a similar issue the morning of 8/30 with several of our VPNs, and we also use Palo Alto with those VPNs. We were troubleshooting with our network team and two of the downstream vendors for several hours with no luck. At 12 noon Eastern, the connections on the VPNs having issues mysteriously cleared up. I am curious whether that timing – 12 noon Eastern – was the same time other folks saw their connections clear, potentially due to Palo Alto backing out their changes?

                            Senior Systems Integration Analyst
                            IT Imaging and Integration Services
                            UPMC Pinnacle
                            heidi.haafke@pinnaclehealth.org

                          • #85516
                            Robert Milfajt
                            Participant

                              Ours mysteriously cleared up today, didn’t get a timing on it, but clearly a conspiracy.     😕

                              Robert Milfajt
                              Northwestern Medicine
                              Chicago, IL
