Firewall problems and workarounds

This topic has 17 replies, 8 voices, and was last updated 19 years, 1 month ago by Nathan Martin.

October 25, 2005 at 8:05 pm #48105
Anonymous
Participant
As more and more remote systems come into use, firewalls are being put in place that time out after one hour and then prevent data from flowing. I am interested in your solutions and why you chose them: what you could or could not do, and why. How do you handle outbound ports as well as inbound ports, and how do your remote vendors handle the same thing? How does Cloverleaf (or the remote system) know that the firewall has timed out?

We seem to have a firewall that simply stops letting data through and does not notify anyone of the fact (no TCP FIN). Has anyone else seen this? The more detailed your answers and questions, the more helpful this thread will be for everyone who reads it.
- October 25, 2005 at 9:47 pm #57649
  Dennis Pfeifer
  Participant
  In working with HDX, their solution was to send a ‘HEARTBEAT’ every 10 minutes.

  I guess we could do the same with a timer proc: basically, send a do-nothing message.

  Perhaps a timer proc with a ping? Or just a plain cron job with a ping.
  
  Dennis
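
The heartbeat idea above comes down to tracking how long the link has been idle and sending a do-nothing message before the firewall gives up on it. Here is a minimal Python sketch of that timer logic. The `Heartbeat` class, the `run_once` helper and the `b"HEARTBEAT"` payload are illustrative names only (nothing HDX or Cloverleaf defines), and the 600-second default simply mirrors the 10 minutes mentioned above.

```python
import time


class Heartbeat:
    """Decide when an idle connection needs a do-nothing message."""

    def __init__(self, interval_seconds=600, clock=time.monotonic):
        # 600 s mirrors the 10-minute interval quoted in the thread.
        self.interval = interval_seconds
        self.clock = clock
        self.last_traffic = clock()

    def note_traffic(self):
        """Call whenever a real message is sent or received."""
        self.last_traffic = self.clock()

    def due(self):
        """True once the link has been idle for a full interval."""
        return self.clock() - self.last_traffic >= self.interval


def run_once(hb, send):
    """Send one heartbeat through send() if the link has been idle long enough."""
    if hb.due():
        # Hypothetical payload: the real format has to be agreed with the vendor.
        send(b"HEARTBEAT")
        hb.note_traffic()
        return True
    return False
```

A timer proc or cron job would call `run_once` periodically, and real traffic would call `note_traffic` so that heartbeats only go out on a quiet link.
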
- October 25, 2005 at 10:29 pm #57650
  Ryan Boone
  Participant
  On AIX:

  The default keepalive setting for AIX is 2 hours. We have a lot of remote connections and had similar issues until I lowered it to 15 minutes. Now we never have that problem (although remote connections go down for other reasons, obviously, so they aren’t hassle-free). The keepalive setting affects all of the Cloverleaf sockets.

  To determine the current keepalive setting:

  $ no -a | grep tcp_keepidle

  tcp_keepidle = 14400 (14400 = 2 hours in half-second intervals)

  To change the keepalive setting to 30 minutes (the value is in half-second intervals), log in as root and type the following command:

  no -o tcp_keepidle=3600

  (Use 1800 for 15 minutes.)
  
  Also, add the command to the bottom of /etc/rc.net to auto-reset after reboot.
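
Because `tcp_keepidle` counts half-seconds, it is easy to be off by a factor of two. The small Python sketch below does the conversion both ways. The function names are made up for illustration, and it only prints the `no -o` command shown above rather than running it.

```python
def minutes_to_keepidle(minutes):
    """Convert minutes to AIX tcp_keepidle units (half-seconds)."""
    return int(minutes * 60 * 2)


def keepidle_to_minutes(value):
    """Convert an AIX tcp_keepidle value (half-seconds) back to minutes."""
    return value / 2 / 60


# Print the command for a few common idle times.
for minutes in (15, 30, 120):
    print(f"{minutes:>3} min -> no -o tcp_keepidle={minutes_to_keepidle(minutes)}")
```
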
- October 26, 2005 at 12:39 am #57651
  Dennis Pfeifer
  Participant
  On Linux the value is in seconds. To check the current setting:

  cat /proc/sys/net/ipv4/tcp_keepalive_time

  To change it to 15 minutes:

  echo 900 > /proc/sys/net/ipv4/tcp_keepalive_time

  This is not permanent; the value does not survive a reboot.
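
To check the Linux setting from a script rather than by hand, something like the Python sketch below could work. It reads the same `/proc` file as the `cat` command above and compares it with a firewall idle timeout. The 3600-second default is an assumption taken from the one-hour timeout described in this thread, and the function names are illustrative.

```python
from pathlib import Path

# Standard Linux location; kept as a parameter so the check can be pointed elsewhere.
KEEPALIVE_TIME = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_seconds(path=KEEPALIVE_TIME):
    """Return the system keepalive idle time in seconds, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def beats_firewall(keepalive_seconds, firewall_idle_seconds=3600):
    """True if the first keepalive probe goes out before the firewall idle timeout."""
    return keepalive_seconds is not None and keepalive_seconds < firewall_idle_seconds


if __name__ == "__main__":
    current = read_keepalive_seconds()
    print("tcp_keepalive_time:", current)
    print("below a 60-minute firewall timeout:", beats_firewall(current))
```

To make the setting stick across reboots, the usual approach is a `net.ipv4.tcp_keepalive_time = 900` line in `/etc/sysctl.conf`.
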
- October 26, 2005 at 1:02 pm #57652
  Bill Bertera
  Participant
  Anyone know where to find keepalive setting on Solaris? thanks
- October 27, 2005 at 3:42 pm #57653
  Anonymous
  Participant
  Looks like a possible solution on the AIX side. What about our myriad vendors, however? They open sockets to us, and it would be nice if they also sent keepalives every 30 minutes. Does anyone know of any problem with the firewall and the TCP keepalives?
- October 27, 2005 at 7:14 pm #57654
  Anonymous
  Participant
  I found this associated with Windows-based systems. It looks like a standard implementation of TCP includes a default keepalive at 2 hours, and a socket opened with keepalive modifies it for that socket only. And of course there are other parameters that will cause the socket to close after several failures of the keepalive ack.

  I am in the process of setting this up between an AIX and a Windows system with a brain-dead firewall (60-minute global timeout, no notification) and will post the results.
  
  http://www.winguides.com/registry/display.php/891/
  
  and this from another website
  
  I-322 If your aborted sessions aren’t properly cleaned up or if your idle but live sessions are dropped inadvertently, you may need to adjust these two registry parameters.
  
  Hive: HKEY_LOCAL_MACHINE
  
  Key: System\CurrentControlSet\Services\Tcpip\Parameters
  
  Value Name: KeepAliveTime
  
  Data Type: REG_DWORD
  
  Value: 7,200,000
  
  I-323 Hive: HKEY_LOCAL_MACHINE
  
  Key: System\CurrentControlSet\Services\Tcpip\Parameters
  
  Value Name: KeepAliveInterval
  
  Data Type: REG_DWORD
  
  Value: 1000
  
  Both values are in milliseconds. The default value for KeepAliveTime is 7,200,000, or 2 hours, and the default for KeepAliveInterval is 1000, or 1 second. KeepAliveTime governs how often Windows NT sends a keep alive packet. A specific application can request that keep-alive packets be sent. If the target system is able, it responds with an acknowledgment. The KeepAliveInterval works with the KeepAliveTime and governs how often keep-alive packets are sent until an acknowledgment is received. If the target machine doesn’t respond and the number of retries exceeds the value of TCPMaxDataRetransmissions, the connection is terminated. Restart your machine for any changes to take effect.
  
  Looking at this, the implication is:

  KeepAliveTime governs how often Windows NT sends a keep alive packet: every 2 hours the system sends a keepalive packet.

  A specific application can request that keep-alive packets be sent. If the target system is able, it responds with an acknowledgment. The KeepAliveInterval works with the KeepAliveTime and governs how often keep-alive packets are sent until an acknowledgment is received.

  Implication: if an app opens a socket with keepalive, a keepalive will be sent every second; if there is no keepalive ack after 2 hours, .... While not specifically stated, I am assuming the connection may be closed? I think the other parameters play a bigger part in closing the connection on keepalive timeouts.
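
One common reading of the two registry values, offered here as an assumption rather than something the quoted page spells out, is that KeepAliveTime is the idle time before the first probe and KeepAliveInterval is the gap between retries of an unanswered probe. Under that reading the timeline can be estimated as in the Python sketch below. The `max_retries` default of 5 stands in for TCPMaxDataRetransmissions and is assumed, not taken from the thread.

```python
def keepalive_timeline(keepalive_time_ms=7_200_000,
                       keepalive_interval_ms=1_000,
                       max_retries=5):
    """Estimate (first_probe_s, give_up_s), measured from the last traffic on the socket.

    max_retries stands in for TCPMaxDataRetransmissions; 5 is an assumed value.
    """
    first_probe = keepalive_time_ms / 1000
    give_up = first_probe + max_retries * keepalive_interval_ms / 1000
    return first_probe, give_up


# Registry defaults quoted above, then a 30-minute KeepAliveTime.
print(keepalive_timeline())
print(keepalive_timeline(keepalive_time_ms=1_800_000))
```

With the defaults, the first probe would go out only after two idle hours, well past a 60-minute firewall timeout. That is why lowering KeepAliveTime matters more here than KeepAliveInterval.
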
- October 28, 2005 at 1:31 pm #57655
  Anonymous
  Participant
  What about using multiserver?
  
  This is what I believe happens (in English). Please let me know if I have the wrong picture:
  
  If I
- October 28, 2005 at 1:36 pm #57656
  Bill Bertera
  Participant
  we’ve got plans to try the multiserver approach. I’ll let you know how it goes.
- October 28, 2005 at 4:28 pm #57657
  Anonymous
  Participant
  Some advocate a multiserver on Cloverleaf to work around the problem.

  Keep in mind that the problem (the firewall) affects both inbound and outbound connections.

  Depending on how the CL and vendor connections are set up, the hung connections will eventually error out, but that time might be excessive because of the TCP no-ACK algorithms.

  CL inbound

  Some advocate setting up CL as a multiserver. That would definitely allow inbound connections when the sender eventually does something to try to establish a new connection, but how long is it going to take the sender to know there is a problem and bounce their side? There might be a lot of important messages that need to flow at that time that will be delayed. What might be the other ramifications, security concerns, etc.?

  CL outbound

  If we don’t do message timeouts, we still have the problem of when TCP will hit its no-ACK limit and cause a socket error so that the thread attempts a reconnect. If we do message timeouts and resends but don’t do a thread down/up in a reasonable amount of time to re-establish a new connection, we are still dependent on the TCP no-ACK. If we do a thread down/up, it doesn’t establish a new connection unless the receiver is in multiserver mode. There is also an additional problem: if you shell out from a thread to a background process that will stop/start the thread, it will add to that thread’s process environment, eventually causing that process to panic when it runs out of env space.

  In these scenarios, keep in mind that there may be multiple firewalls involved, two or more and all from different vendors. Bear in mind both inbound and outbound connections, and possible vendor requirements.

  Ideally, one should be able to configure the firewall(s) for no timeout on selected connections. They say they can’t do that.

  2nd choice would be TCP keepalives. Cloverleaf does not support opening a socket in that manner, and it is not known whether all vendor products would open their sockets with keepalives. So maybe the system-level TCP keepalive could be changed to 30 minutes instead of 2 hours, which would keep the connections connected. The vendors would have to either support opening the socket with keepalive or be willing to change the system-level keepalive to 30 minutes.

  All this assumes that the firewalls will pass the keepalive packets. I say this because the timeout on the firewall is there to stop data, and TCP keepalives would not allow the firewall to do so, defeating that firewall. So what is the purpose of this firewall option?

  3rd: application-level keepalive messages. Works well and meets the requirements of the firewall. Cloverleaf is easily set up to handle inbound, and scheduled resends can do the outbound. Requires vendor coding to support; unknown what it would require of each vendor.

  4th: multiserver. See the discussion at the top; it has its own set of problems.

  Comments are solicited on each method: opinions, and advantages/drawbacks of each for both Cloverleaf and any known vendors.
- October 28, 2005 at 6:18 pm #57658
  Anonymous
  Participant
  To test opening a socket with keepalive, I modified hcitcptest to open a socket to the remote system (going through at least 2 firewalls) with SO_KEEPALIVE, sent a message, and received an ack. I then waited 68 minutes with no messages being sent, sent another message, and received the ack.

  None of the firewalls timed out the connection.

  I am currently testing the same connection without SO_KEEPALIVE. I expect the system keepalive that will be sent at 2 hours to fail to get through the firewall, which should start the TCP no-ACK sequence and eventually error the socket; the test is to determine how long that takes.

  For anyone who would like to try the same and report your findings: make a copy of hcitcptest named tcptestkeepalive, and add this single line where shown below:

  setsockopt(SOCKET, SOL_SOCKET, SO_KEEPALIVE, 1) || warn "setsockopt: $!";

  Start it connecting to your remote as Cloverleaf would do, for example:

  tcptestkeepalive -h 192.168.4.4 -p 8075 -t mlp

  Send a message like MSH|||||| and leave your test program running. It should be acked OK or rejected, unless the system you connected to is brain dead (which some are).

  Wait at least one hour and then some, and send the message again. If it is working you will receive the same ack message; otherwise, nothing.

      #######################################
      # init_client - initialize and connect
      #               to host as client
      sub init_client {
          $them  = $opt_h;
          # NOTE: $remote and $port are not set in this excerpt; presumably
          # the rest of hcitcptest sets them from its command-line options.
          $iaddr = inet_aton($remote);
          $paddr = sockaddr_in($port, $iaddr);
          $proto = getprotobyname('tcp');
          socket(SOCKET, AF_INET, SOCK_STREAM, $proto) || die "socket error: $!";

          # use keepalive: an OPTVAL of 1 switches the option on for this socket
          setsockopt(SOCKET, SOL_SOCKET, SO_KEEPALIVE, 1) || warn "setsockopt: $!";

          print STDOUT "connecting...\n\n";
          if (connect(SOCKET, $paddr)) {
              print STDOUT "Connected to host: $them, port: $port\n\n";
          } else {
              die "socket error: $!";
          }
      }
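
For sites that would rather not edit hcitcptest, the same per-socket test can be sketched in Python. This is only an illustration of the SO_KEEPALIVE idea from the post above, not a replacement for the Cloverleaf tool. The function names are invented here, and the optional `idle_seconds` override relies on `TCP_KEEPIDLE`, which Python exposes only on some platforms (Linux among them), so it is guarded.

```python
import socket


def open_keepalive_socket(idle_seconds=None):
    """Create a TCP socket with SO_KEEPALIVE switched on (not yet connected)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional per-socket idle time; skipped where the platform lacks TCP_KEEPIDLE.
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock


def connect_with_keepalive(host, port, timeout=10.0, idle_seconds=None):
    """Connect to host:port with keepalive enabled; raises OSError on failure."""
    sock = open_keepalive_socket(idle_seconds)
    try:
        sock.settimeout(timeout)   # bound the connect attempt only
        sock.connect((host, port))
        sock.settimeout(None)      # back to blocking mode for the long idle test
    except OSError:
        sock.close()
        raise
    return sock
```

Calling `connect_with_keepalive("192.168.4.4", 8075)` would mirror the example address and port above. Without `idle_seconds`, the probes still follow the system-wide keepalive time.
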
- October 28, 2005 at 6:42 pm #57659
  Anonymous
  Participant
  Carlos said the below – * are my inserted comments
  
  This is what I believe happens (in English). Please let me know if I have the wrong picture:
  
  If I
- January 11, 2006 at 8:10 pm #57660
  Kevin Scantlan
  Participant
  Does the keep_alive setting need to be set for only the client side or the server side or does it not matter?
- January 11, 2006 at 9:34 pm #57661
  Ryan Boone
  Participant
  Once I lowered the keepalive on the engine box, it maintained all of the socket connections. This is very important for us because we connect to a lot of systems outside of our network (client offices, clinics, hospitals, etc). Beforehand, any systems that had a lower keepalive setting would maintain the connection, but those that did not would time out.
- January 12, 2006 at 6:01 pm #57662
  Daniel Lee
  Participant
  Whenever we connect with a system outside of our network we set up a VPN tunnel for the connection. Our network guy has some way that he can set up a keep alive on the firewall to keep this tunnel from timing out. Since he set this up we haven’t had a problem with the tunnel timing out.
- May 30, 2006 at 6:00 pm #57663
  Bill Bertera
  Participant
  Has anyone tried “Close after Write” for the firewall problem, when CLV is the client? Does it actually close after each message, or does it wait a certain amount of time in case others are pending, so it doesn’t bounce up and down for every message?
  
  thanks
- May 31, 2006 at 11:06 am #57664
  David Harrison
  Participant
  On Solaris, use ndd to inspect or set tcp settings.
  
  To inspect the keepalive:

  ndd -get /dev/tcp tcp_keepalive_interval

  To set it (nnnnnnn is the new value, in milliseconds):

  ndd -set /dev/tcp tcp_keepalive_interval nnnnnnn

- May 31, 2006 at 8:51 pm #57665
  Nathan Martin
  Participant
  I’ll add my 2 cents.

  We noticed that our outbound connections (over VPN) have trouble re-connecting when “Wait for ACK Timeout” is set to “-1”. But those same connections just fix themselves when configured with a reasonable timeout value. No keepalive changes necessary.

  Of course, rather than just reconnecting all the time, also treat the problem by applying the suggested keepalive fixes.
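
The effect of a finite ACK timeout can be shown outside Cloverleaf with a small Python sketch. This is a generic illustration, not how the engine implements “Wait for ACK Timeout”. The names `send_and_wait_ack` and `ReconnectingClient` are invented for the example, and the 30-second default is an arbitrary assumption.

```python
import socket


def send_and_wait_ack(sock, payload, ack_timeout=30.0, bufsize=4096):
    """Send payload and wait up to ack_timeout seconds for a reply.

    Returns the reply bytes, or None on timeout, socket error or peer close.
    """
    sock.settimeout(ack_timeout)
    try:
        sock.sendall(payload)
        reply = sock.recv(bufsize)
    except OSError:  # socket.timeout is a subclass of OSError
        return None
    return reply or None  # b"" means the peer closed the connection


class ReconnectingClient:
    """Keep one connection open; drop it whenever an ACK fails to arrive."""

    def __init__(self, connect, ack_timeout=30.0):
        self.connect = connect  # callable returning a connected socket
        self.ack_timeout = ack_timeout
        self.sock = None

    def send(self, payload):
        """Send once; after a missing ACK the next call opens a new socket."""
        if self.sock is None:
            self.sock = self.connect()
        reply = send_and_wait_ack(self.sock, payload, self.ack_timeout)
        if reply is None:
            self.sock.close()
            self.sock = None
        return reply
```

With an infinite wait, `recv` would simply block on a connection the firewall has silently dropped. The timeout is what turns the hang into a reconnect.
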

The forum ‘Cloverleaf’ is closed to new topics and replies.