firewall problems and workarounds
We seem to have a firewall that simply stops letting data through but otherwise does not notify anyone of the fact (no TCP FIN). Anyone else seen this? The more detailed your answers/questions, the more help it will be for all who read this thread.
Guess you/we could do the same with a timer proc…
Basically send a do nothing message ..
Perhaps a timer proc with a ping? ..
or .. just a plain cronjob with a ping ..
Dennis
The default keepalive setting for AIX is 2 hours. We have a lot of remote connections and had similar issues until I lowered it to 15 mins. Now we never have that problem (although remote connections go down for other reasons, obviously, so they aren't hassle-free). The keepalive setting affects all of the Cloverleaf sockets.
To determine current keepalive setting:
$ no -a | grep tcp_keepidle
tcp_keepidle = 14400 (14400=2 hours in 1/2-second intervals)
To change keep alive setting to 30 minutes (value is in half-second intervals):
– Login as Root and type the following command:
no -o tcp_keepidle=3600
(1800 for 15 minutes)
Also, add the command to the bottom of /etc/rc.net to auto-reset after reboot.
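The half-second units are easy to get wrong, so a small shell helper can do the conversion. This is only a sketch: the function name minutes_to_keepidle is made up for illustration, and the script just prints the no command rather than running it.

```sh
# Convert minutes to AIX tcp_keepidle units (half-seconds).
# minutes_to_keepidle is a hypothetical helper name, not an AIX command.
minutes_to_keepidle() {
    # 1 minute = 60 seconds = 120 half-second ticks
    echo $(( $1 * 120 ))
}

# Print (do not execute) the command root would run for a 15-minute keepalive.
echo "no -o tcp_keepidle=$(minutes_to_keepidle 15)"
```

For 15 minutes this prints the same 1800 value quoted above; 30 minutes gives 3600.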
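On Linux, to see the current keepalive time (value is in seconds):
cat /proc/sys/net/ipv4/tcp_keepalive_time
To change it to 15 minutes:
echo 900 > /proc/sys/net/ipv4/tcp_keepalive_time
This is not permanent; the value is lost at reboot.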
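To make the Linux value survive a reboot, the usual route is a line such as net.ipv4.tcp_keepalive_time = 900 in /etc/sysctl.conf, loaded with sysctl -p. Whichever way it is set, it only helps if the keepalive fires before the firewall gives up on the idle connection. The sketch below checks that; the helper name keepalive_ok and the 3600-second firewall timeout are assumptions for illustration, not anything from Cloverleaf or the firewall vendor.

```sh
# keepalive_ok is a hypothetical helper: it succeeds if the system keepalive
# time (seconds) is shorter than the firewall idle timeout (seconds).
# An optional second argument overrides the file that is read.
keepalive_ok() {
    fw_timeout=$1
    file=${2:-/proc/sys/net/ipv4/tcp_keepalive_time}
    keepalive=$(cat "$file" 2>/dev/null) || return 2
    [ "$keepalive" -lt "$fw_timeout" ]
}

# Example: assume a firewall that drops connections idle for 60 minutes.
if keepalive_ok 3600; then
    echo "keepalive fires before the firewall timeout"
else
    echo "keepalive is too long (or could not be read)"
fi
```

With the Linux default of 7200 seconds this reports the keepalive as too long for a 60-minute firewall timeout.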
I am in the process of setting this up between an AIX and a Windows system with a brain-dead firewall (60 min global timeout, no notification) and will post the results.
and this from another website
I-322 If your aborted sessions aren’t properly cleaned up or if your idle but live sessions are dropped inadvertently, you may need to adjust these two registry parameters.
Hive: HKEY_LOCAL_MACHINE
Key: System\CurrentControlSet\Services\Tcpip\Parameters
Value Name: KeepAliveTime
Data Type: REG_DWORD
Value: 7,200,000
I-323 Hive: HKEY_LOCAL_MACHINE
Key: System\CurrentControlSet\Services\Tcpip\Parameters
Value Name: KeepAliveInterval
Data Type: REG_DWORD
Value: 1000
Both values are in milliseconds. The default value for KeepAliveTime is 7,200,000, or 2 hours, and the default for KeepAliveInterval is 1000, or 1 second. KeepAliveTime governs how often Windows NT sends a keep alive packet. A specific application can request that keep-alive packets be sent. If the target system is able, it responds with an acknowledgment. The KeepAliveInterval works with the KeepAliveTime and governs how often keep-alive packets are sent until an acknowledgment is received. If the target machine doesn’t respond and the number of retries exceeds the value of TCPMaxDataRetransmissions, the connection is terminated. Restart your machine for any changes to take effect.
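Reading the quoted description literally, a rough upper bound on how long a dead peer goes unnoticed is the idle time before the first probe plus one interval per unanswered retry. The sketch below just does that arithmetic. The helper name detect_ms is made up, and the retry count of 5 for TCPMaxDataRetransmissions is an assumption (I believe it is the default on older Windows versions, but check your own system).

```sh
# Rough worst-case time (ms) before an unresponsive peer is detected:
# idle time before the first probe, plus one interval per unanswered retry.
# detect_ms is a hypothetical helper name.
detect_ms() {
    keepalive_time=$1      # KeepAliveTime, ms
    keepalive_interval=$2  # KeepAliveInterval, ms
    retries=$3             # TCPMaxDataRetransmissions (5 is assumed here)
    echo $(( keepalive_time + keepalive_interval * retries ))
}

detect_ms 7200000 1000 5   # defaults: 7205000 ms, just over 2 hours
detect_ms 1800000 1000 5   # KeepAliveTime lowered to 30 minutes: 1805000 ms
```

Either way the retry phase is a matter of seconds; KeepAliveTime is the value that has to come in under the firewall's idle timeout.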
Looking at this, the implication is
KeepAliveTime governs how often Windows NT sends a keep alive packet.
A specific application can request that keep-alive packets be sent. If the target system is able, it responds with an acknowledgment. The KeepAliveInterval works with the KeepAliveTime and governs how often keep-alive packets are sent until an acknowledgment is received.
implication: if an app opens a socket with keepalive, a keep alive will be sent every second, if no keepalive ack after 2 hours, …. While not specifically stated, I am assuming the connection may be closed? I think the other parameters play a bigger part in closing the connection on keepalive timeouts.
This is what I believe it happens (in English) Please let me know if I have the wrong picture:
If I
Keep in mind that the problem (firewall) affects both inbound and outbound connections.
Depending on how CL and vendor connections are set up, the hung connections will eventually error out, but that time might be excessive due to the TCP no-ack algorithms.
cl inbound
Some advocate setting up CL as multiserver, and that would definitely allow inbound connections when the sender eventually does something to try to establish a new connection. But how long is it going to take the sender to know there is a problem and bounce their side? There might be a lot of important messages that need to flow at that time that will be delayed. What might be other ramifications, security concerns, etc.?
cl outbound
We still have the problem of when TCP will do its no-ack and cause a socket error so that the thread attempts a reconnect. If we don't do message timeouts, we depend entirely on that. If we do message timeouts and resends but don't do a thread down/up in a reasonable amount of time to re-establish a new connection, we are still dependent on the TCP no-ack. If we do a thread down/up, it doesn't establish a new connection unless the receiver is in multiserver mode. There is also an additional problem: if you shell out from a thread to a background process that will stop/start the thread, it will add to that thread's process environment, eventually causing that process to panic when it runs out of env space.
In these scenarios, keep in mind that there may be multiple firewalls involved, 2 or more and all by different vendors. Bear in mind both inbound and outbound connections, and possible vendor requirements.
Ideally, one should be able to configure a firewall(s) for no timeout on selected connections.
They say they can't do that.
2nd choice would be TCP keepalives. Cloverleaf does not support opening a socket in that manner, and it is not known if all vendor products would open their sockets with keepalives. So maybe the system-level TCP keepalive could be changed to 30 minutes instead of 2 hours. This would keep the connections connected. The vendors would have to either support the socket open with keepalive or be willing to change the system-level keepalive to 30 minutes.
All this assumes that the firewalls will pass the keepalive packets. I say this because the point of the firewall timeout is to stop idle connections, and passing TCP keepalives would defeat that timeout. If keepalives get through, what's the purpose of this firewall option?
3rd: application-level keepalive messages. Works well, meets the requirements of the firewall. Cloverleaf is easily set up to handle inbound; scheduled resends can do the outbound. Requires vendor coding to support. Unknown what it would require of each vendor.
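As a rough illustration of the application-level idea, the sketch below decides when a do-nothing message is due and wraps it in MLLP framing (VT, message, FS, CR). The helper names mllp_frame and heartbeat_due and the 30-minute idle limit are assumptions for illustration; actually sending the bytes, and whether the vendor side accepts such a message, is left out.

```sh
# Wrap a message in MLLP framing: <VT> message <FS><CR>.
# mllp_frame and heartbeat_due are hypothetical helper names.
mllp_frame() {
    printf '\013%s\034\015' "$1"
}

# Succeed when the link has been idle for at least idle_limit seconds.
heartbeat_due() {
    last_sent=$1; now=$2; idle_limit=$3
    [ $(( now - last_sent )) -ge "$idle_limit" ]
}

# Example: the link has been idle 31 minutes and the limit is 30 minutes,
# so a heartbeat is due. Here the framed bytes are only dumped for inspection.
if heartbeat_due 0 1860 1800; then
    mllp_frame 'MSH||||||' | od -An -c
fi
```

The idle limit has to be shorter than the shortest firewall timeout on the path, with some margin.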
4th multiserver – see discussion at top – has its own set of problems
Comments are solicited on opinions of each method, advantages/drawbacks of each for both Cloverleaf and any known vendors.
Sent a message and received an ack, waited 68 minutes with no messages being sent, then sent another message and received the ack.
None of the firewalls timed out the connection.
I am currently testing the same connection without SO_KEEPALIVE. I expect the system keepalive, which will be sent after 2 hours, to fail to get through the firewall, which should start the TCP no-ack sequence and error the socket. The test is to determine how long that takes.
For anyone who would like to try the same and report on your findings,
make a copy of hcitcptest to tcptestkeepalive, and add the line (a single line)
setsockopt(SOCKET,&SOL_SOCKET,&SO_KEEPALIVE,1) || warn "setsockopt: $!";
where shown below.
start it connecting to your remote as cloverleaf would do
for example
tcptestkeepalive -h 192.168.4.4 -p 8075 -t mlp
and send a message like MSH||||||
leave your test program running.
It should be acked OK or rejected unless the system you connected to is brain dead (which some are).
Wait at least one hour and then some, and send the message again.
If it is working you will receive the same ack message, otherwise nothing.
#######################################
# init_client - initialize and connect
# to host as client
# ($remote and $port are presumably set elsewhere in the script; not shown here)
sub init_client {
    $them = $opt_h;
    $iaddr = inet_aton($remote) || die "unknown host: $remote";
    $paddr = sockaddr_in($port, $iaddr);
    $proto = getprotobyname('tcp');
    socket(SOCKET, AF_INET, SOCK_STREAM, $proto) || die "socket error: $!";
    # use keepalive: set it on the handle opened above, with a true value
    # (as far as I can tell an undef value is passed as 0, which leaves keepalive off)
    setsockopt(SOCKET, &SOL_SOCKET, &SO_KEEPALIVE, 1) || warn "setsockopt: $!";
    print STDOUT "connecting...\n\n";
    if (connect(SOCKET, $paddr)) {
        print STDOUT "Connected to host: $them, port: $port\n\n";
    } else {
        die "socket error: $!";
    }
}
thanks
To inspect the keepalive:
To set the keepalive:
We noticed that our outbound connections (over VPN) have trouble re-connecting when “Wait for ACK Timeout” is set to “-1”. But, those same connections just fix themselves when configured with a reasonable timeout value… No keepalive changes necessary.
Of course, rather than just reconnect all the time, also treat the problem by applying the suggested keepalive fixes.