fin_wait

  • #47728
    Kevin Scantlan
    Participant

    We are having problems getting reconnected to an interface when we take the server side (on the engine) down and then bring it back up.  The thread says “OPENING” and continues to say that.  I did a netstat -a and grepped for the port number, and the status of the port is FIN_WAIT_2.  I checked with our network administrators and they believe that the engine is sending a FIN to the client to say we went down, but that the client is not sending back a FIN/ACK.  So, the port is tied up waiting for that acknowledgement.

    Is there any way we can tell the engine to time out waiting for that acknowledgement and release the port?

    We don’t have any problems when the client side goes down first.  The system we are working with is CoPathPlus (owned by Cerner).  They say they don’t have any problems with other Cloverleaf sites (of course).

    • #56552
      Charlie Bursell
      Participant

      The engine can’t time out; the port is still open.  We don’t build or install TCP/IP on your box, we just use it.

      FIN_WAIT_2 is the state a TCP connection enters after it has sent a FIN and received an ACK of that FIN – it has done its half of the graceful connection shutdown.  Now the connection is waiting for the remote side to send a FIN to shut down its half of the connection.
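
      To illustrate (a generic Python sketch, nothing to do with the engine’s actual code; the host and port are made-up placeholders): calling shutdown() on the write side sends our FIN, and once the peer ACKs it the local socket sits in FIN_WAIT_2 until the peer closes its own half.

      # Minimal sketch of the half-close that leaves a socket in FIN_WAIT_2.
      # The host and port are hypothetical placeholders.
      import socket

      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect(("remote-host", 12345))   # any reachable TCP listener

      s.shutdown(socket.SHUT_WR)          # send our FIN; once the peer ACKs it,
                                          # netstat shows this end in FIN_WAIT_2
      data = s.recv(4096)                 # we can still read until the peer
                                          # finally sends its own FIN
      s.close()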

      In the old days the only way out of a FIN_WAIT_2 on AIX was to reboot the box or cycle the vendor side.  Now what the engine does when restarted is issue SO_REUSEADDR, which effectively shoots the port.  I don’t think this works on HP-UX, at least it didn’t use to.
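
      Roughly, this is what SO_REUSEADDR buys you (a generic Python sketch, not the engine’s actual code; the port is the server port from this thread): without the option, bind() on the port can fail with “Address already in use” while an old connection is still hanging around; with it, the restarted listener can grab the port right away.

      # Generic illustration of SO_REUSEADDR; not the engine's code.
      import socket

      srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      # Allow the new listener to re-bind the port even though the old
      # connection on it has not finished closing.
      srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
      srv.bind(("", 22232))
      srv.listen(5)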

      I think you should have the vendor re-read RFC 793.

      Charlie

    • #56553
      Kevin Scantlan
      Participant

      Charlie,

      I agree that the vendor is not doing what it’s supposed to do.  In fact, on a conference call with the vendor I told them as much.  But they don’t, and getting them to correct their coding will take an act of Congress.  So, what’s an interface engine to do?  If I read your post correctly, bringing the thread back up should remedy the situation.  So here’s what I did (along with netstat -a output):

      1.  Original state (interface is up)

      [test2]/hci/qdx5.2/integrator/test2>netstat -a | grep 22232

      tcp4       0      0  uhcsp119.22232         umhc-copathIN01..3218  ESTABLISHED

      2.  Took our side (server) down

      [test2]/hci/qdx5.2/integrator/test2>netstat -a | grep 22232

      tcp4       0      0  uhcsp119.22232         umhc-copathIN01..3218  FIN_WAIT_2

      3.  Waited for a few minutes, then brought our side back up

      [test2]/hci/qdx5.2/integrator/test2>netstat -a | grep 22232

      tcp4       0      0  *.22232                *.*                    LISTEN

      tcp4       0      0  uhcsp119.22232         umhc-copathIN01..3218  FIN_WAIT_2

      4.  Waited at least 10 minutes.

      [test2]/hci/qdx5.2/integrator/test2>netstat -a | grep 22232

      tcp4       0      0  *.22232                *.*                    LISTEN

      It looks like the port was finally released.  However, we never regained the connection, and the thread stayed in an OPENING state.  During this whole time, there was no manual intervention on the client side, but it appears that the client side does nothing to regain the connection.  In fact (and the vendor said as much), the client side does not even know that the connection was lost (it still shows as up on their side) until it gets an error when trying to send a message.  They claim that once we are back up, the connection will be re-established when they send a message.

      It looks like every time we take the server side down on the engine, we must bounce the client side after bringing the server side back up in order to go from an OPENING state to an UP state.  Any suggestions?

    • #56554
      Charlie Bursell
      Participant

      There are utilities available (see Google) that will blow away a FIN_WAIT_2.  However, use them with caution, as they have to diddle with the OS – you must have root privileges.

      Notice that the engine did blow away its connection when you restarted.  However, the client side still has the port and will do nothing until it attempts to send and notes the port is down.

      If you cannot get them to do anything, my only suggestion would be an alert that triggers if nothing has been sent in a while, then runs a script to see if the connection is in FIN_WAIT_2.  If so, cycle the thread.  Then the next time the vendor tries to send, it will re-establish the connection.
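
      Something along these lines could serve as the alert action (a rough Python sketch; the process and thread names are hypothetical placeholders, and the hcicmd invocation should be verified against your site and engine version):

      # Rough sketch of the alert action: if the connection to the vendor is
      # stuck in FIN_WAIT_2, cycle the thread so the vendor's next send
      # re-establishes the session.  Process/thread names are placeholders;
      # verify the hcicmd syntax for your engine version.
      import subprocess

      PORT    = "22232"          # server port from the netstat output above
      PROCESS = "copath_proc"    # hypothetical Cloverleaf process name
      THREAD  = "copath_in"      # hypothetical thread name

      output = subprocess.check_output(["netstat", "-a"],
                                       universal_newlines=True)
      stuck = any(PORT in line and "FIN_WAIT_2" in line
                  for line in output.splitlines())

      if stuck:
          # Cycle the thread: stop it, then start it again.
          subprocess.call(["hcicmd", "-p", PROCESS, "-c", THREAD + " pstop"])
          subprocess.call(["hcicmd", "-p", PROCESS, "-c", THREAD + " pstart"])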

      Of course, if you could somehow cycle his side, that would be better.

      I would have my network people put a sniffer on that system and record exactly what is going on.  Once you have proof that they are not doing what they are required to do IAW the RFC, they would have to fix it.

      Of course, if for some reason it were on our side, we would have to do the same thing.  However, I would be hard pressed to believe that since we have dealt with this for many years.

      Other than the above, all I can do is wish you luck.  It’s a bear, and we’ve all been there  🙁

    • #56555
      James Booty
      Participant

      We have experienced the same type of problem with one of our outbound threads to Labcorp. Our solution is to call their support site and get them to bump their side of the interface. This happens about 50% of the time after we do our weekly reboot during the scheduled downtime maintenance.
