Proto Errors in 5.8

This topic has 10 replies, 4 voices, and was last updated 13 years, 8 months ago by Steve Pringle.

October 15, 2010 at 10:14 pm #52054
Dave Thall
Participant
Is anyone having any issues with proto errors in 5.8 more so than in 5.6?

Example:

Proto Err:

Error Msg: FTP operation failed: bind failed with errno 67: Address already in use

We have a few threads that intermittently go into an opening status on a proto error. We see this more in 5.8 than we ever did in 5.6. It appears to be happening at only one or two sites.

I think this may be a local binding address issue. We have had our server address configured as the local binding address since 5.6, where it worked with few or no issues. Most of the threads we are seeing this on go over VPN and are configured as FTP and TCP/IP connections. Adding an extra alert to stop and restart the thread has helped in some cases. We are running 5.8 on AIX with HA; hardware and OS are not an issue here.

I am posting this to see if anyone has seen the same problem in 5.8 over 5.6.

This could be a networking and timing issue, where the vendor is closing or freeing up the port. But most of these interfaces have been up for a while without any issues. Not sure if it

- October 10, 2011 at 11:51 pm #72877
  Steve Pringle
  Participant
  Dave,
  
  We have a test 5.8 site we just got running and we’re seeing the “FTP operation failed: bind failed with errno 67: Address already in use” error when we use the fileset-ftp protocol.
  
  Did you ever determine what was causing this error?
  
  thanks,
  
  Steve
- October 11, 2011 at 1:42 pm #72878
  Charlie Bursell
  Participant
  Are you sure you have the host address in the correct place?
  
  Under FTP Options there is a field labeled:
  
  Host Name or IP Address
  
  That field lets you bind to a specific network when there are multiple networks, as when using HA. It is normally left blank.
  
  Under FTP Host Info you see
  
  HOST:
  
  That is where the address of your FTP server goes
  
  Just a thought
- October 11, 2011 at 3:24 pm #72879
  Steve Pringle
  Participant
  The host address is correct. The thread gets the data, which results in the bind errno 67 msg, and the msg is in a pending state. If I stop and start the thread the file is ftp’ed successfully.
  
  This configuration works fine under 5.3 (we’re upgrading to 5.8).
  
  thanks,
  
  Steve
- October 11, 2011 at 3:32 pm #72880
  Charlie Bursell
  Participant
  Strange error message, Steve. It does not look like a cURL error.
  
  The primary difference from 5.3 is that the FTP protocol now uses cURL.
  
  Have you contacted Support?
- October 11, 2011 at 3:37 pm #72881
  Steve Pringle
  Participant
  I have not contacted support, but will do so.
  
  thanks,
  
  Steve
- October 12, 2011 at 6:02 pm #72882
  Russ Ross
  Participant
  Steve:
  
  Did you upgrade your OS when going from Cloverleaf 5.3 to Cloverleaf 5.8?
  
  If yes, did you upgrade the OS in place or lay down the OS from scratch on another server?
  
  I ask because I’ve been keeping track of known concerns about going from Cloverleaf 5.3 to Cloverleaf 5.8 so that we don’t get hurt by them.

  One of those concerns is that when AIX 5.3 is upgraded in place to, say, AIX 6.1, the upgrade can get confused and change system settings to undesired values, causing problems.

  I put our system admins on notice to watch out for it. They already knew about it, but apparently it has burned those less aware.
  
  Here is a URL to look at if interested:
  
  https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/aix_6_1_migration_iostat_and_maxuproc_change_to_their_defaults?lang=en_us
  
  Russ Ross
  RussRoss318@gmail.com
- October 12, 2011 at 11:15 pm #72883
  Steve Pringle
  Participant
  Found the problem.
  
  A little background: We’re upgrading from Cloverleaf 5.3 to 5.8, moving to a new AIX 6.1 HA cluster (Russ – we did not upgrade AIX, we started with 6.1, but thanks for your post). Because it’s a cluster we were advised to set LOCAL_IP in NetConfig (aka Local Binding Address in the gui) to the cluster name for all outbound threads. The LOCAL_IP was set to our cluster name on the ftp thread. When I set LOCAL_IP to null I was able to ftp files successfully.
  
  Charlie – I found this in the process log:
  
  [fset:wrte:ERR /0: dDaysurgrb:10/12/2011 11:46:05] Error while trying to write 09302011_QQQQQQ_HELEN_08161954_EH1071119941.rtf.
  
  Detailed error:bind failed with errno 67: Address already in use
  
  Curl errCode:7 Curl error: Couldn’t connect to server
  
  thanks to all for your help.
  
  –Steve
- October 13, 2011 at 12:38 pm #72884
  Russ Ross
  Participant
  Steve:
  
  I’m always happy to learn of another Cloverleaf shop with an environment similar to ours so we can leverage each other’s experiences.

  We have an HA cluster running on AIX 5.3 right now and will also be going to AIX 6.1 when we can, but we do use the service address for interfaces and FTP.
  
  This took a bit of work to figure out how to accomplish because the service address does not fail-over transparently like I wish it did as you obviously know by now.
  
  Also, we are running ether channel which is really NIC load balancing with one or more NICs behaving as one the way we implemented it.
  
  So when we fail over, I currently use the NIC alias command to have the service address appear before the persistent address, causing the service address to behave like the persistent address does by default.
  
  Here is some of my code in my HA start up script to help give you a tangible example if you are interested:
  
  Code:

      #----------------------------------------------------------------------------------------
      # Set the hostname to the service hostname so that the failover is transparent
      # Note: no longer necessary to set hostname to service name so commented out on 11/7/2009
      #       but did not delete in case future HACMP ever requires this be done again
      #----------------------------------------------------------------------------------------
      ### hostname $HACMP_SERVICE_HOSTNAME

      #--------------------------------------------------------------------------
      # get netstat snapshot before doing any ether channel alias reconfiguration
      #--------------------------------------------------------------------------
      rm -f $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "----------------------------------------------------------------" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "netstat -rn before doing any ether channel alias reconfiguration (`date`)" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "----------------------------------------------------------------" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      netstat -rn >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "----------------------------------------------------------------" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "netstat -in before doing any ether channel alias reconfiguration (`date`)" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "----------------------------------------------------------------" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      netstat -in >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt

      #-------------------------------------------------------------------------------------------
      # remove the persistent address ether channel alias from en2 to assure all outbound traffic
      # will use the service address ether channel alias on en2 as the default address
      #
      # Note:
      # - this is so that all inbound and outbound traffic will failover transparently
      # - the hostname is the name of the persistent address
      #   because it is no longer necessary to set the hostname to the name of the service address
      #-------------------------------------------------------------------------------------------
      ifconfig en2 $HACMP_ORIG_HOSTNAME -alias

      #---------------------------------------------------------------------------------------------------------
      # add back the ether channel alias for the persistent address on en2
      # now that the ether channel alias for the service address is the default address for all outbound traffic
      #---------------------------------------------------------------------------------------------------------
      ifconfig en2 $HACMP_ORIG_HOSTNAME netmask 255.255.255.128 alias

      #--------------------------------------------------------------------------
      # get netstat snapshot after doing any ether channel alias reconfiguration
      #--------------------------------------------------------------------------
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "================================================================" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "netstat -rn after doing any ether channel alias reconfiguration (`date`)" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "================================================================" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      netstat -rn >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "================================================================" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "netstat -in after doing any ether channel alias reconfiguration (`date`)" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "================================================================" >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
      netstat -in >> $SCRIPTDIR/os.start__netstat-audit.txt
      echo "" >> $SCRIPTDIR/os.start__netstat-audit.txt
  
  We are on AIX 5.3 now with ether channel. In older versions of AIX, before we used ether channel, I detached the NICs and added them back in the desired order to accomplish this.

  Also, a nice feature of the ether channel aliasing is that my hostname command now shows the physical machine name. In the old days before ether channel aliasing, I would change the hostname to the service name on HA startup, which confused the admins, and once in a while they powered off the wrong physical box, OUCH!

  I did become aware of a NetConfig setting that lets a Cloverleaf TCP/IP real-time interface specify which IP it presents as, if that needs to differ from the default IP for outbound traffic.

  However, that would require changing just about every interface every time we upgrade, because our upgrades are done from a scratch install on a new LPAR like you might be doing.
  
  I really wanted a fail-over that was as transparent as I could figure out how to make it.
  
  Russ Ross
  RussRoss318@gmail.com
- October 13, 2011 at 12:50 pm #72885
  Russ Ross
  Participant
  Here is a previous post of mine on the same topic that has some additional detail, if you want more clarity.

  https://usspvlclovertch2.infor.com/viewtopic.php?p=22642#22642
  
  Russ Ross
  RussRoss318@gmail.com
- October 13, 2011 at 2:47 pm #72886
  Steve Pringle
  Participant
  Hi Russ – thanks for the info. I’m meeting with our AIX admin today and will bring this up.
  
  You’re correct in that we are doing a scratch install to the new cluster. Our solution was to write a script that changed all occurrences of “LOCAL_IP {}” (the default) in all our NetConfigs to “LOCAL_IP clustername”. Then we ran into problems with linked threads and fileset-ftp threads. The fix, as noted, is to set LOCAL_IP back to null/empty. We’re testing all the other thread protocols to see if they work.
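
For anyone scripting the same change, the LOCAL_IP rewrite described above, and its reversal, might be sketched as a pair of stream filters. This is only an illustration, not the script Steve used: the function names are made up, `clustername` is the placeholder from the post, and it assumes the setting appears in NetConfig literally as `LOCAL_IP {}` or `LOCAL_IP clustername`. Plain `sed` without `-i` is used because in-place editing is not available in every `sed`.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Cloverleaf).
# Assumption: the setting appears in NetConfig as "LOCAL_IP {}" (empty default)
# or "LOCAL_IP <name>", as quoted in the post above.

# set_local_ip NAME
# Reads NetConfig text on stdin and writes it to stdout with every empty
# LOCAL_IP replaced by NAME.
set_local_ip() {
    sed "s/LOCAL_IP {}/LOCAL_IP $1/g"
}

# clear_local_ip NAME
# Reverse operation: puts the empty default back wherever LOCAL_IP is NAME.
# NAME must be followed by whitespace, a closing brace, or end of line, so a
# longer name that merely starts with NAME is left alone.
# Note: NAME is used as a regular expression, so a "." in it matches any character.
clear_local_ip() {
    sed -e "s/LOCAL_IP $1\([[:space:]}]\)/LOCAL_IP {}\1/g" \
        -e "s/LOCAL_IP $1\$/LOCAL_IP {}/"
}
```

A cautious way to use it is to write to a new file, for example `set_local_ip clustername < NetConfig > NetConfig.new`, and review the result with `diff` before replacing anything. Both filters act on every thread in the file, so the per-thread exception noted above (leaving LOCAL_IP empty on fileset-ftp and linked threads) would still need to be handled separately.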
