High Availability
We have a tested home-grown solution to rehost our production site onto a backup platform, without a reboot.
We are also looking at HA/XD, but expect that application-level support is required for it to do more for us than we are doing for ourselves now.
We have thoroughly tested our failover (rehosting) mechanism on our test site, and are planning to go live on a new p615 running AIX 5.2L and Platform 5.3 (zero rev?).
They say that if you can get Cloverleaf set up under a different user (/home/bob, for example), then the cluster can handle failing it over onto different nodes. The biggest problem seems to be fixing the Perl scripts, which have the hci user hardcoded.
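To get a feel for how much work the hardcoded user is, something like the following could list the lines that need attention. This is only a rough sketch: the sample script text and the /home/hci/scripts path are made up for illustration, and the pattern assumes the user name shows up as the standalone word hci (so command names that merely start with "hci" are not flagged).

```python
import re

# Assumption: a hardcoded reference is the standalone word "hci",
# e.g. "hci" in quotes or as a path component like /home/hci.
HCI_USER = re.compile(r"\bhci\b")

def find_hardcoded_user(text):
    """Return (line_number, stripped_line) pairs that mention the hci user."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if HCI_USER.search(line):
            hits.append((number, line.strip()))
    return hits

# Hypothetical Perl fragment, not taken from any real Cloverleaf script.
sample = '''my $user = "hci";
my $root = "/home/hci/scripts";
system("hcienginerun -p demo");
'''

for number, line in find_hardcoded_user(sample):
    print(f"{number}: {line}")
```

The third sample line is skipped on purpose, since the word boundary keeps longer names that begin with "hci" out of the report.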
I can try to answer more questions on this subject, but I think I’ve explained most of what I know about it.
“Back in the day”, IBM Global Services typically would come in and do all the requirements analysis, etc., then put up a canned two node config.
Supposedly, with HA/XD, there is an “express” configuration option for the common two-node solution. (Think: no consulting fees for installation and setup.)
But for us, it *could* be the case that we would eventually have three machines, with intermachine “health check” heartbeats.
With the appropriate failover capability in the app, I would think three machines would be the minimum that could reasonably detect and lock out a failing node automatically.
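The reasoning behind the three-machine minimum can be sketched as a simple majority vote over heartbeat reports. This is an illustration only, not how HACMP or HA/XD decides anything: the node names, the shape of the reports, and the function name are all assumptions.

```python
def nodes_to_lock_out(all_nodes, reports):
    """Return the nodes that a strict majority of the cluster cannot hear.

    all_nodes: set of every node name in the cluster.
    reports:   maps each reporting node to the set of peers whose
               heartbeat it currently sees (a dead node sends no report).
    """
    quorum = len(all_nodes) // 2 + 1  # strict majority of the whole cluster
    failed = []
    for node in sorted(all_nodes):
        votes = sum(
            1
            for observer, seen in reports.items()
            if observer != node and node not in seen
        )
        if votes >= quorum:
            failed.append(node)
    return failed

# Three nodes, "c" has died: "a" and "b" agree, so "c" can be locked out.
print(nodes_to_lock_out({"a", "b", "c"}, {"a": {"b"}, "b": {"a"}}))

# Two nodes that lost contact: each has only one vote, so nobody wins.
print(nodes_to_lock_out({"a", "b"}, {"a": set(), "b": set()}))
```

With two nodes, a broken link looks the same from both sides, so neither can safely lock out the other. A third node is what breaks the tie.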
Here is a link to the IBM Announcement letter for HA/XD
205085 IBM HACMP/XD expands its business continuity solution with improved, simpler geographic-distance data mirroring and disaster recovery (14.5KB)
Jeff
I am using HCI cluster scripts to monitor, stop, start and clean the application.
It works a treat.
Dave
Rob Abbott
Cloverleaf Emeritus
Thanks
Jeff
We have implemented them on AIX (HACMP), Solaris (Veritas), HP-UX (MC/ServiceGuard), and Windows 2000 (Windows clustering).
We’re also working on Linux clustering but have not yet implemented it in a live environment.
They make Cloverleaf operate in an HA environment. They rely on the particular HA solution to function.
We also provide full HA implementations for AIX/HACMP.
As I say, if you need more detailed information please contact your Customer Relations Manager (CRM).
Thanks!
Rob Abbott
Cloverleaf Emeritus
Would those of you who are currently running this model of HA on AIX be willing to share:
A) The time it takes to failover?
B) Whether or not your standard reboot time increased (if so by how much)?
C) Did your test to production code migration become more complex (if so how)?
Thanks in advance.
-Ian Smith
We recently tested HA failover to our test box and had a lot of issues owing to recovery database corruption, which prevented the processes from starting on the failover box. QDX has not given us an answer on how to get around recovery database corruption problems. We had to reinit the recovery database in order to get some of the processes back up.
Also, many of the downstream connections (TCP/IP) needed to be rebooted owing to the abrupt disconnect they experienced.
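For what it's worth, a TCP client that retries with a growing delay can ride out an abrupt disconnect without someone restarting it by hand. The sketch below is generic Python, not anything Cloverleaf or the downstream systems actually provide. The function names, delays, and timeout are assumptions picked for illustration.

```python
import socket
import time

def backoff_delays(base=1.0, cap=30.0, attempts=6):
    """Return the wait in seconds after each failed attempt, doubling up to cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

def connect_with_retry(host, port, delays=None, timeout=5.0):
    """Try to open a TCP connection, sleeping between failed attempts."""
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Peer is not reachable yet (for example, mid-failover); wait and retry.
            time.sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}")

print(backoff_delays())
```

The cap keeps the retry interval from growing without bound, so a client reconnects reasonably soon after the failover node finishes coming up.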
All in all, the processes attempted to come up within 10 minutes, and some did, but the overall mess took us about 2-3 hours to get back to a semi-normal state, and we lost data.
So yes, we had an environment to work with,
but no, it isn't clean by any stretch of the imagination.
The HACMP setup was done by Quovadx.
Little to no documentation delivered other than the config files themselves.
Bob
1. Expense of the software and having a second server that basically does nothing.
2. It complicates the system design.
3. Have to have someone who knows HACMP.
4. Have to run regular tests – takes planning and downtime.
5. Only failed over once in four years on the old system.
Because the IBM pServers are so reliable these days, it seems that the risk of hardware failure is minimal. We made the system as fault tolerant as possible: dual power supplies, dual cooling fans, dual ethernet adapters, dual fibre channel adapters to our SAN (we do not use internal disk in production), and so on. We have been on this configuration for 1.5 years and have not had a hardware failure.