Cloverleaf Restart Steps After A Cluster Failure


  • Creator
    Topic
  • #54719
    Gina Borden
    Participant

    A couple of weeks ago we experienced a hardware failure in our network that caused my Cloverleaf cluster to lose its connection.  Bringing my Cloverleaf interfaces back online took about 4 hours.  Part of that time went to determining why I could see the cluster but couldn’t get the Cloverleaf GUI to start.  Once I stopped and started the cluster services on my AIX box, I was able to get in.  At that point, due to the hard crash, databases were corrupted and messages were hung, and those messages had to be dropped to a file to try to minimize loss of data.

    My question today is, what steps could I have taken to minimize the downtime in this situation?

    Here is info about my server:

    26 – Sites

    121 – Processes

    AIX 6.1.0.0

    Cloverleaf 6.0.2.0

    Any help is greatly appreciated.

    Thanks,

    Gina

Viewing 3 reply threads
  • Author
    Replies
    • #82703
      Rob Lindsey
      Participant

      Due to some unusual circumstances I took out the “autostart” of the CL application, and startup is now a manual process for my team.  In my 5 years at this company we have had 4 failovers due to hardware issues.  Every single time, the CL application started, but with issues.  It is easier for us to go in and manually start each site after checking the databases.

      I do have an automated script that we run from the command line, and it seems to help us.  Below is part of the script.  A for loop before these lines gets the sites from the system.

         setsite $site
         hcisitecleanup
         rm $HCISITEDIR/exec/monitorShmemFile
         rm $HCISITEDIR/exec/databases/vista.taf
         rm $HCISITEDIR/lock/*
         hcimsiutil -Z
         hcidbinit -if
         keybuild rlog
         keybuild elog
         dchain rlog
         dchain elog

      I know this is not exactly what you wanted to read, but the commands above rebuild the databases to try to fix the issue before you have to write the messages out to a file and resend them.

      Rob
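
Rob's post shows only the body of his loop. As a rough sketch of how the whole thing might fit together, the script below wraps the same eleven commands in a loop over site names. Several things here are assumptions and not part of his script: the site names are passed in as arguments (his way of getting the sites from the system is not shown), the names `run` and `rebuild_sites` are made up for illustration, `DRY_RUN` is a hypothetical switch that defaults to printing the commands without executing them, and the Cloverleaf environment (including `setsite`) is assumed to be loaded in the shell already.

```shell
#!/bin/sh
# Sketch only: loops the per-site recovery commands quoted above over a list
# of sites. Assumptions (not from the original post): site names arrive as
# arguments, and DRY_RUN is a hypothetical switch that defaults to "print only".

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$1"      # dry run: show the command, do not execute it
    else
        eval "$1"               # real run: execute in the current shell
    fi
}

rebuild_sites() {
    for site in "$@"; do
        # Skip this site if the environment cannot be switched to it.
        run "setsite $site" || { echo "setsite failed for $site" >&2; continue; }
        run 'hcisitecleanup'
        run 'rm $HCISITEDIR/exec/monitorShmemFile'
        run 'rm $HCISITEDIR/exec/databases/vista.taf'
        run 'rm $HCISITEDIR/lock/*'
        run 'hcimsiutil -Z'
        run 'hcidbinit -if'
        run 'keybuild rlog'
        run 'keybuild elog'
        run 'dchain rlog'
        run 'dchain elog'
    done
}
```

Calling `rebuild_sites siteA siteB` (placeholder site names) with `DRY_RUN` unset prints the eleven commands for each site so they can be reviewed first; setting `DRY_RUN=0` would actually run them. Commands are passed through `eval` so that `$HCISITEDIR` is expanded only after `setsite` has run, which also means site names containing spaces or shell metacharacters are not handled by this sketch.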

    • #82704

      Gina, are you using the Cloverleaf HA Scripts? These scripts are designed to cleanly bring up/down Cloverleaf in the event of a scheduled downtime or a crash.

      -- Max Drown (Infor)

    • #82705
      bill whatley
      Participant

      I can echo Rob’s sentiments, although we still let the HA startup scripts start the engine and the threads.  The one non-admin-initiated failover in the last 5+ years here didn’t work because essential resources became unavailable.

    • #82706

      The HA system has improved dramatically over the years across the operating system, SAN disks, the Cloverleaf Raima database, and the HA scripts. I’d recommend taking a look at the latest technology available.

      -- Max Drown (Infor)

  • The forum ‘Cloverleaf’ is closed to new topics and replies.
