Active:Active or Active:Passive?

  • Creator
    Topic
  • #54662
    Tim Jipson
    Participant

      When choosing A:A or A:P, what are the pros and cons of each? Does anybody have a strong preference for one over the other?

      Thanks,

      Tim

    • Author
      Replies
      • #82461
        Rob Lindsey
        Participant

          This is going to depend on a number of factors. One is your hardware and OS; others include cost and the time to set up and design the configuration.

          The way I understand it (and I might be wrong), in a true Active:Active setup you would need a “load balancer” in front of the CL systems, and you would need exactly the same setup of CL sites and threads on each system. That way, if one system goes offline, the other CL system takes over. Of course, you will lose the First In, First Out (FIFO) order of the data messages. If your business depends on FIFO, then I do not believe you would be able to run Active:Active. You could design an elaborate way of checking which message came in first and which one needs to go out first, but that is very complicated. It can be done, but I’m not sure it is worth the effort.

          Rob

        • #82462
          Tim Pancost
          Participant

            Hey, Tim,

            We’ve actually used both configurations. Starting off, I’ll say that we’re on AIX and use LPARs for our machines. For many years we used an Active:Active setup, and several years ago we moved to an Active:Passive setup. The main reason at that time was actually driven by our hardware group. They found, at the time, that it was cheaper and more efficient to have two active LPARs sized for just their own load, with passive LPARs that were basically just shells waiting to be utilized in the event of a failover. In an Active:Active configuration, both machines need to be sized to handle the full load of both machines simultaneously. Granted, resources can now be dynamically assigned, but the cluster is much simpler to maintain when you don’t have to monitor both actives to ensure that each could handle the full load of both machines.

            Another thing that can theoretically be done is putting the machines in different areas of the data center (different subnets), since they don’t have to be tied together the way they do in an Active:Active configuration. That way, in case of a partial disaster or disruption in the data center, both machines aren’t necessarily affected.

            We also found some application-level advantages to the Active:Passive configuration. For Active:Active, you have to keep your sites in a separate filesystem, so that filesystem can be failed over, and use logical links in the root directory. And unless you script it in your failover scripts (which we did not), you have to maintain those same links on the other node, even though they only really point to something in the event of a failover. That means that whenever you add a site on one machine, you have to remember to add the corresponding link on the other node, so it’s ready in case of a failover.

            Another advantage we found was the server.ini file and, more specifically, the environs value. In an Active:Active configuration, you either have to keep those in sync, and thus have sites listed in the IDE that don’t really exist on that machine, or have your failover scripts update the server.ini both when a machine fails over and when it fails back, which also means bouncing the host server. It also requires maintaining that environs list in the failover scripts every time a site is added or removed. With an Active:Passive configuration, this is completely removed from the scripts, making them much simpler; the scripts pretty much just need to stop all the processes and daemons.
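
            To make that concrete, here is a minimal sketch of what that failover-script step could look like. It assumes the environs value is a single semicolon-delimited “environs=” line in $HCIROOT/server/server.ini; the path, the format, and the site names here are examples only, not our actual scripts.

            # Hypothetical failover step: add the failed-over sites to environs
            # on the surviving node, then bounce the host server.
            INI=$HCIROOT/server/server.ini
            NEWSITES="/qvdx/integrator/siteB1;/qvdx/integrator/siteB2"   # example site paths
            cp "$INI" "$INI.bak"                                         # keep a copy for fail-back
            sed "s|^environs=.*|&;$NEWSITES|" "$INI.bak" > "$INI"
            # ...then restart the host server by your usual procedure, and
            # restore the backup copy when the sites fail back.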

            One other advantage we found was with the /home/hci directory, which is where we keep our cron jobs. In an Active:Active configuration, we would need to keep those in sync between the machines, because if /home/hci were part of the failover, it would clobber whatever was on the other node when one failed over. So we would need to keep the scripts on both machines. Additionally, and more challenging, was the need to keep the crontab entries for the hci user in sync and comment out whatever wasn’t needed during regular operation. In our Active:Passive configuration, we just need to export the crontab to a file periodically (every day, for us), and in the event of a failover that is going to last long enough to necessitate running cron jobs, we just have to import that file into the crontab on the passive machine.
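
            For reference, that export/import is just the standard crontab commands; the backup location below is only an example, and you’d want it somewhere the passive node can see (e.g., a filesystem that fails over with the sites):

            # On the active node, scheduled daily:
            crontab -l > /sites2/backups/hci.crontab
            # On the passive node, after a failover that will outlast the next cron run:
            crontab /sites2/backups/hci.crontab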

            I hope at least some of this made sense.  Of course, every company’s situation is different, but this is what worked for us.

            HTH

            TIM

            Tim Pancost
            Trinity Health

          • #82463
            David Teh
            Participant

              Hi folks,

              I’m interested to find out how the Active:Active setup can be done, but still not be a pain for operations folks to manage. We’re currently on Solaris, but management seems keen to look at Linux, driven solely by cost factors.

              Let me see if my understanding is correct.

              Let’s say we have server A (running root A and sites A1, A2) and server B (running root B and sites B1, B2).

              In peacetime (i.e., normal operation), can the default ‘hci’ ID still be used to run things on both servers? Or, to avoid conflict when a failover does happen, do I need to create a new ID ‘hcia’ to run root A and ‘hcib’ to run root B?

              Also,

              – it seems the default $HCIROOT cannot be used since root A and root B need to be distinguished?

              – it also seems the IDE client on a laptop cannot be used to access the sites, and only the server-side IDE can be used via X-Windows?

              There seem to be too many negatives to the Active:Active setup, but money-minded management, who only see the beautiful PowerPoint slides, don’t bother about them.

            • #82464
              Tim Pancost
              Participant

                I outlined our experiences with A:A vs A:P in my previous post, and those have not changed in these two years.  To do an A:A setup, you need to size each node to be able to handle the workload of both nodes at the same time, so you’re basically running each node at less than half its capability at any given time, just in case the other node should need to failover to it.  So, in essence, you’re needing the (somewhat) equivalent of four full-sized nodes’ resources to run two nodes’ worth of workload.  Fairly inefficient.  With an A:P configuration, you’re only sizing two “real” nodes, and then having two “virtual” nodes(that need very little resources allocated to them).  So maybe you’re talking about three full-sized nodes’ worth of resources, if that.  And, last I checked, three is definitely less than four.

                To directly address your specific questions:

                1. Why do roots A and B need to be distinguished, assuming they’re the same version? If they’re not, well, there’s your distinguishing characteristic right there. If they are, then there’s no reason you can’t have A1, A2, B1, and B2 all running under the same root. Now, if you have master sites on these machines, and they’re not the same, then yes, you have an issue, as you can only have one master site per machine (last I knew). This would negate the possibility of an A:A configuration (barring some substantial scripting).

                2. Again, I don’t see how this would have any effect on running the IDE remotely. The site list isn’t stored on your laptop; it’s in the environs element within the server.ini file that resides on each machine. When your laptop goes to bring up a site list, it’s talking to the host server, which gets the list from the server.ini. This is what I was referring to in my earlier response about needing failover scripting in place to update the server.ini both when one of the nodes fails over to the other AND when it fails back to its primary machine. No need to use X-Windows. :-p
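
                If you want to see exactly what the host server will hand to the IDE, you can look at that value directly on each node (assuming your server.ini lives under $HCIROOT/server; adjust the path for your install):

                # Show the site list the host server presents to the IDE
                grep -i environs $HCIROOT/server/server.ini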

                HTH,

                TIM

                Tim Pancost
                Trinity Health

              • #82465
                David Teh
                Participant

                  Hi Tim,

                  Your earlier post is a definite keeper, which was immediately shared with some of my folks. 🙂

                  I had a conversation with someone else after my post, and I think I may have made a wrong assumption.

                  In my current A/P

                • #82466
                  Tim Pancost
                  Participant

                    Frankly, the physical location of the disk is pretty much irrelevant. For instance, NONE of our disk is “local”; it’s all on the SAN. The real question is which filesystem(s) you want to have fail over to the other machine. When we had an A:A configuration, the /qvdx filesystem (akin to your /quovadx; we just left out the vowels) did not fail over, as it already exists on each node. We had a separate filesystem that housed the sites to be failed over, and then we had logical links in the root directory to point each site name to the other filesystem. For example, here’s a partial listing of our root directory that shows the links:

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:38 athnahub -> /sites2/ath

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:38 colohhub -> /sites2/col

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:38 grrmihub -> /sites2/grr

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:39 ie2mihub -> /sites2/ie2

                    So the /sites2 filesystem, which does fail over, contains the actual site directories. There is a corresponding filesystem, /sites1, on the other node. This is what I was referring to in my initial response about the necessity of either scripting the links in your failover scripts, or maintaining all the links on both nodes, so they’re there in the event of a failover.
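
                    If you do go the route of scripting the links, a rough sketch (reusing the example names above; treat $HCIROOT as a stand-in for wherever your links actually live) could be as simple as:

                    # In the failover script, after /sites2 is mounted on this node:
                    cd "$HCIROOT"                                # the root directory that holds the links
                    for pair in athnahub:ath colohhub:col grrmihub:grr ie2mihub:ie2; do
                        ln -sf "/sites2/${pair#*:}" "${pair%:*}"   # (re)create each site link
                    done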

                    HTH,

                    TIM

                    Tim Pancost
                    Trinity Health

                  • #82467
                    David Teh
                    Participant

                      Hi Tim,

                      Got it!

                      Thanks a zillion!

                    • #82468
                      David Teh
                      Participant

                        Hi folks,

                        Currently testing out the A:A solution on 2 clustered Solaris servers.

                        Since these are live servers, I am currently leaving out the clustered pair at the live DC and only testing on the pair at the secondary DC.

                        All servers are on Solaris.

                        DC1 has servers A and B, clustered with a cluster hostname CHN1.

                        The DNS name VHN1 is used by clients (whether IDEs or application sending systems) to reach Site 1.

                        DC2 has servers C and D, clustered with a cluster hostname CHN2.

                        The DNS name VHN2 is used by clients (whether IDEs or application sending systems) to reach Site 2.

                        Operators will have two IDEs open: one for Site 1 using the VHN1 hostname, and one for Site 2 using the VHN2 hostname.

                        When Site 2 fails over to Server A, on which Site 1 is already operating, the IDE for Site 2 should fail to contact the host server, right?

                        The “rmi_exported_server_port” value in the server.ini would be
