Active:Active or Active:Passive?

  • Creator
    Topic
  • #54662
    Tim Jipson
    Participant

      When choosing A:A or A:P, what are the pros and cons of each? Does anybody have a strong preference for one over the other?

      Thanks,

      Tim

    • Author
      Replies
      • #82461
        Rob Lindsey
        Participant

          This is going to depend on a number of factors. One is your hardware and OS; others include cost and the time to set up and design the configuration.

          The way I understand it (and I might be wrong), in a true Active:Active setup you would need a “load balancer” in front of the CL systems, and you would need exactly the same setup of CL sites and threads on each system. That way, if one system goes offline, the other CL system takes over. Of course, you will lose the First In, First Out (FIFO) order of the data messages. If your business depends on FIFO, then I do not believe you would be able to run Active:Active. You could design an elaborate way of checking which message came in first and which one needs to go out first, but that is very complicated. It can be done, but I’m not sure it is worth the effort.

          Rob

        • #82462
          Tim Pancost
          Participant

            Hey, Tim,

            We’ve actually used both configurations. Starting off, I’ll say that we’re on AIX and use LPARs for our machines. For many years we used an Active:Active setup, and several years ago we moved to an Active:Passive setup. The main reason at that time was actually driven by our hardware group. They found, at the time, that it was cheaper and more efficient to have two active LPARs sized for just their own load, with passive LPARs that were basically just shells waiting to be utilized in the event of a failover. In an Active:Active configuration, both machines need to be sized to handle the full load of both machines simultaneously. Granted, resources can now be dynamically assigned, but the cluster is much simpler to maintain when you don’t have to monitor both actives to ensure that each could handle the full load of both machines.

            Another thing that can theoretically be done is putting the machines in different areas of the data center (different subnets), since they don’t have to be tied together the way they do in an Active:Active configuration. That way, in case of a partial disaster or disruption in the data center, both machines aren’t necessarily affected.

            We also found some application-level advantages to the Active:Passive configuration. For Active:Active, you have to keep your sites in a separate filesystem, so that filesystem can be failed over, and use logical links in the root directory. And unless you script it in your failover scripts (which we did not), you have to maintain those same links on the other node, even though they only really point to something in the event of a failover. That means that whenever you add a site on one machine, you have to remember to add the corresponding link on the other node, so it’s ready in case of a failover.

            Another advantage we found was the server.ini file and, more specifically, the environs value. In an Active:Active configuration, you either have to keep those in sync, and thus have sites listed in the IDE that don’t really exist on that machine, or have your failover scripts update the server.ini both when a machine fails over and when it fails back, which also means bouncing the host server. It also requires maintaining that environs list in the failover scripts every time a site is added or removed. With an Active:Passive configuration, this is completely removed from the scripts, making them much simpler; the scripts pretty much just need to stop all the processes and daemons.
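
            To make that concrete, here is a minimal sketch of what that failover-script step could look like. It assumes the environs value is a single semicolon-delimited “environs=” line in $HCIROOT/server/server.ini; the path, the format, and the site names here are examples only, not our actual scripts.

            # Hypothetical failover step: add the failed-over sites to environs
            # on the surviving node, then bounce the host server.
            INI=$HCIROOT/server/server.ini
            NEWSITES="/qvdx/integrator/siteB1;/qvdx/integrator/siteB2"   # example site paths
            cp "$INI" "$INI.bak"                                         # keep a copy for fail-back
            sed "s|^environs=.*|&;$NEWSITES|" "$INI.bak" > "$INI"
            # ...then restart the host server by your usual procedure, and
            # restore the backup copy when the sites fail back.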

            One other advantage we found was with the /home/hci directory, which is where we keep our cron jobs. In an Active:Active configuration, we would need to keep those in sync between the machines, because if /home/hci were part of the failover, it would clobber whatever was on the other node when one failed over. So we would need to keep the scripts on both machines. Additionally, and more challenging, was the need to keep the crontab entries for the hci user in sync and comment out whatever wasn’t needed during regular operation. In our Active:Passive configuration, we just need to export the crontab to a file periodically (every day, for us), and in the event of a failover that is going to last long enough to necessitate running cron jobs, we just have to import that file into the crontab on the passive machine.
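
            For reference, that export/import is just the standard crontab commands; the backup location below is only an example, and you’d want it somewhere the passive node can see (e.g., a filesystem that fails over with the sites):

            # On the active node, scheduled daily:
            crontab -l > /sites2/backups/hci.crontab
            # On the passive node, after a failover that will outlast the next cron run:
            crontab /sites2/backups/hci.crontab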

            I hope at least some of this made sense.  Of course, every company’s situation is different, but this is what worked for us.

            HTH

            TIM

            Tim Pancost
            Trinity Health

          • #82463
            David Teh
            Participant

              Hi folks,

              I’m interested to find out how the Active:Active setup can be done, but still not be a pain for operations folks to manage. We’re currently on Solaris, but management seems keen to look at Linux, driven solely by cost factors.

              Let me see if my understanding is correct.

              Let’s say we have server A (running root A and sites A1, A2) and server B (running root B and sites B1, B2).

              In peacetime (i.e., normal operation), can the default ‘hci’ ID still be used to run things on both servers? Or, to avoid conflict when a failover does happen, do I need to create a new ID ‘hcia’ to run root A and ‘hcib’ to run root B?

              Also,

              – it seems the default $HCIROOT cannot be used since root A and root B need to be distinguished?

              – it also seems the IDE client on a laptop cannot be used to access the sites, and only the server-side IDE can be used via X-Windows?

              There seem to be too many negatives to the Active:Active setup, but money-minded management, who only see the beautiful PowerPoint slides, don’t bother about them.

            • #82464
              Tim Pancost
              Participant

                I outlined our experiences with A:A vs A:P in my previous post, and those have not changed in these two years.  To do an A:A setup, you need to size each node to be able to handle the workload of both nodes at the same time, so you’re basically running each node at less than half its capability at any given time, just in case the other node should need to failover to it.  So, in essence, you’re needing the (somewhat) equivalent of four full-sized nodes’ resources to run two nodes’ worth of workload.  Fairly inefficient.  With an A:P configuration, you’re only sizing two “real” nodes, and then having two “virtual” nodes(that need very little resources allocated to them).  So maybe you’re talking about three full-sized nodes’ worth of resources, if that.  And, last I checked, three is definitely less than four.

                To directly address your specific questions:

                1. Why do roots A and B need to be distinguished, assuming they’re the same version? If they’re not, well, there’s your distinguishing characteristic right there. If they are, then there’s no reason you can’t have A1, A2, B1, and B2 all running under the same root. Now, if you have master sites on these machines, and they’re not the same, then yes, you have an issue, as you can only have one master site per machine (last I knew). This would negate the possibility of an A:A configuration (barring some substantial scripting).

                2. Again, I don’t see how this would have any effect on running the IDE remotely. The site list isn’t stored on your laptop; it’s in the environs element within the server.ini file that resides on each machine. When your laptop goes to bring up a site list, it’s talking to the host server, which gets the list from the server.ini. This is what I was referring to in my earlier response about needing failover scripting in place to update the server.ini both when one of the nodes fails over to the other AND when it fails back to its primary machine. No need to use X-Windows. :-p
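
                If you want to see exactly what the host server will hand to the IDE, you can look at that value directly on each node (assuming your server.ini lives under $HCIROOT/server; adjust the path for your install):

                # Show the site list the host server presents to the IDE
                grep -i environs $HCIROOT/server/server.ini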

                HTH,

                TIM

                Tim Pancost
                Trinity Health

              • #82465
                David Teh
                Participant

                  Hi Tim,

                  Your earlier post is a definite keeper, which was immediately shared with some of my folks. 🙂

                  I had a conversation with someone else after my post, and I think I may have made a wrong assumption.

                  In my current A/P

                • #82466
                  Tim Pancost
                  Participant

                    Frankly, the physical location of the disk is pretty much irrelevant. For instance, NONE of our disk is “local”; it’s all on the SAN. The real question is which filesystem(s) you want to have fail over to the other machine. When we had an A:A configuration, the /qvdx filesystem (akin to your /quovadx; we just left out the vowels) did not fail over, as it already exists on each node. We had a separate filesystem that housed the sites to be failed over, and then we had logical links in the root directory to point each site name to the other filesystem. For example, here’s a partial listing of our root directory that shows the links:

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:38 athnahub -> /sites2/ath

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:38 colohhub -> /sites2/col

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:38 grrmihub -> /sites2/grr

                    lrwxrwxrwx    1 hci      staff            11 Apr 30 01:39 ie2mihub -> /sites2/ie2

                    So the /sites2 filesystem, which does fail over, contains the actual site directories. There is a corresponding filesystem, /sites1, on the other node. This is what I was referring to in my initial response about the necessity of either scripting the links in your failover scripts, or maintaining all the links on both nodes, so they’re there in the event of a failover.
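
                    If you do go the route of scripting the links, a rough sketch (reusing the example names above; treat $HCIROOT as a stand-in for wherever your links actually live) could be as simple as:

                    # In the failover script, after /sites2 is mounted on this node:
                    cd "$HCIROOT"                                # the root directory that holds the links
                    for pair in athnahub:ath colohhub:col grrmihub:grr ie2mihub:ie2; do
                        ln -sf "/sites2/${pair#*:}" "${pair%:*}"   # (re)create each site link
                    done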

                    HTH,

                    TIM

                    Tim Pancost
                    Trinity Health

                  • #82467
                    David Teh
                    Participant

                      Hi Tim,

                      Got it!

                      Thanks a zillion!

                    • #82468
                      David Teh
                      Participant

                        Hi folks,

                        Currently testing out the A:A solution on 2 clustered Solaris servers.

                        Since these are live servers, I am currently leaving out the clustered pair at the live DC and only testing on the pair at the secondary DC.

                        All servers are on Solaris.

                        DC1 has servers A and B, clustered with a cluster hostname CHN1.

                        The DNS name VHN1 is used by clients (whether IDEs or application sending systems) to reach Site 1.

                        DC2 has servers C and D, clustered with a cluster hostname CHN2.

                        The DNS name VHN2 is used by clients (whether IDEs or application sending systems) to reach Site 2.

                        Operators will have two IDEs open: one for Site 1 using the VHN1 hostname, and one for Site 2 using the VHN2 hostname.

                        When Site 2 fails over to Server A, on which Site 1 is already operating, the IDE for Site 2 should fail to contact the host server, right?

                        The “rmi_exported_server_port” value in the server.ini would be
