This is a very timely and interesting topic as we have been asking ourselves some of these questions.
I hope to see other responses because what we have contemplated so far has just been amongst a tiny group in-house.
When setting up our new Cloverleaf production HACMP servers two years ago using LPARs, I asked about doing wide area fail-over as a means of DR, but that was rejected.
Now the topic of DR is becoming a political hot potato, but the reluctance to do wide area fail-over here still persists.
Until recently, few of us have been able to take these conversations seriously.
Even now that we are taking them seriously, we are still struggling to achieve enterprise-wide cooperation.
The way it looks for us now, each server deemed crucial enough to justify DR will get an LPAR on a machine in the DR location.
So I will be getting an LPAR for Cloverleaf DR, with SAN connections, in the DR location, but I will not be using any wide area fail-over.
My current thinking is to maintain a DR Cloverleaf server on my DR LPAR and keep it mostly in sync with the production Cloverleaf server by routing a copy of all production inbound messages to the DR server.
All the outbound threads on the DR server could either be put in downtime mode or have a tps_message_kill added, so that nothing is actually delivered.
When a DR situation occurs, we would selectively modify the inbound listeners on the DR Cloverleaf LPAR to receive messages from whichever senders are still working, if any, and modify the outbound senders to stop killing messages or saving them to dump files and start sending them to whichever foreign systems are still working, if any.
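To make the idea a little more concrete, here is a rough Python sketch of the behavior I have in mind. It is only a model of the logic, not Cloverleaf code: the names Mode, DrOutboundThread and mirror_inbound are made up for illustration, and on the real engine this would presumably be handled with thread configuration and procs such as the tps_message_kill mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Mode(Enum):
    STANDBY = "standby"  # normal operation: DR gets copies but delivers nothing
    ACTIVE = "active"    # DR declared: deliver to foreign systems


@dataclass
class DrOutboundThread:
    """Hypothetical stand-in for one outbound thread on the DR server."""
    name: str
    send: Callable[[str], None]      # delivers a message to the foreign system
    mode: Mode = Mode.STANDBY
    save_in_standby: bool = False    # False = kill the message, True = keep it
    dump: List[str] = field(default_factory=list)  # stands in for a dump file

    def handle(self, message: str) -> str:
        """Send, save, or kill the message depending on the current mode."""
        if self.mode is Mode.ACTIVE:
            self.send(message)
            return "SENT"
        if self.save_in_standby:
            self.dump.append(message)
            return "SAVED"
        return "KILLED"


def mirror_inbound(message: str,
                   prod_route: Callable[[str], None],
                   dr_route: Callable[[str], None]) -> bool:
    """Route the message on production, then pass a copy to DR.

    Returns True if the DR copy was handed off. A failure on the DR side
    is reported but never blocks production traffic.
    """
    prod_route(message)
    try:
        dr_route(message)
        return True
    except Exception:
        return False
```

In standby, handle() kills (or saves) every message, so the DR server keeps processing the mirrored traffic without anything reaching the foreign systems. Declaring DR then amounts to flipping mode to ACTIVE only on the threads whose foreign systems are still reachable.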
Since a real DR will likely trash many if not all of the sending systems, this DR approach could turn out to be as fast as a wide area HA solution, especially since a decision has been made not to use wide area HA on any servers that I know of in the hospital.
I have observed a willingness to accept taking several days to recover from a DR situation in exchange for lower costs.
Russ Ross
RussRoss318@gmail.com