This is a very timely and interesting topic, as we have been asking ourselves some of these same questions.
I hope to see other responses, because what we have contemplated so far has been confined to a tiny group in-house.
When setting up our new Cloverleaf production HACMP servers on LPARs 2 years ago, I asked about doing wide-area fail-over as a means of DR, but that was rejected.
Now the topic of DR is becoming a political hot potato, but the reluctance to do wide-area fail-over here still persists.
Until recently, few of us had been able to take these conversations seriously.
One thing we have already learned is that the DR solution has to be discussed with all the departmental silos involved in order to arrive at an enterprise DR solution.
Even though we have learned this, we are still struggling to achieve enterprise cooperation.
Another obvious point, which some of those involved tended to ignore until it hit them in the face, is that the bandwidth to the DR location will need to be high enough to handle the load.
The way it looks for us now, each server deemed crucial enough to justify DR will get an LPAR on a machine in the DR location.
So I will be getting an LPAR for Cloverleaf DR, plus SAN connections in the DR location, but will not be using any wide-area fail-over.
So my current thinking on how to approach my situation is to maintain a DR Cloverleaf server on my DR LPAR and keep it mostly in sync with the production Cloverleaf server by routing a copy of all production inbound messages to the DR server.
All the outbound threads on the DR server could be left in down-time mode, or could have a tps_message_kill added so the copied messages are discarded.
When a DR situation occurs, we would selectively modify the inbound listeners on the DR Cloverleaf LPAR to receive messages from whichever senders might still be working, if any, and modify the outbound senders there to stop killing messages (or saving them to dump files) and start sending them to whichever foreign systems might still be working, if any.
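To make the proposed arrangement concrete, here is a minimal Python sketch that models it in memory. It is only an illustration of the logic, not Cloverleaf code: the `Site` class, `receive_inbound`, `activate_dr`, the `OutboundMode` values and the system names "lab" and "pharmacy" are all hypothetical. In a real setup the copying would be done by Cloverleaf routing and the discarding by a TPS proc such as the tps_message_kill mentioned above. The sketch shows production delivering normally while a copy of each inbound message reaches the DR site. Each DR outbound destination then either kills or saves the message. At DR time, only the foreign systems that are still working get switched to sending.

```python
from dataclasses import dataclass, field
from enum import Enum


class OutboundMode(Enum):
    KILL = "kill"  # discard the message, the role a kill proc plays on a DR outbound thread
    SAVE = "save"  # keep it in a local dump (a list here) for later review or resend
    SEND = "send"  # deliver it to the foreign system


@dataclass
class Site:
    """Hypothetical model of one Cloverleaf server; not a Cloverleaf API."""
    name: str
    outbound_mode: dict[str, OutboundMode] = field(default_factory=dict)
    delivered: dict[str, list[str]] = field(default_factory=dict)
    saved: list[tuple[str, str]] = field(default_factory=list)
    killed: int = 0

    def route(self, dest: str, msg: str) -> None:
        # Destinations not configured default to KILL, matching a DR site at rest.
        mode = self.outbound_mode.get(dest, OutboundMode.KILL)
        if mode is OutboundMode.SEND:
            self.delivered.setdefault(dest, []).append(msg)
        elif mode is OutboundMode.SAVE:
            self.saved.append((dest, msg))
        else:
            self.killed += 1


def receive_inbound(prod: Site, dr: Site, dest: str, msg: str) -> None:
    """Production handles the message normally and routes a copy to DR."""
    prod.route(dest, msg)
    dr.route(dest, msg)


def activate_dr(dr: Site, reachable: set[str]) -> None:
    """Switch only the foreign systems that are still working to SEND."""
    for dest in reachable:
        dr.outbound_mode[dest] = OutboundMode.SEND


if __name__ == "__main__":
    prod = Site("prod", {"lab": OutboundMode.SEND, "pharmacy": OutboundMode.SEND})
    dr = Site("dr", {"pharmacy": OutboundMode.SAVE})  # "lab" defaults to KILL
    receive_inbound(prod, dr, "lab", "msg-1")
    receive_inbound(prod, dr, "pharmacy", "msg-2")
    activate_dr(dr, {"lab"})  # disaster: only the lab system survived
    dr.route("lab", "msg-3")
    dr.route("pharmacy", "msg-4")
    print(dr.delivered)  # {'lab': ['msg-3']}
    print(dr.saved)      # [('pharmacy', 'msg-2'), ('pharmacy', 'msg-4')]
    print(dr.killed)     # 1
```

Defaulting unconfigured destinations to KILL mirrors the idea that the DR server should never send anything until someone deliberately turns a connection on. The selective `activate_dr` step reflects the expectation that only some foreign systems will be reachable after a real disaster.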
Since a real DR event will likely trash many, if not all, of the sending systems, this DR approach could turn out to be as fast as a wide-area HA solution, especially since a decision has been made not to use wide-area HA on any servers in the hospital that I know of.
I have observed a willingness to accept taking several days to recover from a DR situation in exchange for lower costs.
At first glance, it seems to me that unless all or most of the enterprise systems use wide-area fail-over, Cloverleaf might end up waiting day(s) to receive or make DR connections from or to foreign systems anyway.
Russ Ross
RussRoss318@gmail.com