Nuts and bolts question about threads with Message Routes


  • #53039
    Jon Blanchard
    Participant

      I don’t believe this question was ever asked or answered in Level 1 or 2 training (in June 2007), but does an inbound thread with multiple message routes spawn a subprocess for each route, allowing simultaneous processing of the current message, or is the message processed through each route in line, beginning with the first?

      My guess is in line.

      Migration Consultant
      Coffee Regional Medical Center
      Eastern Time Zone

      • #76331
        bill bearden
        Participant

          Jon,

          I’m not sure about 5.4. I was just in Level 2 for 5.8.5.

          The way it was explained was that a copy of the message is made for each route. Assuming there are xlates on each route, each copy is run through its xlate. If they all pass, the original is done away with and the copies are sent on to the next thread. But if one of the copies fails in its xlate, all the copies are discarded and the original is errored.
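
          In rough pseudo-code terms, here is that all-or-nothing behavior as I understood it. This is a plain Tcl sketch only, not Cloverleaf’s actual internals, and run_xlate is a hypothetical stand-in for the translation step:

          # Sketch of the all-or-nothing behavior as described in class
          # (plain Tcl, not engine internals; run_xlate is a hypothetical helper
          # standing in for the real translation step).
          proc deliver_all_or_nothing {origMsg routes} {
              set copies {}
              foreach route $routes {
                  if {[catch {run_xlate $route $origMsg} copy]} {
                      # one xlate failed: discard every copy, error the original
                      return [list ERROR $origMsg]
                  }
                  lappend copies $copy
              }
              # all xlates passed: the original is dropped, the copies move on
              return [list DELIVER $copies]
          }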

          It seemed from the explanation that there is only one OS process/thread for all the routes/xlates for the copies from a particular source thread. But there was a side discussion about multi-threading (in more recent versions of CL) that I didn’t understand. So I don’t know if it was related to your question.

          Bill

        • #76332
          Jeff Dinsmore
          Participant

            That’s the way I understand it (Bill’s explanation).

            This is one of my least favorite behaviors of Cloverleaf and other engines.

            Depending on how your interfaces are designed, you can get “partial” delivery of a given inbound/source message. The first failure in a string of routes sends the message to the error database and bails – effectively skipping the remaining routes.

            This raises a few questions – for me at least…

            1) What’s the best method to recover from a single-message failure like this? We need to resend the source message only to those routes that were skipped.

            2) How can we design our interfaces to avoid this “feature”?

            3) What about a consistent failure for a given route? I don’t know about the rest of you, but I’ve made “trivial” changes without adequate testing (Gasp!). That prevents messages from reaching the skipped destinations for the whole period between when the change is made and when we discover it. Or, it could be caused by some unexpected upstream messaging change (much better for my reputation). Either way, it can create a mess that’s not trivial to fix.

            I’ll go first…

            My primary method to address these three is to do all message manipulation on the outbound side of any given route. This way, any xlate failure will only affect one destination – and recovery is easier because the rest of the unaffected interfaces continue to process messages.

            For example – ADT and orders to destinations A, B and C:

            ADT in -> route/xlate -> ADT out A
                   -> route/xlate -> ADT out B
                   -> route/xlate -> ADT out C

            ORM in -> route/xlate -> ORM out A
                   -> route/xlate -> ORM out B
                   -> route/xlate -> ORM out C

            …becomes:

            ADT in -> raw route -> outbound pool A
                   -> raw route -> outbound pool B
                   -> raw route -> outbound pool C

            ORM in -> raw route -> outbound pool A
                   -> raw route -> outbound pool B
                   -> raw route -> outbound pool C

            outbound pool A -> route/xlate -> ADT out A
                            -> route/xlate -> ORM out A

            outbound pool B -> route/xlate -> ADT out B
                            -> route/xlate -> ORM out B

            outbound pool C -> route/xlate -> ADT out C
                            -> route/xlate -> ORM out C

            With this method, failures in xlates/scripts only affect a single destination. And, recovery is easy – just resend all failed messages to the outbound pool of the failed destination and you’re all set (after fixing the problem, of course).

            Ladies/Gentlemen… I’m sure mine is not the only approach. What are your solutions? Feel free to dissect my methods and offer opinions.

            Thanks,

            Jeff.

            Jeff Dinsmore
            Chesapeake Regional Healthcare

          • #76333
            bill bearden
            Participant

              Maybe I am misunderstanding. Or perhaps we aren’t talking about the same thing.

              What I thought I heard in class is that all copies of the message are discarded if any fails. That is, if there are 3 routes, none of the 3 messages will be sent on if even the last xlate fails. My understanding is that it is all or nothing.

              Are you saying that some copies can go forward and some can fail?

            • #76334
              Jeff Dinsmore
              Participant

                That’s not the case. You can get partial success for a given route with outbound messages for some of the destinations and not for others.

                I decided to do some testing – to be sure that my understanding lines up with reality. And, I was Mostly Right (is that like Mostly Dead?).

                Anyway, I’ve attached the results of my analysis.

                To you CloverTech-ers with deep experience – please have a look to see if you can explain my findings.

                Perhaps I’m missing something obvious, but it’s left me a bit confused/concerned.

                Jeff Dinsmore
                Chesapeake Regional Healthcare

              • #76335
                James Cobane
                Participant

                  Jeff,

                  I believe that your issue with Config 1 destinations A & B is that the route shows a TrxID of “ADT_08” and not “ADT_A08”.

                  Also, I believe there may be a difference in the handling of errors based on a pre/post-xlate tps proc on the route versus an actual error encountered during Xlate (i.e. a bad xlate action or bad xlt tcl fragment in the Xlate).  I think that if the actual Xlate fails, no messages to any of the outbounds would go.

                  Jim Cobane

                  Henry Ford Health

                • #76336
                  Jeff Dinsmore
                  Participant

                    Ahhh… the old typographical error… a second set of eyes always helps, thanks.  

                    I fixed it (changed the ADT_08 to ADT_A08) and retested. Config 1 results are now consistent with those for Configs 2 and 3.

                    I’ve attached Rev 2 of my observations.

                    To your (Jim’s) suggestion that an actual xlate error might behave differently:  I built an Xlate that was sure to fail (copies blah->blahblah), applied it to destination “D” and it behaves the same way as a raw route with a Tcl proc for translation – it successfully sends to destination C, but not to D or E.

                    There’s an apparent difference between the dot-asterisk wildcard and the others. Anyone know why that might be?

                    Jeff Dinsmore
                    Chesapeake Regional Healthcare

                  • #76337
                    Peter Heggie
                    Participant

                      I like the ‘pool’ concept, for your reason of moving the Xlate to the outbound thread and avoiding the killing of all message copies if there is an error in one Xlate. However, perhaps the designers of Cloverleaf made a choice to err on the side of caution. Maybe an Xlate failure is caused by a ‘bad’ message; if some copies make it through because some Xlates are more forgiving, is that ok? Maybe it is better to kill them all and let the admin sort them out.   🙂

                      But an admin might implement better Xlates and filters, in which case we trust the routing and processing to work correctly and only kill the one bad message. And with 5.8.4, we can resend to specific destinations, so there is less concern about sending duplicate messages.

                      Unfortunately our licensing is thread-based, so we can’t just implement a design that adds a third set of threads. And a majority of our destinations only have one input (from our HIS), so we don’t get much of a benefit from consolidating our outputs to any one destination.

                      But your design:

                      ADT in -> raw route -> outbound pool A -> route/xlate -> ADT out A

                      is efficient and avoids the basic problem. Also, the newer multi-threaded translation feature may be ‘safer’ to use with this design, since it makes it easier to set it to ‘Per Source’ or ‘Unrestricted Source List’.

                      It also moves the slower Xlates and filtering operations from the inbound process to the outbound process – for us, the inbound process is from our HIS, which means there are 40+ destinations. By moving the processing to the outbound side, it keeps the inbound as fast as possible and does not penalize the faster processes by making them wait for the slower ones. But does this impact any sequencing or order-of-message processing?

                      Peter Heggie
                      PeterHeggie@crouse.org

                    • #76338
                      Jim Kosloskey
                      Participant

                        Peter,

                        I have used that architecture at my previous gig and here (nearly 17 years) and we noticed no deleterious effects vis-a-vis message order, etc.

                        We did notice all of the benefits you describe and more.

                        My personal theory is that everyone will eventually be multi-site; it is just a question of when.

                        In 5.8 the addition of the public thread and cross-site routing holds promise for those who have been affected by the ‘per thread’ pricing.

                        Even with that pricing (and I don’t know the details since we do not have that limitation), I would submit that in the long run the benefits of a multi-site architecture would effectively pay for the additional thread count.

                        email: jim.kosloskey@jim-kosloskey.com 30+ years Cloverleaf, 60 years IT – old fart.

                      • #76339
                        Jeff Dinsmore
                        Participant

                          Peter,

                          The basic problem my analysis confirmed is that the delivery of messages in a route with multiple destinations can be partial if there’s an xlate failure.

                          If it were all or nothing all the time, that would be fine, but it processes the list of destinations for a particular route and, if it encounters an error in the xlate, it stops and does not process any of the remaining destinations. So, you can get delivery to A and B, C fails, then it never gets to D and E.
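
                          In the same rough, plain-Tcl terms as the sketch earlier in the thread (again, not Cloverleaf internals; xlate_copy and send_on are hypothetical stand-ins), what I actually observed looks more like this:

                          # Sketch of the observed behavior: destinations are walked in order
                          # and processing stops at the first xlate failure, so earlier
                          # destinations still receive their copies.
                          proc deliver_until_first_failure {origMsg destinations} {
                              set delivered {}
                              foreach dest $destinations {
                                  if {[catch {xlate_copy $dest $origMsg} copy]} {
                                      # first failure: original goes to the error database,
                                      # remaining destinations are skipped, but the earlier
                                      # ones have already been sent
                                      return [list ERROR $origMsg DELIVERED $delivered]
                                  }
                                  send_on $dest $copy
                                  lappend delivered $dest
                              }
                              return [list OK DELIVERED $delivered]
                          }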

                          And, to add an interesting twist, an xlate failure in the “ADT_.*” route in my test delivered no messages, but failures in the others resulted in partial delivery. That lack of consistency/predictability is what concerns me.

                          Don’t get me wrong, this doesn’t happen often, but if it does I want to be sure it has as little impact as possible. The mantra is “no message left behind”…

                          To your concern that the pool approach increases your thread count:

                          Thread count increased for us to be sure, but I was trying to solve three performance/reliability/usability issues.

                          1) We had one large site that was too big for its britches and needed to be segmented for performance reasons – we had no choice. So, the additional inter-site thread count overhead (in our 5.6 version anyway) was a given.

                          2) I wanted to eliminate the possibility that an xlate failure would result in partial delivery of any given message – except to the destination for which it failed.

                          3) I wanted to make it easy to find/modify messages and resend them (inbound pre-TPS) to a single destination without having to worry about duplicating messages to other systems.

                          Breaking the mega-site into several smaller sites solved #1.

                          The pool architecture solves #2. It guarantees that a translate error only affects one destination.

                          The pool approach also solves #3. If I want to resend to a single destination, I just resubmit to its outbound pool. If I want to resend to all destinations, I just resend to the original inbound thread.

                          We’re not thread-count bound, so I had the freedom of not worrying about that, but did focus on minimizing the total thread count as much as practical, while still solving these issues.

                          I initially expected a larger increase in the thread count, but by using multi-server for the pools, I was able to substantially reduce the total increase.

                          I use a single multi-server pool for any given outbound system – let’s say ED, for example. Any inbound message that’s going to ED (ADT, Lab, Rad, etc.) goes through this pool and is routed with a TRXID UPOC to the required outbound thread.
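
                          For what it’s worth, the TRXID logic itself is simple. The exact UPOC proc wrapper and return convention depend on your Cloverleaf version, so this is only a sketch of the parsing part (assuming standard HL7 v2 with pipe field separators and carriage-return segment breaks; the ED_ prefix and proc name are just examples):

                          # Sketch: derive a routing key from MSH-9 so the ED pool can fan each
                          # message out to the right outbound thread.  Assumes "|" field
                          # separators and "\r" segment separators; adapt to your own delimiters.
                          proc ed_pool_trxid {msg} {
                              set msh     [lindex [split $msg \r] 0]      ;# first segment = MSH
                              set msgType [lindex [split $msh |] 8]       ;# MSH-9, e.g. ADT^A08
                              set type    [lindex [split $msgType ^] 0]   ;# ADT, ORM, ORU, ...
                              return "ED_${type}"                         ;# e.g. ED_ADT, ED_ORM
                          }

                          The routes defined on the pool thread then just match on those TrxIDs (ED_ADT, ED_ORM, and so on) and hand off to the corresponding outbound thread, where the real Xlate lives.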

                          Jeff Dinsmore
                          Chesapeake Regional Healthcare

                        • #76340
                          James Cobane
                          Participant

                            I would suggest opening a case with Lawson Support to confirm/deny that the observed behavior is “as designed”.

                            Jim Cobane

                            Henry Ford Health
