Do any of you have methods – other than “be really careful” – that ensure that a Tcl or translation error in one route will not result in failure to deliver messages in other routes?
Not completely. I’ve also discovered that a serious problem in a new integration route can halt routing of messages to other interfaces that share the same source thread.
One indirect way to reduce the potential impact is to break thread groupings into smaller groups.
Let’s say that instead of one receiving thread going to 16 threads, you have 2 receiving threads sending to 8 threads each.
8 threads is the most we allow any source thread to send to before we break things up into smaller groups.
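To make the grouping arithmetic concrete, here is a rough planning sketch in Python. It is not Cloverleaf code and does not touch any Cloverleaf API; the thread names (inbound_N, outbound_NN) and the function name are made up for illustration, and the limit of 8 is just our own rule of thumb from above.

```python
from typing import Dict, List

# Our site rule of thumb: no source thread feeds more than 8 destinations.
MAX_DESTINATIONS_PER_SOURCE = 8


def split_destinations(
    destinations: List[str],
    max_per_source: int = MAX_DESTINATIONS_PER_SOURCE,
) -> Dict[str, List[str]]:
    """Assign destination threads to source threads in chunks of max_per_source.

    Source thread names are hypothetical placeholders (inbound_1, inbound_2, ...).
    """
    if max_per_source < 1:
        raise ValueError("max_per_source must be at least 1")

    groups: Dict[str, List[str]] = {}
    for start in range(0, len(destinations), max_per_source):
        source_name = f"inbound_{len(groups) + 1}"
        groups[source_name] = destinations[start:start + max_per_source]
    return groups


# Example: 16 outbound threads that used to hang off one receiving thread.
destinations = [f"outbound_{i:02d}" for i in range(1, 17)]
plan = split_destinations(destinations)

for source, dests in plan.items():
    print(source, "->", len(dests), "destination threads")
```

With 16 destinations and a limit of 8 this yields two receiving threads with 8 destinations each, so a bad route under one receiving thread can stall at most the 8 interfaces that share it rather than all 16.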
I’ve heard a rumor of at least one location that goes to the extreme of having one inbound thread and one outbound thread per Cloverleaf site.
When I first heard of this it sounded extreme, but it does have plausible merit when you consider that the impact of any single interface on another would be minimal, not to mention it would maximize overall throughput of the Cloverleaf engine.
Not exactly a great answer to the problem.
Personally, I would prefer that an error on a single route impact only that particular route and not the others.
Another method, in addition to being very careful, that has helped us catch such hidden problems is our mock downtime model. Combined with our production versioning method, it allows us to mock a go-live and play messages through the production interfaces we plan to go live with as if they were live, without impacting the currently running live production interfaces they will be replacing.
Russ Ross
RussRoss318@gmail.com