I don’t think a number can be set that fits everyone – or even close.
It depends not only on the volume of messages but also on the number of destinations, the power of the Cloverleaf(R) hardware platform, and how efficiently receiving systems accept messages.
For example, if you have one receiving system that averages 30 seconds between receipt of a message and its acknowledgment, and you have a peak arrival rate of 1 message every 0.33 seconds, your pending count (and the number of Recovery DB messages) for that one thread alone will increase by about 90 messages during a single 30-second wait for a reply. Depending on how long the peak period lasts, you can see that the backlog will grow very quickly.
If your overall arrival rate exceeds 1 message every 30 seconds, then the thread that has gotten behind will probably stay behind for longer than just the peak demand period – maybe all day!!
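The arithmetic above can be sketched roughly as follows. This is only a back-of-the-envelope model, assuming the thread handles one message at a time and waits for each acknowledgment before sending the next. The function names are made up for illustration and are not part of any Cloverleaf tooling.

```python
import math


def backlog_growth(arrival_interval_s, ack_delay_s, duration_s):
    """Approximate growth in pending messages for one thread over duration_s.

    Assumption: one message is delivered per ack_delay_s seconds, and
    messages arrive evenly, one every arrival_interval_s seconds.
    """
    arrivals = duration_s / arrival_interval_s
    delivered = duration_s / ack_delay_s
    return max(0.0, arrivals - delivered)


def drain_time_s(backlog, arrival_interval_s, ack_delay_s):
    """Seconds needed to clear a backlog once the peak is over.

    Returns infinity when messages still arrive at least as fast as the
    receiving system acknowledges them, i.e. the thread never catches up.
    """
    net_rate = 1.0 / ack_delay_s - 1.0 / arrival_interval_s
    if net_rate <= 0:
        return math.inf
    return backlog / net_rate


if __name__ == "__main__":
    # The example from the post: 30 s per ack, one arrival every 0.33 s.
    print(round(backlog_growth(0.33, 30, 30)))
    # After the peak, with one arrival per minute, how long to drain 90?
    print(drain_time_s(90, 60, 30))
```

The second function is the "maybe all day" case: if the off-peak arrival interval is not longer than the acknowledgment delay, the drain time is infinite under this model.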
What if there are other threads which cannot keep up with the arrival rate? Obviously additional messages will reside in the Recovery DB.
I think you need to determine your peak demand period (typically measured over either a 1-hour or a 15-minute time frame) for ALL inbound threads combined as well as for each inbound thread individually. You can do this quite simply with the SMAT .idx files. If you can narrow it down to one period on one day each week to watch, that would be best.
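Once you have the message timestamps out of the SMAT files, finding the peak window is simple counting. Here is a minimal sketch. It assumes you have already extracted one list of epoch-second timestamps per inbound thread (the extraction from the .idx files is not shown, and the thread names below are invented), and it uses fixed 15-minute buckets rather than a sliding window.

```python
from collections import Counter


def peak_window(timestamps, window_s=900):
    """Return (window_start_epoch, message_count) for the busiest window.

    Uses fixed buckets of window_s seconds. Ties go to the earliest window.
    Returns None when there are no timestamps.
    """
    if not timestamps:
        return None
    counts = Counter(int(ts) // window_s for ts in timestamps)
    bucket, count = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return bucket * window_s, count


def peak_by_thread(per_thread, window_s=900):
    """Peak window for each inbound thread, plus all threads combined."""
    result = {name: peak_window(ts, window_s) for name, ts in per_thread.items()}
    combined = [ts for series in per_thread.values() for ts in series]
    result["ALL"] = peak_window(combined, window_s)
    return result


if __name__ == "__main__":
    # Hypothetical sample data: seconds since some starting point.
    sample = {
        "adt_in": [0, 10, 20, 900, 910, 920, 930],
        "lab_in": [905, 1800],
    }
    for name, peak in peak_by_thread(sample).items():
        print(name, peak)
```

Pass window_s=3600 if you prefer the 1-hour time frame.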
And – you need to determine the performance of your receiving threads. Sometimes casual observation during the determined peak demand period is accurate enough. However, again using the SMAT files, you can analyze the average delay between sending a message and receiving the appropriate acknowledgment. You can also determine whether resends are consistently occurring for each thread. That might mean you need to tweak the wait times (you may even determine that you can reduce them, but that is your choice).
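The average delay calculation might look something like this. It is a sketch only: it assumes you have already pulled send times and acknowledgment times out of the SMAT files and can match them on some message identifier (for example the message control ID), and that step is not shown.

```python
from statistics import mean


def ack_delays(sent, acked):
    """Delay in seconds for every message that has a matching acknowledgment.

    sent and acked are dicts of message id -> epoch seconds. Messages with
    no acknowledgment, or with an ack stamped before the send, are skipped.
    """
    return [
        acked[msg_id] - sent_ts
        for msg_id, sent_ts in sent.items()
        if msg_id in acked and acked[msg_id] >= sent_ts
    ]


def summarize(sent, acked):
    """Average and worst delay, plus the count of unacknowledged messages."""
    delays = ack_delays(sent, acked)
    return {
        "avg_delay_s": mean(delays) if delays else None,
        "max_delay_s": max(delays) if delays else None,
        "unacked": sum(1 for msg_id in sent if msg_id not in acked),
    }


if __name__ == "__main__":
    # Hypothetical sample: message C never got an acknowledgment.
    sent = {"A": 0.0, "B": 10.0, "C": 20.0}
    acked = {"A": 30.0, "B": 45.0}
    print(summarize(sent, acked))
```

Comparing the average and worst delay against the configured wait time for the thread gives you a starting point for deciding whether that wait time needs adjusting.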
A quick way to tell if resends are likely occurring is to zero out the stats for the thread in question just as a peak period is about to begin and simply check the Thread Status in the NetMonitor display as the peak period ends. If there are more messages going out than coming in – you likely have that many resends. It is then valuable to find out if that resend activity was evenly spread during the time period or concentrated. Again the SMAT files (in and out) can assist.
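For the "evenly spread or concentrated" question, a rough sketch follows. It assumes you can get (timestamp, message id) pairs for the outbound side from the SMAT files and that a repeated message id on the outbound side means a resend. Both of those are assumptions about your data, not something Cloverleaf reports in this form.

```python
from collections import Counter


def estimated_resends(messages_out, messages_in):
    """Quick estimate from the thread counters: extra sends beyond receipts."""
    return max(0, messages_out - messages_in)


def resends_per_bucket(outbound, bucket_s=300):
    """Count resends per time bucket.

    outbound is an iterable of (epoch_seconds, message_id). The first time
    an id is seen counts as the original send; every later occurrence
    counts as a resend in the bucket where it happened.
    Returns a dict of bucket_start -> resend count, in time order.
    """
    seen = set()
    counts = Counter()
    for ts, msg_id in sorted(outbound):
        if msg_id in seen:
            counts[(int(ts) // bucket_s) * bucket_s] += 1
        else:
            seen.add(msg_id)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical sample: A is resent once, C is resent twice.
    outbound = [(0, "A"), (10, "B"), (40, "A"), (310, "C"), (320, "C"), (330, "C")]
    print(estimated_resends(6, 3))
    print(resends_per_bucket(outbound))
```

If most of the resends land in one or two buckets, the problem was concentrated. A flat spread points more toward a wait time that is simply too short for that receiving system.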
Armed with the above intelligence, you should be able to estimate a general high water mark for the Recovery Database. Obviously if that mark is exceeded during your peak period it does not necessarily mean anything is amiss but warrants observation.
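One rough way to turn those measurements into a number is sketched below. It assumes each receiving thread delivers one message per average acknowledgment delay, that every message arriving in the peak window is routed to that thread, and that a headroom factor of your own choosing is applied on top. The ThreadLoad name, the sample figures, and the default headroom are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThreadLoad:
    name: str          # receiving thread name (hypothetical)
    peak_msgs: int     # messages routed to this thread during the peak window
    avg_ack_s: float   # measured average send-to-acknowledgment delay


def high_water_mark(threads, window_s=900, headroom=1.2):
    """Estimate the Recovery DB message count at the end of the peak window.

    For each thread, backlog = arrivals in the window minus what the thread
    could deliver in that window (never below zero). The total is then
    multiplied by a headroom factor and rounded.
    """
    total = 0.0
    for t in threads:
        deliverable = window_s / t.avg_ack_s
        total += max(0.0, t.peak_msgs - deliverable)
    return round(total * headroom)


if __name__ == "__main__":
    threads = [
        ThreadLoad("slow_receiver", 2700, 30.0),
        ThreadLoad("fast_receiver", 900, 0.5),
    ]
    print(high_water_mark(threads))
```

Treat the result as a starting point to compare against what you actually observe, not as a hard limit.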
If the high water mark is exceeded in any other period – either you now have a new candidate for the peak demand period, or the arrival rate peak observed is an anomaly, or there is a problem with one or more of the threads. In any case exceeding the high water mark outside of the determined peak demand period probably warrants review.
Of course, if one or more of the receiving system threads are down during any period, the number of messages in the Recovery DB will grow. During the peak period it will grow really fast.
If you have your integrations split into multiple sites, you have one Recovery DB per site. Splitting into sites may or may not be an option should you find you have an issue and cannot resolve any of the other variables.
In order to set the appropriate performance and capacity numbers, you need to spend some time gathering intelligence. The determination of the workload characteristics (peak demand period, individual thread efficiency, etc.) needs to be repeated periodically. This is because the pattern of demand can change as the hospital's business changes, as inbound threads are added, as the number of Message/Event Types being sent increases, as receiving threads are added (some of which may be poor performers), or for other reasons.
A generally accepted cycle for validating workload characteristics is every 6 months. That is only a reference point; constraints may cause you to do this more or less often.