We have alerts of type ‘pstat’, triggered if a thread protocol status is not ‘up’ for 60 seconds, and also of type ‘ipque’ and ‘opque’, triggered if the queue depth is more than zero for 15 minutes.
These seem to work very well, except when a process panics: all of its threads go down, but the ‘pstat’ alert is not triggered. Messages destined for the affected threads also build up in the recovery database, but no alert is triggered for this either.
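To make the expected behaviour concrete, here is a minimal Python sketch of a "condition held for N seconds" rule. It is not the engine's alert code or configuration: the names `SustainedAlert` and `pstat_bad` are hypothetical, and treating a missing status report as not ‘up’ is only an assumption about what one would want when a process panics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedAlert:
    """Fires once a condition has held continuously for hold_seconds."""
    hold_seconds: float
    since: Optional[float] = None  # time the condition first became true

    def update(self, condition: bool, now: float) -> bool:
        """Record one observation; return True if the alert should fire."""
        if not condition:
            self.since = None  # condition cleared, restart the timer
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold_seconds


def pstat_bad(status: Optional[str]) -> bool:
    # Assumption: no status at all (None), e.g. because the owning
    # process has died, is treated the same as any status other than 'up'.
    return status != "up"


# 'pstat' style rule: not 'up' for 60 seconds.
pstat = SustainedAlert(hold_seconds=60)
pstat_results = [
    pstat.update(pstat_bad("up"), now=0),   # healthy
    pstat.update(pstat_bad(None), now=10),  # status disappears
    pstat.update(pstat_bad(None), now=70),  # still missing 60 s later
]

# 'ipque'/'opque' style rule: depth above zero for 15 minutes.
queue = SustainedAlert(hold_seconds=15 * 60)
queue_results = [
    queue.update(5 > 0, now=0),
    queue.update(7 > 0, now=15 * 60),
]
```

In this sketch the ‘pstat’ rule fires on the third observation only because a missing status counts as a failure; a rule that is evaluated only when a status is actually reported would stay silent in the same situation.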