Alerts Wish List
I have been tasked to work with R&D to come up with some ideas of how to enhance the Cloverleaf alerts. I have some of my own ideas as those of you that have used my Alerts Package can attest to.
Now I would like your ideas. I can’t promise that we will incorporate all but I can promise that we will consider all.
Here is some of what I am looking at thus far:
Alerts trigger on transitions, not absolutes, e.g. trigger if a thread transitions from UP to OPENING, not just on the fact that the thread is OPENING. It is normal when starting to go from DOWN to OPENING.
Change disk values to absolutes instead of percentages. If you have used 80% of a terabyte disk you still have a lot left. No need to define the Cloverleaf file system in the alert. The software should be smart enough to figure that out.
Configure e-mail and other means of notification from Alert GUI
Separate the Alert and MonitorD daemons so one does not hose the other. You old timers remember it used to be that way.
Alert on number of messages in Error Database or if MonitorD/Lock Manager is down
Easier to configure alerts
An Alerts log that will indicate which alerts have triggered, which have been fixed, which are still pending, fixed by who, etc.
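To make the first item in the list above concrete, here is a minimal Python sketch of transition-based triggering. It is only an illustration: the class name, the `observe` method, and the watched transition table are hypothetical and are not part of Cloverleaf; the state names are the ones used in the post.

```python
from typing import Optional, Set, Tuple

# Hypothetical table of (previous, current) pairs that should raise an alert.
# DOWN -> OPENING is deliberately absent because it is normal at startup.
ALERT_TRANSITIONS: Set[Tuple[str, str]] = {("UP", "OPENING")}


class TransitionWatcher:
    """Fires on a watched state change rather than on the state itself."""

    def __init__(self, transitions: Set[Tuple[str, str]] = ALERT_TRANSITIONS):
        self.transitions = set(transitions)
        self.previous: Optional[str] = None

    def observe(self, state: str) -> bool:
        # Compare the last seen state with the new one, then remember the new one.
        fired = (self.previous, state) in self.transitions
        self.previous = state
        return fired


watcher = TransitionWatcher()
startup = [watcher.observe(s) for s in ["DOWN", "OPENING", "UP"]]  # normal start
drop = watcher.observe("OPENING")  # UP -> OPENING, the case worth alerting on
```

A normal start passes through OPENING without firing, while the later UP to OPENING change does fire.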
I have more but this is all I will list for now. Let me hear from you. If you like you can respond here to let others know what you are thinking or send me e-mail:
I need your inputs within the next week or so.
Charlie
For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert. But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.
– Glenn
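Glenn's request could be expressed as a small rule table. The Python sketch below is one possible reading, not a Cloverleaf feature: the `RULES` layout and function names are hypothetical, and "weekday hours" is assumed here to mean all day Monday through Friday.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Hypothetical rule table: (weekdays, start, end, inactivity threshold).
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
RULES = [
    ({6}, time(0, 0), time(4, 0), timedelta(minutes=20)),          # Sunday 00:00-04:00
    ({0, 1, 2, 3, 4}, time.min, time.max, timedelta(minutes=10)),  # Mon-Fri, all day (assumed)
]


def inactivity_threshold(now: datetime) -> Optional[timedelta]:
    """Return the threshold of the first rule covering `now`, or None."""
    for days, start, end, threshold in RULES:
        if now.weekday() in days and start <= now.time() <= end:
            return threshold
    return None


def should_alert(now: datetime, last_read: datetime) -> bool:
    """True when the thread has been idle at least as long as the active rule allows."""
    threshold = inactivity_threshold(now)
    return threshold is not None and now - last_read >= threshold
```

With this table, 12 idle minutes on a Wednesday morning would alert, while 15 idle minutes at 02:00 on a Sunday would not. Times outside every rule never alert.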
1) flowStatus alert. If an outbound queue has a queue depth, I want to know whether the messages are flowing or not. I have this working to a certain extent with a modification to your protoStatus alert. It is not perfect, but it will at least tell you when messages are not flowing. The trouble I have is that I send these alerts to HP Openview and they want a correction alert. If the queue drains quickly, the correction alert will never be triggered. I also bounce the thread if I have alerted 5 times that messages are not flowing…. This is arbitrary and still in beta phase. I compare the current time minus the last read time to a specified wait time. This allows me to individualize the wait time for those annoying threads where ACKs can take a long time to be returned by an application. Another scenario is not receiving ACKs from the destination system by design (not mine). Trying to have this one corrected by the vendor… I send the NOT FLOWING alert as often as it is triggered if it meets the criteria, send the Openview correction as often as triggered, but restrict email to 1 at the most so as not to bury email. This alert definitely helps.
2) Can an alert fire off multiple times without resetting all alert counters? Say by touching default.alert.
3) I think I have seen something in the works where you will alert on Process being down and skip the individual thread alerts…
4) A mechanism to turn off all alerts or maybe some during maintenance.
5) …
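The flowStatus logic in item 1 might look roughly like the Python sketch below. It is an illustration only, not the poster's script: the `FlowMonitor` class and the action names are hypothetical, and emitting the correction on the first healthy poll after a stall is a design choice made here so that a quickly draining queue still produces one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowMonitor:
    wait_seconds: float          # per-thread wait time, as described in the post
    bounce_after: int = 5        # arbitrary, as in the post
    not_flowing_count: int = 0
    email_sent: bool = False

    def check(self, queue_depth: int, now: float, last_read: float) -> List[str]:
        """Return the hypothetical actions to take for this poll."""
        actions: List[str] = []
        stalled = queue_depth > 0 and (now - last_read) > self.wait_seconds
        if stalled:
            self.not_flowing_count += 1
            actions.append("notify_not_flowing")   # sent every time it triggers
            if not self.email_sent:                # at most one email per stall
                actions.append("email")
                self.email_sent = True
            if self.not_flowing_count >= self.bounce_after:
                actions.append("bounce_thread")
                self.not_flowing_count = 0
        elif self.not_flowing_count or self.email_sent:
            # First healthy poll after a stall: send the correction and reset.
            actions.append("notify_correction")
            self.not_flowing_count = 0
            self.email_sent = False
        return actions
```

Because the monitor remembers that it alerted, the correction does not depend on catching the queue in the act of draining.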
Charlie, how about an alert to monitor the inbound queue depth?
We have situations where messages get stuck on the inbound side due to our current design of (say) too many threads in a cluster with the xlate cmd thread overloaded. (We now know that we can tweak the process to balance the percentages of time spent in the xlt cmd thread for a process in the NetConfig.) This would include message states 1 thru 7. Also: as an adjunct to the alerts, repair the pxqd (pending) count logic in the msi. We discovered that it is broken when trying to alert on a connection via the msi and interpreting this count.
Also: capability to do outbound queue depth alerts on https connections when using the UPOC protocol driver option.
Thanks for spearheading this effort!
This capability exists within the current Alert configurator; you just need to configure two alerts (one for last read >= 10 min, and another for last read >= 20 min) and set up the respective schedule for each in the schedule window.
Jim Cobane
Henry Ford Health
Glenn Wrote:
Hi Charlie – One capability that we’d like to see is alerting on inbound thread inactivity, configurable by inactivity duration, day of week and time range (possibly configurable in a table).
For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert. But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.
– Glenn
One thing I would like to see would be an option to configure additional text to display when the alert triggers; i.e. information that could be used to tell Operators what to do when that alert triggers, such as “Call the Vendor at (555)333-4455 and talk to Joe Schmoe”.
Jim Cobane
Henry Ford Health
As Jim says, that capability does exist but, hopefully, we can make it easier.
Keith:
1. I’m not sure I really understand point 1. Are you saying that you want to alert if the queue is draining but too slowly?
2. This is a PRIMARY goal, plus escalation, say to some mgmt level
3 and 4. This is already part of Alerts package and will certainly be part of my recommendation
Bob:
Not sure I understand. We already have a queue depth alert.
Yes, I have recommended that we NEVER reset send or receive times, just the counts
When using UPoC, part of the deal is to provide your own. It is a simple matter to set up an OB UPoC as both read and write. During read (timer) check how long it has been
Good stuff guys, keep it coming
Right now %A gives you the display text; we need a way just to send the thread and process name. I have scripts where I parse them out of the display text, so being able to just pass the names would make life easier for those.
The second would be a way to tell the alerts to keep triggering every x number of minutes until the condition is cleared then a way to send an all clear.
The third, based on the previous, would be a way to say yes, I know the remote system is down, now leave me alone. A way to turn off alerts that we just configured to keep calling us until they are fixed.
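The second and third requests above (repeat every x minutes, send an all clear, and let someone silence a known problem) can be sketched in Python as follows. This is a hypothetical illustration, not Cloverleaf behavior; the `RepeatingAlert` class and its method names are invented for the example, and times are plain seconds.

```python
from typing import Optional


class RepeatingAlert:
    """Re-sends an alert every `repeat_seconds` until cleared or acknowledged."""

    def __init__(self, repeat_seconds: float):
        self.repeat_seconds = repeat_seconds
        self.active = False
        self.acknowledged = False
        self.last_sent: Optional[float] = None

    def acknowledge(self) -> None:
        # "Yes, I know the remote system is down": stop repeating until it clears.
        self.acknowledged = True

    def poll(self, condition: bool, now: float) -> Optional[str]:
        if condition:
            self.active = True
            due = self.last_sent is None or now - self.last_sent >= self.repeat_seconds
            if due and not self.acknowledged:
                self.last_sent = now
                return "ALERT"
            return None
        if self.active:
            # Condition has cleared: send one all clear and reset everything.
            self.active = False
            self.acknowledged = False
            self.last_sent = None
            return "ALL CLEAR"
        return None
```

The acknowledgement is dropped when the condition clears, so the next outage starts alerting again without any manual re-enable.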
Thanks for the input
1: Alerts will have a name you will be able to access with %N. That should help.
2 and 3 are already on my list
Don’t be too hard on Viken at the Level 3 class next week 😛
It would be really helpful if “last message received time” could persist through a thread bounce. Right now, if you bounce an inactive inbound thread to try to reestablish a connection, you lose the last receive time and get “never” instead.
- Mark Thompson
HealthPartners
I have been trying to make that happen. I could not agree more
Another nice to have alert is >N messages in the recovery or error database. We use cron jobs for this — it would be great to make it native.
- Mark Thompson
HealthPartners
It is on the list and will be done
I’d also like to be able to reference the site, process and thread names in the text of the message as variables (for example, @sitename, @processname or @threadname). I have to hard code these in the message now.
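As an illustration of what such variables could look like, here is a short Python sketch using the standard library's `string.Template` with `@` as the delimiter. The variable names come from the post; the `render` helper and the sample values are hypothetical, and nothing here reflects how Cloverleaf itself expands alert text.

```python
from string import Template


class AlertTemplate(Template):
    # Use "@name" placeholders instead of the default "$name".
    delimiter = "@"


def render(message: str, sitename: str, processname: str, threadname: str) -> str:
    # safe_substitute leaves unknown "@words" (e.g. in email addresses) untouched.
    return AlertTemplate(message).safe_substitute(
        sitename=sitename, processname=processname, threadname=threadname
    )


text = render(
    "Thread @threadname in @processname (@sitename) is down",
    sitename="prod", processname="adt_in", threadname="epic_adt",
)
print(text)  # Thread epic_adt in adt_in (prod) is down
```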
We’d also like to see better integration with other monitoring tools. We use HP Openview here, and I have to call scripts with parameters to get Cloverleaf alerts to show up in Openview.
Provide some ability to preview the alerts. As it stands now, I code it and don’t know that I got the syntax wrong until the operators tell me that the condition occurred but they did not receive an alert.
Also, whatever you do should be documented, with good examples.
The command entry for an alert should be more than a single line of text that allows me to see thirty characters. Being able to see only a portion of the command makes it difficult to create or debug.
See the screenshot example.
There will be provisions for you to define your own. A new alert called “Tcl” will be used, and the message it returns will be the alert message. You will be able to logically AND or OR this with other alerts to create some pretty sophisticated alerts. You will also be able to name alerts, and you will be passed the name of the alert.
As for more than 30 characters, perhaps we need an edit option for exec like we have now for Tcl?
Not sure I understand all of that 😀 But you will be able to define any alert via Tcl
Charlie
I did send you an email but thought maybe I could post here as well as an attachment.
For example, sometimes we get a flood of messages from an inbound system, far more than the receiving system can handle resulting in messages piling up and the queue depth alert firing. This is not a problem, and represents a false positive.
Similarly, the last sent alert may not be valid because there may be a lag in messages to send to this queue.
However, the combination of queue depth > 0 and last sent hitting a threshold always represents a problem, meaning you have messages to send and the receiving system is not processing the queue.
Robert Milfajt
Northwestern Medicine
Chicago, IL
You will be able to “AND” alerts to create a single alert. Some alerts can be defined with no action at all, just to be used as part of an “ANDed” alert.
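A rough Python sketch of the idea, using Robert's case of queue depth > 0 combined with a stale last-sent time. The helper names, the dictionary keys, and the 600-second threshold are all hypothetical; this is not the planned Cloverleaf syntax.

```python
from typing import Callable, Dict

Stats = Dict[str, float]
Condition = Callable[[Stats], bool]


def all_of(*conditions: Condition) -> Condition:
    """Logical AND: true only when every component condition is true."""
    return lambda stats: all(c(stats) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR: true when at least one component condition is true."""
    return lambda stats: any(c(stats) for c in conditions)


# Component conditions with no action of their own (thresholds are assumed).
queue_not_empty: Condition = lambda s: s["queue_depth"] > 0
send_is_stale: Condition = lambda s: s["seconds_since_last_send"] >= 600

# The only condition that would actually notify anyone.
receiver_stuck = all_of(queue_not_empty, send_is_stale)

flood = {"queue_depth": 500, "seconds_since_last_send": 5}     # backlog but draining
quiet = {"queue_depth": 0, "seconds_since_last_send": 3600}    # nothing to send
stuck = {"queue_depth": 12, "seconds_since_last_send": 900}    # real problem
```

Only the `stuck` sample satisfies `receiver_stuck`; the flood and the quiet period, which trip the individual alerts, do not.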
That has been requested
I would like to have better documentation of the alerts, and maybe even some examples in the documentation when completed. That alone would remove a lot of the guesswork I have had to do in the past.
My 2 cents,
Brian
There needs to be a way to Identify Holidays or light days of activity.
So you can schedule specific days during the year to be exempt.
I think the idea will be to schedule when you want to run rather than when you don’t want to run. If you add too many bells and whistles it will take a year to learn to set it up.
That is why you get paid the big bucks 🙂 It certainly is within your expertise to change your scheduling this time of year. The scheduler already allows you to specify month, day of month, day of week, etc.
– I’d really like to see more info for the operators (Jim asked for this)
– I think FTP processes need some special alert options
We have a separate, “home grown” app for FTP monitoring that
looks into the logs to determine if the file was processed, or if there
were logon errors, or other errors that prevented the transfer.
There has to be a better way, but perhaps an alert if the word
“normally” or “transferred” isn’t in the log.
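The log check suggested above could be as simple as the following Python sketch. The two marker words come from the post; the function name and the sample log lines are made up for illustration and do not reflect any real FTP client's output format.

```python
from typing import Iterable


def transfer_looks_ok(log_text: str,
                      markers: Iterable[str] = ("normally", "transferred")) -> bool:
    """True if any success marker appears in the log (case-insensitive)."""
    lowered = log_text.lower()
    return any(marker.lower() in lowered for marker in markers)


# Hypothetical log excerpts, not real Cloverleaf or FTP output.
good_log = "connected\n1 file transferred\nsession closed"
bad_log = "connected\nlogin incorrect\nsession closed"

needs_alert = [not transfer_looks_ok(log) for log in (good_log, bad_log)]
```

A plain substring match like this cannot distinguish "transferred" from "not transferred", so a real check would likely need stricter patterns.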
As for notifying operators, that will be up to you. We provide the hooks, you provide the notification. There is no way we could ever code a “one size fits all” notification.
As for FTP alerts, you will be able to provide alerts for those threads like any other. If you need more, there will be a specialized Tcl alert which will enable you to build any alert. Again, it would be hard to build a “one size fits all” canned alert.