Alerts Wish List
- This topic has 35 replies, 15 voices, and was last updated 16 years, 4 months ago by Bill Marsch.
October 16, 2007 at 9:52 pm #49595 Charlie Bursell (Participant)
OK guys and gals we are finally going to do something with the Alerts Engine 😀 As you are acutely aware, with the number of threads and sites that some of you are trying to monitor, it is all but impossible to do it visually. I have been tasked to work with R&D to come up with some ideas on how to enhance the Cloverleaf alerts. I have some ideas of my own, as those of you who have used my Alerts Package can attest.
Now I would like your ideas. I can’t promise that we will incorporate all but I can promise that we will consider all.
Here is some of what I am looking at thus far:
Alerts trigger on transitions, not absolutes, e.g. trigger if a thread transitions from UP to OPENING, not just on the fact that the thread is OPENING. It is normal when starting to go from DOWN to OPENING.
Change disk values to absolutes instead of percentages. If you have used 80% of a terabyte disk you still have a lot left. No need to define the Cloverleaf file system in the alert. The software should be smart enough to figure that out 😉
Configure e-mail and other means of notification from the Alert GUI
Separate the Alert and MonitorD daemons so one does not hose the other. You old timers remember it used to be that way
Alert on number of messages in Error Database or if MonitorD/Lock Manager is down
Easier to configure alerts
An Alerts log that will indicate which alerts have triggered, which have been fixed, which are still pending, fixed by who, etc.
I have more but this is all I will list for now. Let me hear from you. If you like you can respond here to let others know what you are thinking or send me e-mail:
charlie.bursell@quovadx.com I need your inputs within the next week or so.
Charlie
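The first item on the list, triggering on transitions rather than on absolute states, can be sketched roughly as follows. This is only an illustration in Python, not how the Alerts Engine does or will do it; the class name `TransitionAlert` and its `observe` method are made up for the example, and only the state names UP, OPENING and DOWN come from the post.

```python
from typing import Iterable, Optional, Set, Tuple


class TransitionAlert:
    """Fires only when a thread moves between a configured pair of states.

    Hypothetical helper for illustration; not part of any Cloverleaf API.
    """

    def __init__(self, transitions: Iterable[Tuple[str, str]]) -> None:
        # Each entry is (previous_state, new_state), e.g. ("UP", "OPENING").
        self.transitions: Set[Tuple[str, str]] = set(transitions)
        self.previous: Optional[str] = None

    def observe(self, state: str) -> bool:
        """Record the latest state and report whether an alert should fire."""
        fired = (self.previous, state) in self.transitions
        self.previous = state
        return fired


alert = TransitionAlert({("UP", "OPENING")})
for state in ["DOWN", "OPENING", "UP", "OPENING", "OPENING"]:
    if alert.observe(state):
        print("ALERT: thread went from UP to OPENING")
```

With this shape the normal startup path (DOWN to OPENING) stays quiet, and a thread that sits in OPENING is reported once at the moment it left UP instead of on every poll.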
-
October 17, 2007 at 9:50 am #62650 Glenn Friedenreich (Participant)
Hi Charlie – One capability that we’d like to see is alerting on inbound thread inactivity, configurable by inactivity duration, day of week and time range (possibly configurable in a table). For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert. But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.
– Glenn
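A table-driven version of this request might look something like the sketch below. It is written in Python purely for illustration; the rule table, the 07:00 to 19:00 weekday window and the function names are assumptions, and only the 10-minute and 20-minute figures and the Sunday midnight to 04:00 window come from the post.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Each rule: (weekdays per datetime.weekday(), window start, window end, allowed idle time).
# Monday is 0 and Sunday is 6. The weekday window below is an assumed example.
RULES = [
    ({6}, time(0, 0), time(4, 0), timedelta(minutes=20)),                # Sunday 00:00-04:00
    ({0, 1, 2, 3, 4}, time(7, 0), time(19, 0), timedelta(minutes=10)),   # weekday hours
]


def inactivity_threshold(now: datetime, rules=RULES) -> Optional[timedelta]:
    """Return the allowed idle time for this moment, or None if no rule applies."""
    for days, start, end, limit in rules:
        if now.weekday() in days and start <= now.time() <= end:
            return limit
    return None


def is_inactive(last_read: datetime, now: datetime, rules=RULES) -> bool:
    """True when the inbound thread has been idle longer than the schedule allows."""
    limit = inactivity_threshold(now, rules)
    return limit is not None and now - last_read >= limit
```

Outside every window the function returns False, so an unscheduled period never alerts. That matches the idea of scheduling when an alert should run rather than when it should not.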
-
October 17, 2007 at 11:05 am #62651 Keith McLeod (Participant)
Charlie, 1) flowStatus alert. If an outbound queue has a queue depth, I want to know whether the messages are flowing or not. I have this working to a certain extent with a modification to your protoStatus alert. It is not perfect, but it does tell you when messages are not flowing. The trouble I have is that I send these alerts to HP Openview and they want a correction alert. If the queue drains quickly, the correction alert will never be triggered. I also bounce the thread if I have alerted 5 times that messages are not flowing. This is arbitrary and still in beta phase. I compare the current time minus the last read time to a specified wait time. This allows me to individualize the wait time for those annoying threads where ACKs can take a long time to be returned by an application. Another scenario is not receiving ACKs from the destination system by design (not mine). Trying to have this one corrected by the vendor. I send the NOT FLOWING alert as often as it is triggered if it meets the criteria, send the Openview correction as often as triggered, but restrict email to 1 at the most so as to not bury email. This alert definitely helps.
2) Can an alert fire off multiple times without resetting all alert counters? Say by touching default.alert.
3) I think I have seen something in the works where you will alert on Process being down and skip the individual thread alerts…
4) A mechanism to turn off all alerts or maybe some during maintenance.
5) …
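The flowStatus logic in point 1 could be sketched along these lines. This is a Python illustration under stated assumptions, not Keith's actual modification of protoStatus: the class, field and action names are invented, timestamps are plain seconds, and the correction is emitted once on recovery by remembering that an alert was outstanding, which is one possible way around the "queue drains quickly" problem.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowStatus:
    """Tracks one outbound thread. All names here are hypothetical."""

    wait_seconds: float                 # per-thread wait time before "not flowing"
    max_alerts_before_bounce: int = 5   # the arbitrary bounce limit from the post
    alerts_sent: int = 0
    email_sent: bool = False

    def check(self, queue_depth: int, last_read_ts: float, now_ts: float) -> List[str]:
        """Return the actions to take for this poll."""
        actions: List[str] = []
        stalled = queue_depth > 0 and (now_ts - last_read_ts) > self.wait_seconds
        if stalled:
            self.alerts_sent += 1
            actions.append("NOT_FLOWING")        # sent every time it triggers
            if not self.email_sent:
                actions.append("EMAIL")          # at most one email per incident
                self.email_sent = True
            if self.alerts_sent >= self.max_alerts_before_bounce:
                actions.append("BOUNCE")
                self.alerts_sent = 0
        elif self.alerts_sent or self.email_sent:
            # Something was outstanding and the condition has cleared.
            actions.append("CORRECTION")
            self.alerts_sent = 0
            self.email_sent = False
        return actions
```

Because the state is kept between polls, the correction does not depend on catching the queue in any particular state; it only needs one poll after the stall has ended.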
-
October 17, 2007 at 12:04 pm #62652 Bob Richardson (Participant)
Greetings, Charlie, how about an alert to monitor the inbound queue depth?
We have situations where messages get stuck on the inbound side due to our current design of (say) too many threads in a cluster with the xlate cmd thread overloaded (we now know that we can tweak the process to balance the percentages of time spent in the xlt cmd thread for a process in the NetConfig). This would include message states 1 thru 7. Also: as an adjunct to the alerts, repair the pxqd (pending) count logic in the msi; we discovered that it is broken when trying to alert on a connection via the msi and interpreting this count.
Also: capability to do outbound queue depth alerts on https connections when using the UPOC protocol driver option.
Thanks for spearheading this effort!
-
October 17, 2007 at 12:31 pm #62653 James Cobane (Participant)
Glenn, This capability exists within the current Alert configurator; you just need to configure two alerts (one for last read >= 10 min, and another for last read >= 20 min) and set up the respective schedule for each in the schedule window.
Jim Cobane
Henry Ford Health
-
October 17, 2007 at 12:47 pm #62654 James Cobane (Participant)
Charlie, One thing I would like to see would be an option to configure additional text to display when the alert triggers; i.e. information that could be used to tell Operators what to do when that alert triggers, such as “Call the Vendor at (555)333-4455 and talk to Joe Schmoe”.
Jim Cobane
Henry Ford Health
-
October 17, 2007 at 12:48 pm #62655 Charlie Bursell (Participant)
Glenn: As Jim says, that capability does exist but, hopefully, we can make it easier
Keith:
1. I’m not sure I really understand point 1. Are you saying that you want to alert if the queue is draining but too slowly?
2. This is a PRIMARY goal, plus escalation, say, to some management level
3 and 4. This is already part of the Alerts Package and will certainly be part of my recommendation
Bob:
Not sure I understand. We already have a queue depth alert.
Yes, I have recommended that we NEVER reset send or receive times, just the counts
When using UPoC, part of the deal is to provide your own. It is a simple matter to set up an OB UPoC as both read and write. During read (timer) check how long it has been
Good stuff guys, keep it coming
-
October 17, 2007 at 1:21 pm #62656 Michael Hertel (Participant)
I’d like to see a native way of shutting down a thread, bouncing a thread (with a max number of bounces then shutdown), and maybe putting an outbound thread on hold. Many of us use a “file changed” alert that senses a “touched” file to execute a shutdown. One example is getting too many AR/AE responses.
-
October 17, 2007 at 3:16 pm #62657 John Hamilton (Participant)
I want to add my two cents. Right now %A gives you the display text. We need a way to send just the thread and process names. I have scripts where I parse them out of the display text; being able to just pass the names would make life easier for those.
The second would be a way to tell the alerts to keep triggering every x number of minutes until the condition is cleared, then a way to send an all clear.
The third, based on the previous, would be a way to say yes, I know the remote system is down, now leave me alone. A way to turn off alerts that we just configured to keep calling us until they are fixed.
-
October 17, 2007 at 4:36 pm #62658 Charlie Bursell (Participant)
John: Thanks for the input
1: Alerts will have a name you will be able to access with %N. That should help.
2 and 3 are already on my list
Don’t be too hard on Viken at the Level 3 class next week 😛
-
October 18, 2007 at 2:13 am #62659 Mark Thompson (Participant)
Charlie, It would be really helpful if “last message received time” could persist through a thread bounce. Right now, if you bounce an inactive inbound thread to try to reestablish a connection, you lose the last receive time and get “never” instead.
- Mark Thompson
HealthPartners
-
October 18, 2007 at 12:37 pm #62660 Charlie Bursell (Participant)
Mark: I have been trying to make that happen. I could not agree more
-
October 18, 2007 at 3:38 pm #62661 Mark Thompson (Participant)
Charlie, Another nice to have alert is >N messages in the recovery or error database. We use cron jobs for this — it would be great to make it native.
- Mark Thompson
HealthPartners
-
October 18, 2007 at 3:59 pm #62662 Charlie Bursell (Participant)
Mark: I have asked for this. The Alerts Package does this
-
October 22, 2007 at 6:58 pm #62663 Abe Rastkar (Participant)
Regarding alerts. It would be good if the alert server could repeat the alert message per a given period if the condition of the alert is not changed. For example if a thread is in opening state, an alert is sent when it is detected. But only once. This proposed change would optionally allow the alert to go off periodically (per configuration) until it is fixed or cancelled.
-
October 22, 2007 at 8:30 pm #62664 Charlie Bursell (Participant)
Abe: It is on the list and will be done
-
October 23, 2007 at 11:25 am #62665 Greg Eriksen (Participant)
It would be nice if the amount of time between alert repeats was configurable per alert, and perhaps a cap on the total number of repeats while the condition remains true.
-
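The repeat behaviour asked for in the last few posts (repeat every so often while the condition holds, cap the number of repeats, send an all clear when it is fixed) could be sketched like this. It is a Python illustration only; `RepeatingAlert`, its parameters and the returned strings are invented for the example and say nothing about how the alert server will implement it.

```python
from typing import Optional


class RepeatingAlert:
    """Per-alert repeat interval and repeat cap. Hypothetical sketch."""

    def __init__(self, interval_seconds: float, max_repeats: Optional[int] = None) -> None:
        self.interval = interval_seconds
        self.max_repeats = max_repeats   # None means repeat without a cap
        self.active = False
        self.repeats = 0                 # repeats sent after the first alert
        self.last_sent = 0.0

    def poll(self, condition_true: bool, now: float) -> Optional[str]:
        """Return "ALERT", "ALL_CLEAR" or None for this poll."""
        if not condition_true:
            was_active = self.active
            self.active = False
            self.repeats = 0
            return "ALL_CLEAR" if was_active else None
        if not self.active:
            # First detection always alerts.
            self.active = True
            self.last_sent = now
            return "ALERT"
        due = now - self.last_sent >= self.interval
        under_cap = self.max_repeats is None or self.repeats < self.max_repeats
        if due and under_cap:
            self.repeats += 1
            self.last_sent = now
            return "ALERT"
        return None
```

For example, `RepeatingAlert(300, max_repeats=2)` alerts on detection, repeats twice at five-minute intervals while the thread stays in the bad state, then goes quiet until the condition clears and an all clear is sent.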
October 23, 2007 at 12:10 pm #62666 Pete Gilbert (Participant)
I’d like to see more flexibility in the message that gets displayed when an alert triggers. Why am I limited to plain text in 2007? Let me give you a blob of html that gets rendered in a web browser, or allow me to specify a url for the message (i.e., a link to support information that is maintained elsewhere, for example in a wiki). I’d also like to be able to reference the site, process and thread names in the text of the message as variables (for example, @sitename, @processname or @threadname). I have to hard code these in the message now.
We’d also like to see better integration with other monitoring tools. We use HP Openview here, and I have to call scripts with parameters to get Cloverleaf alerts to show up in Openview.
Provide some ability to preview the alerts. As it stands now, I code it and don’t know that I got the syntax wrong until the operators tell me that the condition occurred but they did not receive an alert.
Also, whatever you do should be documented, with good examples.
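The variable substitution asked for above (@sitename, @processname, @threadname in the message text) is simple to sketch. The following Python is only an illustration of the idea; the `render` function is hypothetical, the site, process and thread values are made up, and nothing here reflects actual Cloverleaf alert syntax.

```python
import re
from typing import Mapping


def render(template: str, values: Mapping[str, str]) -> str:
    """Replace @name placeholders with values; unknown names are left as written."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"@(\w+)", substitute, template)


message = render(
    "Thread @threadname in process @processname on site @sitename is not responding.",
    {"sitename": "prod_adt", "processname": "adt_in", "threadname": "reg_adt_ib"},
)
print(message)
```

Rendering the template with sample values before the alert goes live would also act as a crude preview, since a misspelled placeholder shows up unreplaced in the output.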
-
October 23, 2007 at 1:11 pm #62667 Pete Gilbert (Participant)
oh….one more thing: The command entry for an alert should be more than a single line of text that allows me to see thirty characters. Being able to see only a portion of the command makes it difficult to create or debug.
See the screenshot example.
-
October 23, 2007 at 2:41 pm #62668 Charlie Bursell (Participant)
Pete: There will be provisions for you to define your own. A new alert called “Tcl” will be used and the message it returns will be the alert message. You will be able to logically AND or OR this with other alerts to create some pretty sophisticated alerts. You will also be able to name alerts and you will be passed the name of the alert.
As for more than 30 characters, perhaps we need an edit option for exec like we have now for Tcl?
-
October 24, 2007 at 2:00 pm #62669 Bill Marsch (Participant)
Create an alert based on a dynamic set of characters, words, phrases or information found in the “view command and engine output” for a given process. This would allow one to initiate an alert by placing into a TCL the echo of a set of characters, words, phrases or information based on a condition or set of conditions in the TCL. The TCL could even be conditioned in an Xlate by the IF statement. This would allow great user flexibility and simplicity when testing or when processing in production.
-
October 24, 2007 at 3:18 pm #62670 Charlie Bursell (Participant)
Bill: Not sure I understand all of that 😀 But you will be able to define any alert via Tcl
Charlie
-
October 24, 2007 at 4:18 pm #62671 Bill Marsch (Participant)
Charlie, I did send you an email but thought maybe I could post here as well as an attachment.
-
October 24, 2007 at 5:48 pm #62672 Robert Milfajt (Participant)
I second the flow status alert. Alone, the queue depth or last sent alerts may produce false positives. For example, sometimes we get a flood of messages from an inbound system, far more than the receiving system can handle, resulting in messages piling up and the queue depth alert firing. This is not a problem, and represents a false positive.
Similarly, the last sent alert may not be valid because there may be a lag in messages to send to this queue.
However, the combination of queue depth > 0 and last sent hitting a threshold always represents a problem, meaning you have messages to send and the receiving system is not processing the queue.
Robert Milfajt
Northwestern Medicine
Chicago, IL
-
October 24, 2007 at 6:25 pm #62673 Charlie Bursell (Participant)
Bob: You will be able to “AND” alerts to create a single alert. Some alerts can be defined with no action at all, just to be used as part of an “ANDed” alert
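The idea of ANDing simple alerts into one can be sketched as small condition functions combined with `all_of` and `any_of`. This is Python for illustration only; the helper names and the statistics keys (`queue_depth`, `seconds_since_last_sent`) are assumptions, and the 600-second threshold is an arbitrary example value.

```python
from typing import Callable, Mapping

# A condition looks at one snapshot of thread statistics and answers yes or no.
Condition = Callable[[Mapping[str, float]], bool]


def all_of(*conditions: Condition) -> Condition:
    """Logical AND: true only when every component condition is true."""
    return lambda stats: all(cond(stats) for cond in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR: true when at least one component condition is true."""
    return lambda stats: any(cond(stats) for cond in conditions)


def queue_not_empty(stats: Mapping[str, float]) -> bool:
    return stats["queue_depth"] > 0


def last_sent_older_than(seconds: float) -> Condition:
    return lambda stats: stats["seconds_since_last_sent"] >= seconds


# Component conditions carry no action of their own; only the combination alerts.
not_flowing = all_of(queue_not_empty, last_sent_older_than(600))
```

A flood of inbound messages (deep queue, recent send) and a quiet period (empty queue, old send) both evaluate to False here, which is exactly the pair of false positives described in the previous post.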
-
November 5, 2007 at 7:16 pm #62674 Pete Gilbert (Participant)
Another thing that would be nice is a simple interface to temporarily disable an alert that you know is going to fire during some maintenance procedure. Something along the lines of a check box to disable the alert. What we get told now is to create a copy of the alert file and delete the alerts that you don’t want to trigger. Then make that copied file the active alert file. Then you have to remember to move the real alert file back into place once the maintenance has been performed. That works, but it is clunkier than it should be.
-
November 5, 2007 at 9:20 pm #62675 Charlie Bursell (Participant)
Pete: That has been requested
-
November 30, 2007 at 1:29 pm #62676 Brian Goad (Participant)
Charlie, I would like to have better documentation of the alerts and maybe even some examples in the documentation when completed. That alone would remove a lot of the guesswork I have had to do in the past.
My 2 cents,
Brian
-
November 30, 2007 at 4:33 pm #62677 John Hamilton (Participant)
One more good one that was brought to light this last week. There needs to be a way to identify holidays or light days of activity.
So you can schedule specific days during the year to be exempt.
-
November 30, 2007 at 6:31 pm #62678 Charlie Bursell (Participant)
John: I think the idea will be to schedule when you want to run rather than when you don’t want to run. If you add too many bells and whistles it will take a year to learn to set it up.
That is why you get paid the big bucks 🙂 It certainly is within your expertise to change your scheduling this time of year. The scheduler already allows you to specify month, day of month, day of week, etc.
-
December 3, 2007 at 2:23 pm #62679 John Hamilton (Participant)
You did say wish list. I know it can be done. But remembering to do it is the hard part. You know us old dogs tend to …… what was I talking about?
-
December 4, 2007 at 6:08 pm #62680 Bill Marsch (Participant)
Charlie, I know "the wish list for alerts" was only put out in early October, but I was wondering if it was possible to get a status of the wish list and the modifications that we might expect, or options/documentation/summary of what is to come. I know it is that time of the year when everyone loves to give and receive.
-
December 4, 2007 at 6:14 pm #62681 Charlie Bursell (Participant)
A little early, Bill. These are being compiled for the 5.7 release and we haven’t released 5.6 yet (end of this year)
-
December 17, 2007 at 8:49 pm #62682 Tom Patton (Participant)
.. then maybe I can add a couple of thoughts:
– I’d really like to see more info for the operators (Jim asked for this)
– I think FTP processes need some special alert options. We have a separate, “home grown” app for FTP monitoring that looks into the logs to determine if the file was processed, or if there were logon errors, or other errors that prevented the transfer. There has to be a better way, but perhaps an alert if the word “normally” or “transferred” isn’t in the log.
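The log check suggested at the end of this post is small enough to sketch. The Python below is an illustration only: the function name is made up, the two sample log excerpts are invented and do not show real Cloverleaf or FTP client output, and only the marker words "normally" and "transferred" come from the post.

```python
from typing import Iterable


def transfer_looks_ok(log_text: str, markers: Iterable[str] = ("normally", "transferred")) -> bool:
    """True if any success marker appears in the log text (case-insensitive)."""
    lowered = log_text.lower()
    return any(marker.lower() in lowered for marker in markers)


# Invented log excerpts, for illustration only.
good_log = "connected to host\nfile transferred\nsession ended normally\n"
bad_log = "connected to host\nlogin incorrect\n"

for name, text in (("good", good_log), ("bad", bad_log)):
    if not transfer_looks_ok(text):
        print(f"ALERT: no success marker found in {name} log")
```

A check like this only proves that a success word is absent; it cannot say why the transfer failed, so the logon errors and other failures mentioned above would still need their own patterns.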
-
December 17, 2007 at 9:32 pm #62683 Charlie Bursell (Participant)
Tom: As for notifying operators, that will be up to you. We provide the hooks, you provide the notification. There is no way we could ever code a “one size fits all” notification.
As for FTP alerts, you will be able to provide alerts for those threads like any other. If you need more there will be a specialized Tcl alert which will enable you to build any alert. Again, it would be hard to build a “one size fits all” canned alert.
-
June 11, 2008 at 2:24 pm #62684 Bill Marsch (Participant)
Charlie, I know I asked this before, though that was a while ago. There appear to have been no postings since December 2007. I was wondering if it was possible to get a status of the wish list and the modifications that we might expect, or options/documentation/summary of what is to come and when. We have just recently implemented some alerts in our production area. We are running under Cloverleaf 5.6. Thanks, Bill
-
- The forum ‘Cloverleaf’ is closed to new topics and replies.