Alerts Wish List

This topic has 35 replies, 15 voices, and was last updated 17 years, 2 months ago by Bill Marsch.

Creator

Topic
October 16, 2007 at 9:52 pm #49595
Charlie Bursell
Participant
OK guys and gals we are finally going to do something with the Alerts Engine 😀 As you are acutely aware, with the numbers of threads and sites that some of you are trying to monitor, it is all but impossible to do it visually.

I have been tasked to work with R&D to come up with some ideas of how to enhance the Cloverleaf alerts. I have some of my own ideas as those of you that have used my Alerts Package can attest to.

Now I would like your ideas. I can’t promise that we will incorporate all but I can promise that we will consider all.

Here is some of what I am looking at thus far:

Alerts trigger on transitions, not absolutes, e.g. trigger if a thread transitions from UP to OPENING, not just on the fact that the thread is OPENING. It is normal when starting to go from DOWN to OPENING.

Change disk values to absolutes instead of percentages. If you have used 80% of a terabyte disk you still have a lot left (roughly 200 GB). No need to define the Cloverleaf file system in the alert. The software should be smart enough to figure that out 😉

Configure e-mail and other means of notification from Alert GUI

Separate the Alert and MonitorD daemons so one does not hose the other. You old timers remember it used to be that way.

Alert on number of messages in Error Database or if MonitorD/Lock Manager is down

Easier to configure alerts

An Alerts log that will indicate which alerts have triggered, which have been fixed, which are still pending, fixed by who, etc.
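
The first item, triggering on transitions rather than on states, might look roughly like this in Python. This is only an illustrative sketch: the rule table and class name are made up for the example, and nothing here reflects how the Cloverleaf alert engine is actually implemented.

```python
from typing import Optional

# Hypothetical rule table: (previous state, new state) pairs that should alert.
ALERT_TRANSITIONS = {("UP", "OPENING"), ("UP", "DOWN")}


class TransitionAlert:
    """Fires on a state change, not on the state itself."""

    def __init__(self) -> None:
        self.last_state: Optional[str] = None

    def observe(self, state: str) -> bool:
        """Record the new state; return True if this change should alert."""
        previous, self.last_state = self.last_state, state
        return (previous, state) in ALERT_TRANSITIONS


alert = TransitionAlert()
# A normal startup passes through OPENING without alerting.
startup = [alert.observe(s) for s in ("DOWN", "OPENING", "UP")]
# Falling back from UP to OPENING is the case worth an alert.
dropped = alert.observe("OPENING")
print(startup, dropped)  # [False, False, False] True
```

Keeping the rules as a set of pairs means the same OPENING state can be harmless or alarming depending on where the thread came from.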

I have more but this is all I will list for now. Let me hear from you. If you like you can respond here to let others know what you are thinking or send me e-mail: charlie.bursell@quovadx.com

I need your inputs within the next week or so.

Charlie

Author

Replies
- October 17, 2007 at 9:50 am #62650
  Glenn Friedenreich
  Participant
  Hi Charlie – One capability that we’d like to see is alerting on inbound thread inactivity, configurable by inactivity duration, day of week and time range (possibly configurable in a table).
  
  For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert. But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.
  
  – Glenn
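
Glenn's table-driven idea, with a different inactivity limit per day and time range, could be sketched along these lines. The table layout and function names are assumptions for illustration only; they are not Cloverleaf configuration syntax.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Hypothetical schedule table: (weekdays, start, end, allowed idle minutes).
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
SCHEDULE = [
    ({6}, time(0, 0), time(4, 0), 20),            # Sunday 00:00-04:00
    ({0, 1, 2, 3, 4}, time(0, 0), time.max, 10),  # weekdays, all day
]


def idle_limit(now: datetime) -> Optional[timedelta]:
    """Return the allowed idle time for this moment, or None if unscheduled."""
    for days, start, end, minutes in SCHEDULE:
        if now.weekday() in days and start <= now.time() <= end:
            return timedelta(minutes=minutes)
    return None


def is_inactive(last_read: datetime, now: datetime) -> bool:
    """True when the thread has been idle longer than the schedule allows."""
    limit = idle_limit(now)
    return limit is not None and now - last_read > limit


wednesday = datetime(2007, 10, 17, 9, 50)
print(is_inactive(wednesday - timedelta(minutes=15), wednesday))  # True
```

Times that match no row produce no alert at all, which mirrors scheduling when an alert should run rather than when it should not.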
- October 17, 2007 at 11:05 am #62651
  Keith McLeod
  Participant
  Charlie,
  
  1) flowStatus alert. If an outbound queue has a queue depth, I want to know whether the messages are flowing or not. I have this working to a certain extent with a modification to your protoStatus alert. It is not perfect, but it errs on the side of telling you when messages are not flowing. The trouble I have is that I send these alerts to HP Openview and they want a correction alert. If the queue drains quickly, the correction alert will never be triggered. I also bounce the thread if I have alerted 5 times that messages are not flowing…. This is arbitrary and still in beta phase. I compare current time minus last read time to a specified wait time. This allows me to individualize the wait time for those annoying threads where ACKs can take a long time to be returned by an application. Another scenario is not receiving ACKs from the destination system by design (not mine). Trying to have this one corrected by the vendor… I send the NOT FLOWING alert as often as it is triggered if it meets the criteria, send the Openview correction as often as triggered, but restrict email to 1 at the most so as to not bury email. This alert definitely helps.
  
  2) Can an alert fire off multiple times without resetting all alert counters? Say by touching default.alert.
  
  3) I think I have seen something in the works where you will alert on Process being down and skip the individual thread alerts…
  
  4) A mechanism to turn off all alerts or maybe some during maintenance.
  
  5) …
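
The flow check in point 1 (queue depth above zero, and current time minus last read time beyond a per-thread wait time) might be sketched as below. The class, field, and action names are invented for the example, and sending a single correction on the first healthy poll is a simplification of what Keith describes, not his actual script.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowMonitor:
    """Tracks one outbound thread; all names here are illustrative."""
    wait_seconds: float        # per-thread wait time before "not flowing"
    bounce_after: int = 5      # alerts before a bounce is suggested
    alerts: int = 0
    email_sent: bool = False

    def check(self, queue_depth: int, last_read: float, now: float) -> List[str]:
        """Return the actions to take for one polling cycle."""
        stalled = queue_depth > 0 and (now - last_read) > self.wait_seconds
        if not stalled:
            # Emit a correction only if an alert was raised earlier.
            actions = ["correction"] if self.alerts else []
            self.alerts, self.email_sent = 0, False
            return actions
        self.alerts += 1
        actions = ["not_flowing"]
        if not self.email_sent:        # at most one e-mail per incident
            actions.append("email")
            self.email_sent = True
        if self.alerts >= self.bounce_after:
            actions.append("bounce")
        return actions
```

Because the correction is tied to the first poll that finds the thread healthy, it is still produced when the queue drains between two polls.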
- October 17, 2007 at 12:04 pm #62652
  Bob Richardson
  Participant
  Greetings,
  
  Charlie, how about an alert to monitor the inbound queue depth?
  
  We have situations where messages get stuck on the inbound side due to our current design of (say) too many threads in a cluster with the xlate cmd thread overloaded. (We now know that we can tweak the process to balance the percentages of time spent in the xlt cmd thread for a process in the NetConfig.) This would include message states 1 thru 7. Also, as an adjunct to the alerts, repair the pxqd (pending) count logic in the msi: we discovered that it is broken when trying to alert on a connection via the msi and interpreting this count.
  
  Also: capability to do outbound queue depth alerts on https connections when using the UPOC protocol driver option.
  
  Thanks for spearheading this effort!
- October 17, 2007 at 12:31 pm #62653
  James Cobane
  Participant
  Glenn,
  
  This capability exists within the current Alert configurator; you just need to configure two alerts (one for last read >= 10 min and another for last read >= 20 min) and set up the respective schedule for each in the schedule window.
  
  Jim Cobane
  
  Henry Ford Health
  
  Glenn Wrote:
  
  Hi Charlie – One capability that we’d like to see is alerting on inbound thread inactivity, configurable by inactivity duration, day of week and time range (possibly configurable in a table).
  
  For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert. But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.
  
  – Glenn
- October 17, 2007 at 12:47 pm #62654
  James Cobane
  Participant
  Charlie,
  
  One thing I would like to see would be an option to configure additional text to display when the alert triggers; i.e. information that could be used to tell Operators what to do when that alert triggers, such as “Call the Vendor at (555)333-4455 and talk to Joe Schmoe”.
  
  Jim Cobane
  
  Henry Ford Health
- October 17, 2007 at 12:48 pm #62655
  Charlie Bursell
  Participant
  Glenn:
  
  As Jim says, that capability does exist but, hopefully, we can make it easier.
  
  Keith:
  
  1. I’m not sure I really understand point 1. Are you saying that you want to alert if the queue is draining but too slowly?
  
  2. This is a PRIMARY goal, plus escalation, say, to some mgmt level.
  
  3 and 4. This is already part of Alerts package and will certainly be part of my recommendation
  
  Bob:
  
  Not sure I understand. We already have a queue depth alert.
  
  Yes, I have recommended that we NEVER reset send or receive times, just the counts.
  
  When using UPoC, part of the deal is to provide your own. It is a simple matter to set up an OB UPoC as both read and write. During read (timer) check how long it has been
  
  Good stuff guys, keep it coming
- October 17, 2007 at 1:21 pm #62656
  Michael Hertel
  Participant
  I’d like to see a native way of shutting down a thread, bouncing a thread (with a max number of bounces then shutdown), and maybe putting an outbound thread on hold. Many of us use a “file changed” alert that senses a “touched” file to execute a shutdown. One example is getting too many AR/AE responses.
- October 17, 2007 at 3:16 pm #62657
  John Hamilton
  Participant
  I want to add my two cents.
  
  Right now %A gives you the display text. We need a way to send just the thread and process names. I have scripts where I parse them out of the display text; being able to just pass the names would make life easier for those.
  
  The second would be a way to tell the alerts to keep triggering every x number of minutes until the condition is cleared then a way to send an all clear.
  
  The third, based on the previous, would be a way to say yes, I know the remote system is down, now leave me alone. A way to turn off alerts that we just configured to keep calling us until they are fixed.
- October 17, 2007 at 4:36 pm #62658
  Charlie Bursell
  Participant
  John:
  
  Thanks for the input
  
  1: Alerts will have a name you will be able to access with %N. That should help.
  
  2 and 3 are already on my list
  
  Don’t be too hard on Viken at the Level 3 class next week 😛
- October 18, 2007 at 2:13 am #62659
  Mark Thompson
  Participant
  Charlie,
  
  It would be really helpful if “last message received time” could persist through a thread bounce. Right now, if you bounce an inactive inbound thread to try to reestablish a connection, you lose the last receive time and get “never” instead.
  
  - Mark Thompson
  HealthPartners
- October 18, 2007 at 12:37 pm #62660
  Charlie Bursell
  Participant
  Mark:
  
  I have been trying to make that happen. I could not agree more
- October 18, 2007 at 3:38 pm #62661
  Mark Thompson
  Participant
  Charlie,
  
  Another nice to have alert is >N messages in the recovery or error database. We use cron jobs for this — it would be great to make it native.
  
  - Mark Thompson
  HealthPartners
- October 18, 2007 at 3:59 pm #62662
  Charlie Bursell
  Participant
  Mark: I have asked for this. The Alerts Package does this
- October 22, 2007 at 6:58 pm #62663
  Abe Rastkar
  Participant
  Regarding alerts. It would be good if the alert server could repeat the alert message per a given period if the condition of the alert is not changed. For example if a thread is in opening state, an alert is sent when it is detected. But only once. This proposed change would optionally allow the alert to go off periodically (per configuration) until it is fixed or cancelled.
- October 22, 2007 at 8:30 pm #62664
  Charlie Bursell
  Participant
  Abe:
  
  It is on the list and will be done
- October 23, 2007 at 11:25 am #62665
  Greg Eriksen
  Participant
  It would be nice if the amount of time between alert repeats was configurable per alert, and perhaps a cap on the total number of repeats while the condition remains true.
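
The repeat behaviour requested in this part of the thread (re-send every so often while the condition holds, stop after a cap, send an all clear once it is fixed) could look something like the sketch below. The class and the returned strings are hypothetical; this is not how the engine does or will do it.

```python
from typing import Optional


class RepeatingAlert:
    """Re-sends an alert while its condition holds; names are illustrative."""

    def __init__(self, interval: float, max_repeats: int) -> None:
        self.interval = interval          # seconds between repeats
        self.max_repeats = max_repeats    # cap on repeats after the first alert
        self.last_sent: Optional[float] = None
        self.repeats = 0

    def poll(self, condition: bool, now: float) -> Optional[str]:
        """Return "alert", "all_clear", or None for one polling cycle."""
        if not condition:
            was_active = self.last_sent is not None
            self.last_sent, self.repeats = None, 0
            return "all_clear" if was_active else None
        if self.last_sent is None:        # first detection
            self.last_sent = now
            return "alert"
        due = now - self.last_sent >= self.interval
        if due and self.repeats < self.max_repeats:
            self.last_sent = now
            self.repeats += 1
            return "alert"
        return None
```

Each alert would carry its own `interval` and `max_repeats`, which is the per-alert configurability Greg asks for.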
- October 23, 2007 at 12:10 pm #62666
  Pete Gilbert
  Participant
  I’d like to see more flexibility in the message that gets displayed when an alert triggers. Why am I limited to plain text in 2007? Let me give you a blob of html that gets rendered in a web browser, or allow me to specify a url for the message (i.e., a link to support information that is maintained elsewhere, for example in a wiki).
  
  I’d also like to be able to reference the site, process and thread names in the text of the message as variables (for example, @sitename, @processname or @threadname). I have to hard code these in the message now.
  
  We’d also like to see better integration with other monitoring tools. We use HP Openview here, and I have to call scripts with parameters to get Cloverleaf alerts to show up in Openview.
  
  Provide some ability to preview the alerts. As it stands now, I code it and don’t know that I got the syntax wrong until the operators tell me that the condition occurred, but they did not receive an alert.
  
  Also, whatever you do should be documented, with good examples.
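
The message variables Pete mentions (@sitename, @processname, @threadname) amount to simple template substitution. A minimal sketch, assuming a plain `@name` syntax and made-up site, process, and thread names:

```python
import re
from typing import Dict


def render_message(template: str, values: Dict[str, str]) -> str:
    """Replace @name placeholders; unknown names are left untouched."""
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))
    return re.sub(r"@(\w+)", substitute, template)


# The values below are hypothetical examples, not real site names.
text = render_message(
    "Thread @threadname in @processname (site @sitename) is down.",
    {"sitename": "prod", "processname": "adt", "threadname": "adt_in"},
)
print(text)  # Thread adt_in in adt (site prod) is down.
```

Leaving unknown placeholders in place makes a typo in a variable name visible in the rendered message instead of silently dropping it.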
- October 23, 2007 at 1:11 pm #62667
  Pete Gilbert
  Participant
  oh….one more thing:
  
  The command entry for an alert should be more than a single line of text that allows me to see thirty characters. Being able to see only a portion of the command makes it difficult to create or debug.
  
  See the screenshot example.
- October 23, 2007 at 2:41 pm #62668
  Charlie Bursell
  Participant
  Pete:
  
  There will be provisions for you to define your own. A new alert called “Tcl” will be used and the message it returns will be the alert message. You will be able to logically AND or OR this with other alerts to create some pretty sophisticated alerts. You will also be able to name alerts and you will be passed the name of the alert.
  
  As for more than 30 characters, perhaps we need an edit option for exec like we have now for Tcl?
- October 24, 2007 at 2:00 pm #62669
  Bill Marsch
  Participant
  Create an alert based on a dynamic set of characters, words, phrases or information found in the “view command and engine output” for a given process. This would allow one to initiate an alert by placing into a TCL proc an echo of a set of characters, words, phrases or information based on a condition or set of conditions in the TCL. The TCL could even be conditioned in an Xlate by the IF statement. This would allow great user flexibility and simplicity when testing or when processing in production.
- October 24, 2007 at 3:18 pm #62670
  Charlie Bursell
  Participant
  Bill:
  
  Not sure I understand all of that 😀 But you will be able to define any alert via Tcl
  
  Charlie
- October 24, 2007 at 4:18 pm #62671
  Bill Marsch
  Participant
  Charlie,
  
  I did send you an email but thought maybe I could post here as well as an attachment.
- October 24, 2007 at 5:48 pm #62672
  Robert Milfajt
  Participant
  I second for the flow status alert. Alone, the queue depth or last sent alerts may represent false positives.
  
  For example, sometimes we get a flood of messages from an inbound system, far more than the receiving system can handle resulting in messages piling up and the queue depth alert firing. This is not a problem, and represents a false positive.
  
  Similarly, the last sent alert may not be valid because there may be a lag in messages to send to this queue.
  
  However, the combination of queue depth > 0 and last sent hitting a threshold always represents a problem, meaning you have messages to send and the receiving system is not processing the queue.
  
  Robert Milfajt
  Northwestern Medicine
  Chicago, IL
- October 24, 2007 at 6:25 pm #62673
  Charlie Bursell
  Participant
  Bob:
  
  You will be able to “AND” alerts to create a single alert. Some alerts can be defined with no action at all, just to be used as part of an “ANDed” alert.
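
Combining alerts with AND, as in the queue depth plus last sent case, can be pictured as composing small conditions. The sketch below is a generic illustration; the field names and the 600 second threshold are assumptions, not Cloverleaf settings.

```python
from typing import Callable, Dict

Condition = Callable[[Dict[str, float]], bool]


def all_of(*conditions: Condition) -> Condition:
    """Combine conditions so the result holds only when every one holds."""
    return lambda stats: all(c(stats) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Combine conditions so the result holds when at least one holds."""
    return lambda stats: any(c(stats) for c in conditions)


# Building blocks with no action of their own (field names are assumptions).
has_backlog: Condition = lambda s: s["queue_depth"] > 0
send_stalled: Condition = lambda s: s["seconds_since_send"] > 600

# Only the combined condition would carry an action such as a page or e-mail.
not_flowing = all_of(has_backlog, send_stalled)

flood = {"queue_depth": 500, "seconds_since_send": 5}
print(not_flowing(flood))  # False
```

A flood of inbound messages that is still draining leaves `send_stalled` false, so the combined alert stays quiet and the false positive goes away.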
- November 5, 2007 at 7:16 pm #62674
  Pete Gilbert
  Participant
  Another thing that would be nice is a simple interface to temporarily disable an alert that you know is going to fire during some maintenance procedure. Something along the lines of a check box to disable the alert. What we get told now is to create a copy of the alert file and delete the alerts that you don’t want to trigger. Then make that copied file the active alert file. Then you have to remember to move the real alert file back into place once the maintenance has been performed. That works, but it is clunkier than it should be.
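
One way to avoid having to remember the final step is a mute that expires by itself. This is a hypothetical sketch of that idea, with invented names; it is not an existing Cloverleaf feature.

```python
from typing import Dict


class MaintenanceMute:
    """Temporarily disables named alerts; each mute expires on its own."""

    def __init__(self) -> None:
        self._muted_until: Dict[str, float] = {}   # alert name -> expiry time

    def mute(self, name: str, now: float, duration: float) -> None:
        """Silence one alert for `duration` seconds starting at `now`."""
        self._muted_until[name] = now + duration

    def is_muted(self, name: str, now: float) -> bool:
        """True while the mute for this alert is still in effect."""
        expiry = self._muted_until.get(name)
        if expiry is None:
            return False
        if now >= expiry:
            del self._muted_until[name]    # mute has lapsed; forget it
            return False
        return True
```

The alert definitions themselves are never edited or swapped out, so there is no "real" file to move back afterwards.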
- November 5, 2007 at 9:20 pm #62675
  Charlie Bursell
  Participant
  Pete:
  
  That has been requested
- November 30, 2007 at 1:29 pm #62676
  Brian Goad
  Participant
  Charlie,
  
  I would like to have better documentation of the alerts and maybe even some examples in the documentation when completed. That alone would remove a lot of the guesswork I have had to do in the past.
  
  My 2 cents,
  
  Brian
- November 30, 2007 at 4:33 pm #62677
  John Hamilton
  Participant
  One more good one that was brought to light this last week.
  
  There needs to be a way to identify holidays or light days of activity, so you can schedule specific days during the year to be exempt.
- November 30, 2007 at 6:31 pm #62678
  Charlie Bursell
  Participant
  John:
  
  I think the idea will be to schedule when you want to run rather than when you don’t want to run. If you add too many bells and whistles it will take a year to learn to set it up.
  
  That is why you get paid the big bucks 🙂 It certainly is within your expertise to change your scheduling this time of year. The scheduler already allows you to specify month, day of month, day of week, etc.
- December 3, 2007 at 2:23 pm #62679
  John Hamilton
  Participant
  You did say wish list. I know it can be done. But remembering to do it is the hard part. You know us old dogs tend to …… what was I talking about?
- December 4, 2007 at 6:08 pm #62680
  Bill Marsch
  Participant
  Charlie, I know “the wish list for alerts” was only put out in early October, but I was wondering if it was possible to get a status of the wish list and the modifications that we might expect, or options/documentation/summary of what is to come. I know it is that time of the year when everyone loves to give and receive.
- December 4, 2007 at 6:14 pm #62681
  Charlie Bursell
  Participant
  A little early, Bill. These are being compiled for the 5.7 release and we haven’t released 5.6 yet (end of this year).
- December 17, 2007 at 8:49 pm #62682
  Tom Patton
  Participant
  .. then maybe I can add a couple of thoughts:
  
  – I’d really like to see more info for the operators (Jim asked for this)
  
  – I think FTP processes need some special alert options
  
  We have a separate, “home grown” app for FTP monitoring that looks into the logs to determine if the file was processed, or if there were logon errors, or other errors that prevented the transfer. There has to be a better way, but perhaps an alert if the word “normally” or “transferred” isn’t in the log.
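
The log check Tom suggests, alerting when neither success word shows up, is easy to picture. A minimal sketch, assuming the relevant log text is already in hand as a string; the function names and the sample log lines are made up.

```python
SUCCESS_WORDS = ("normally", "transferred")   # the two words named in the post


def transfer_looks_ok(log_text: str) -> bool:
    """True if any success word appears anywhere in the log text."""
    lowered = log_text.lower()
    return any(word in lowered for word in SUCCESS_WORDS)


def ftp_alert_needed(log_text: str) -> bool:
    """Alert when none of the success words is present."""
    return not transfer_looks_ok(log_text)


# Hypothetical log lines, for illustration only.
print(ftp_alert_needed("File transferred successfully"))  # False
print(ftp_alert_needed("Login incorrect"))                # True
```

A plain substring match is crude, so a real check would probably be limited to the log entries written since the last poll.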
- December 17, 2007 at 9:32 pm #62683
  Charlie Bursell
  Participant
  Tom:
  
  As for notifying operators, that will be up to you. We provide the hooks, you provide the notification. There is no way we could ever code a “one size fits all” notification.
  
  As for FTP alerts, you will be able to provide alerts for those threads like any other. If you need more, there will be a specialized Tcl alert which will enable you to build any alert. Again, it would be hard to build a “one size fits all” canned alert.
- June 11, 2008 at 2:24 pm #62684
  Bill Marsch
  Participant
  Charlie, I know I asked this before, though that was a while ago. There appear to have been no postings since December 2007. I was wondering if it was possible to get a status of the wish list and the modifications that we might expect, or options/documentation/summary of what is to come and when. We have just recently implemented some alerts in our production area. We are running under 5.6 of Cloverleaf. Thanks, Bill