Alerts Wish List

  • #49595
    Charlie Bursell
    Participant

    OK guys and gals, we are finally going to do something with the Alerts Engine 😀  As you are acutely aware, with the number of threads and sites that some of you are trying to monitor, it is all but impossible to do it visually.

    I have been tasked to work with R&D to come up with some ideas on how to enhance the Cloverleaf alerts. I have some of my own ideas, as those of you who have used my Alerts Package can attest.

    Now I would like your ideas. I can’t promise that we will incorporate them all, but I can promise that we will consider them all.

    Here is some of what I am looking at thus far:

    Alerts trigger on transitions, not absolutes. E.g., trigger if a thread transitions from UP to OPENING, not just because the thread is OPENING. Going from DOWN to OPENING is normal when a thread is starting.
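    A minimal sketch of transition-based triggering, assuming the monitor polls each thread's state on a timer. The class and method names are hypothetical illustrations, not Cloverleaf APIs; only the UP to OPENING transition from the example above is configured.

```python
from typing import Dict, Optional, Set, Tuple


class TransitionAlert:
    """Hypothetical monitor: fires on configured state transitions, not on a state by itself."""

    def __init__(self, transitions: Set[Tuple[str, str]]) -> None:
        self.transitions = transitions          # (from_state, to_state) pairs worth alerting on
        self.last_state: Dict[str, str] = {}    # last state seen per thread

    def observe(self, thread: str, state: str) -> Optional[str]:
        """Record the current state; return a message only on a configured transition."""
        previous = self.last_state.get(thread)
        self.last_state[thread] = state
        if previous is not None and (previous, state) in self.transitions:
            return f"{thread}: {previous} -> {state}"
        return None


monitor = TransitionAlert({("UP", "OPENING")})
for state in ["DOWN", "OPENING", "UP", "OPENING"]:
    print(state, monitor.observe("adt_in", state))
```

    The normal startup sequence DOWN, OPENING, UP produces nothing; only the final drop from UP back to OPENING returns a message.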

    Change disk values to absolutes instead of percentages. If you have used 80% of a terabyte disk, you still have a lot left. There should be no need to define the Cloverleaf file system in the alert; the software should be smart enough to figure that out 😉
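    A minimal sketch of an absolute free-space check using Python's standard `shutil.disk_usage`, which reports on whichever filesystem contains the given path, so the check only needs a path inside the site rather than a mount point. The function name and the 10 GiB floor are illustrative assumptions. For scale: a 1 TB disk that is 80% used still has about 200 GB free.

```python
import shutil

GIB = 1024 ** 3


def low_disk_alert(path: str, min_free_bytes: int) -> bool:
    """Return True when free space on the filesystem holding `path` is below an absolute floor."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return usage.free < min_free_bytes


if __name__ == "__main__":
    # "." stands in for a site directory; the 10 GiB floor is an arbitrary example value.
    print("low disk" if low_disk_alert(".", 10 * GIB) else "disk ok")
```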

    Configure e-mail and other means of notification from the Alert GUI

    Separate the Alert and MonitorD daemons so one does not hose the other. You old-timers remember it used to be that way

    Alert on the number of messages in the Error Database, or if MonitorD/Lock Manager is down

    Easier to configure alerts

    An Alerts log that will indicate which alerts have triggered, which have been fixed, which are still pending, who fixed them, etc.

    I have more but this is all I will list for now.  Let me hear from you.  If you like you can respond here to let others know what you are thinking or send me e-mail:  charlie.bursell@quovadx.com

    I need your inputs within the next week or so.

    Charlie

Viewing 34 reply threads
  • Author
    Replies
    • #62650
      Glenn Friedenreich
      Participant

      Hi Charlie – One capability that we’d like to see is alerting on inbound thread inactivity, configurable by inactivity duration, day of week and time range (possibly configurable in a table).  

      For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert.  But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.
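      A minimal sketch of the inactivity table described above. The rule layout, the assumed 07:00 to 19:00 "weekday hours" window, and the function names are illustrative assumptions; times not covered by any rule are treated as unmonitored.

```python
from datetime import datetime, time, timedelta
from typing import List, NamedTuple, Optional, Set


class IdleRule(NamedTuple):
    days: Set[int]        # datetime.weekday() values: Monday=0 ... Sunday=6
    start: time           # window start (inclusive)
    end: time             # window end (exclusive)
    max_idle: timedelta   # inactivity allowed before alerting


# Hypothetical table for an inbound ADT thread; the first matching rule wins.
ADT_RULES: List[IdleRule] = [
    IdleRule({6}, time(0, 0), time(4, 0), timedelta(minutes=20)),               # Sunday 00:00-04:00
    IdleRule({0, 1, 2, 3, 4}, time(7, 0), time(19, 0), timedelta(minutes=10)),  # assumed weekday hours
]


def idle_limit(now: datetime, rules: List[IdleRule]) -> Optional[timedelta]:
    """Return the allowed inactivity for `now`, or None if no rule covers this time."""
    for rule in rules:
        if now.weekday() in rule.days and rule.start <= now.time() < rule.end:
            return rule.max_idle
    return None


def inactivity_alert(last_read: datetime, now: datetime, rules: List[IdleRule]) -> bool:
    """True when the thread has been idle at least as long as the rule in force allows."""
    limit = idle_limit(now, rules)
    return limit is not None and now - last_read >= limit
```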

      – Glenn

    • #62651
      Keith McLeod
      Participant

      Charlie,

      1) flowStatus alert. If an outbound queue has messages queued, I want to know whether those messages are flowing. I have this working to a certain extent with a modification to your protoStatus alert. It is not perfect, but it errs on the side of telling you when messages are not flowing. The trouble I have is that I send these alerts to HP OpenView, and OpenView wants a correction alert. If the queue drains quickly, the correction alert will never be triggered. I also bounce the thread if I have alerted 5 times that messages are not flowing… This is arbitrary and still in beta. I subtract the last read time from the current time and compare the result to a specified wait time. This allows me to individualize the wait time for those annoying threads where an application can take a long time to return ACKs. Another scenario is a destination system that does not send ACKs, by design (not mine); I am trying to have that one corrected by the vendor… I send the NOT FLOWING alert as often as it is triggered when it meets the criteria, and send the OpenView correction as often as it is triggered, but I restrict e-mail to one at most so as not to bury people in e-mail. This alert definitely helps.

      2) Can an alert fire off multiple times without resetting all alert counters? Say, by touching default.alert.

      3) I think I have seen something in the works where you will alert on Process being down and skip the individual thread alerts…

      4) A mechanism to turn off all alerts or maybe some during maintenance.

      5) …

    • #62652
      Bob Richardson
      Participant

      Greetings,

      Charlie, how about an alert to monitor the inbound queue depth?

      We have situations where messages get stuck on the inbound side due to our current design of (say) too many threads in a cluster, with the xlate cmd thread overloaded. (We now know that we can tweak the process in the NetConfig to balance the percentage of time spent in the xlt cmd thread.) This would include message states 1 thru 7. Also, as an adjunct to the alerts: repair the pxqd (pending) count logic in the msi. We discovered that it is broken when trying to alert on a connection via the msi by interpreting this count.

      Also: capability to do outbound queue depth alerts on https connections when using the UPOC protocol driver option.

      Thanks for spearheading this effort!

    • #62653
      James Cobane
      Participant

      Glenn,

      This capability exists within the current Alert configurator; you just need to configure two alerts (one for last read >= 10 min and another for last read >= 20 min) and set up the respective schedule for each in the schedule window.

      Jim Cobane

      Henry Ford Health



    • #62654
      James Cobane
      Participant

      Charlie,

      One thing I would like to see would be an option to configure additional text to display when the alert triggers; i.e. information that could be used to tell Operators what to do when that alert triggers, such as “Call the Vendor at (555)333-4455 and talk to Joe Schmoe”.

      Jim Cobane

      Henry Ford Health

    • #62655
      Charlie Bursell
      Participant

      Glenn:  

      As Jim says, that capability does exist, but hopefully we can make it easier

      Keith:  

      1. I’m not sure I really understand point 1.  Are you saying that you want to alert if the queue is draining but too slowly?

      2. This is a PRIMARY goal, plus escalation, say, to some management level

      3 and 4. These are already part of the Alerts Package and will certainly be part of my recommendation

      Bob:

      Not sure I understand.  We already have a queue depth alert.

      Yes, I have recommended that we NEVER reset send or receive times, just the counts

      When using UPoC, part of the deal is to provide your own.  It is a simple matter to set up an OB UPoC as both read and write.  During read (timer) check how long it has been

      Good stuff guys, keep it coming

    • #62656
      Michael Hertel
      Participant

      I’d like to see a native way of shutting down a thread, bouncing a thread (with a max number of bounces then shutdown), and maybe putting an outbound thread on hold. Many of us use a “file changed” alert that senses a “touched” file to execute a shutdown. One example is getting too many AR/AE responses.
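      A minimal sketch of bounce-with-a-limit logic. The `bounce` and `shutdown` callables are hypothetical hooks that a site would wire to its own thread-control commands; nothing here is a Cloverleaf API.

```python
from typing import Callable, Dict


class BounceGuard:
    """Bounce a troubled thread up to `max_bounces` times, then shut it down instead."""

    def __init__(self, max_bounces: int,
                 bounce: Callable[[str], None],
                 shutdown: Callable[[str], None]) -> None:
        self.max_bounces = max_bounces
        self.bounce = bounce        # hypothetical hook, e.g. a site script that restarts the thread
        self.shutdown = shutdown    # hypothetical hook that stops the thread
        self.counts: Dict[str, int] = {}

    def on_trouble(self, thread: str) -> str:
        """Called each time the alert condition fires for `thread`."""
        count = self.counts.get(thread, 0)
        if count < self.max_bounces:
            self.counts[thread] = count + 1
            self.bounce(thread)
            return "bounced"
        self.shutdown(thread)
        return "shutdown"

    def on_recovered(self, thread: str) -> None:
        """Reset the bounce count once the thread is healthy again."""
        self.counts.pop(thread, None)
```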

    • #62657
      John Hamilton
      Participant

      I want to add my two cents.

      Right now %A gives you the display text; we need a way to send just the thread and process names. I have scripts where I parse them out of the display text, and being able to pass the names directly would make life easier for those.

      The second would be a way to tell the alerts to keep triggering every x number of minutes until the condition is cleared then a way to send an all clear.

      The third, based on the previous one, would be a way to say “Yes, I know the remote system is down; now leave me alone.” A way to turn off alerts that we configured to keep calling us until they are fixed.

    • #62658
      Charlie Bursell
      Participant

      John:

      Thanks for the input

      1: Alerts will have a name that you will be able to access with %N. That should help.

      2 and 3 are already on my list

      Don’t be too hard on Viken at the Level 3 class next week 😛

    • #62659
      Mark Thompson
      Participant

      Charlie,

      It would be really helpful if “last message received time” could persist through a thread bounce.  Right now, if you bounce an inactive inbound thread to try to reestablish a connection, you lose the last receive time and get “never” instead.

      - Mark Thompson
      HealthPartners

    • #62660
      Charlie Bursell
      Participant

      Mark:

      I have been trying to make that happen.  I could not agree more

    • #62661
      Mark Thompson
      Participant

      Charlie,

      Another nice to have alert is >N messages in the recovery or error database.  We use cron jobs for this — it would be great to make it native.

      - Mark Thompson
      HealthPartners

    • #62662
      Charlie Bursell
      Participant

      Mark:  I have asked for this.  The Alerts Package does this

    • #62663
      Abe Rastkar
      Participant

      Regarding alerts. It would be good if the alert server could repeat the alert message per a given period if the condition of the alert is not changed. For example if a thread is in opening state, an alert is sent when it is detected. But only once. This proposed change would optionally allow the alert to go off periodically (per configuration) until it is fixed or cancelled.

    • #62664
      Charlie Bursell
      Participant

      Abe:

      It is on the list and will be done

    • #62665
      Greg Eriksen
      Participant

      It would be nice if the amount of time between alert repeats was configurable per alert, and perhaps a cap on the total number of repeats while the condition remains true.
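      A minimal sketch combining the repeat requests in this thread: a per-alert repeat interval, a cap on repeats, and a one-time all-clear when the condition goes false. The names and string return values are assumptions; the caller is assumed to evaluate the condition once per polling cycle and pass a timestamp in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepeatPolicy:
    interval: float   # seconds between repeats while the condition stays true
    max_repeats: int  # cap on repeats after the first notification


@dataclass
class RepeatingAlert:
    policy: RepeatPolicy
    active: bool = False     # is the condition currently considered true?
    last_sent: float = 0.0   # timestamp of the most recent notification
    repeats: int = 0         # repeats sent since the condition became true

    def evaluate(self, condition: bool, now: float) -> Optional[str]:
        """Run one polling cycle; return 'ALERT', 'REPEAT', 'CLEAR', or None (send nothing)."""
        if not condition:
            if self.active:
                self.active = False
                return "CLEAR"      # condition went away: send the all-clear once
            return None
        if not self.active:
            self.active, self.last_sent, self.repeats = True, now, 0
            return "ALERT"          # first detection
        if self.repeats < self.policy.max_repeats and now - self.last_sent >= self.policy.interval:
            self.last_sent = now
            self.repeats += 1
            return "REPEAT"
        return None
```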

    • #62666
      Pete Gilbert
      Participant

      I’d like to see more flexibility in the message that gets displayed when an alert triggers. Why am I limited to plain text in 2007? Let me give you a blob of html that gets rendered in a web browser, or allow me to specify a url for the message (i.e., a link to support information that is maintained elsewhere, for example in a wiki).

      I’d also like to be able to reference the site, process and thread names in the text of the message as variables (for example, @sitename, @processname or @threadname). I have to hard code these in the message now.
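      A minimal sketch of the variable substitution described above, using the proposed @sitename, @processname and @threadname tokens. The function and its behavior (tokens without a value are left untouched) are assumptions, not an existing Cloverleaf feature.

```python
import re
from typing import Mapping

# Only the three proposed tokens are recognized; \b stops "@threadname" matching inside "@threadnames".
_TOKEN = re.compile(r"@(sitename|processname|threadname)\b")


def render_alert_text(template: str, values: Mapping[str, str]) -> str:
    """Substitute @sitename, @processname and @threadname; tokens without a value are left as-is."""
    return _TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)


print(render_alert_text(
    "@threadname in @processname (@sitename) is down",
    {"sitename": "prod", "processname": "adt", "threadname": "adt_in"},
))  # prints: adt_in in adt (prod) is down
```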

      We’d also like to see better integration with other monitoring tools. We use HP Openview here, and I have to call scripts with parameters to get Cloverleaf alerts to show up in Openview.

      Provide some ability to preview the alerts. As it stands now, I code an alert and don’t know that I got the syntax wrong until the operators tell me that the condition occurred but they did not receive an alert.

      Also, whatever you do should be documented, with good examples.

    • #62667
      Pete Gilbert
      Participant

      oh….one more thing:

      The command entry for an alert should be more than a single line of text that allows me to see thirty characters. Being able to see only a portion of the command makes it difficult to create or debug.

      See the screenshot example.

    • #62668
      Charlie Bursell
      Participant

      Pete:

      There will be provisions for you to define your own. A new alert called “Tcl” will be used, and the message it returns will be the alert message. You will be able to logically AND or OR this with other alerts to create some pretty sophisticated alerts. You will also be able to name alerts, and you will be passed the name of the alert.

      As for more than 30 characters, perhaps we need an edit option for exec like we have now for Tcl?

    • #62669
      Bill Marsch
      Participant

      Create an alert based on a dynamic set of characters, words, phrases or information found in the “view command and engine output” for a given process. This would allow one to initiate an alert by having a Tcl proc echo a set of characters, words, phrases or information based on a condition or set of conditions in the Tcl. The Tcl could even be conditioned in an Xlate by the IF statement. This would allow great user flexibility and simplicity, both when testing and when processing in production.

    • #62670
      Charlie Bursell
      Participant

      Bill:

      Not sure I understand all of that 😀  But you will be able to define any alert via Tcl

      Charlie

    • #62671
      Bill Marsch
      Participant

      Charlie,

      I did send you an e-mail, but thought maybe I could post it here as well, as an attachment.

    • #62672
      Robert Milfajt
      Participant

      I second the flow status alert. On their own, the queue depth or last sent alerts may produce false positives.

      For example, sometimes we get a flood of messages from an inbound system, far more than the receiving system can handle resulting in messages piling up and the queue depth alert firing.  This is not a problem, and represents a false positive.

      Similarly, the last sent alert may not be valid because there may be a lag in messages to send to this queue.

      However, the combination of queue depth > 0 and last sent hitting a threshold always represents a problem: you have messages to send and the receiving system is not processing the queue.
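      A minimal sketch of the combined check described above: alert only when messages are waiting AND the last send is older than a threshold. The snapshot fields and function name are hypothetical; in practice the values would come from whatever thread statistics the alert engine exposes.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    queue_depth: int                 # messages waiting on the outbound thread
    seconds_since_last_send: float   # time since the thread last sent a message


def flow_stalled(snap: QueueSnapshot, max_idle_seconds: float) -> bool:
    # Neither condition alone is conclusive: a deep queue may just be a burst,
    # and a quiet thread may simply have nothing to send.
    return snap.queue_depth > 0 and snap.seconds_since_last_send >= max_idle_seconds
```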

      Robert Milfajt
      Northwestern Medicine
      Chicago, IL

    • #62673
      Charlie Bursell
      Participant

      Bob:

      You will be able to “AND” alerts to create a single alert. Some alerts can be defined with no action at all, just to be used as part of an “ANDed” alert

    • #62674
      Pete Gilbert
      Participant

      Another thing that would be nice is a simple interface to temporarily disable an alert that you know is going to fire during some maintenance procedure, something along the lines of a check box to disable the alert. What we get told now is to create a copy of the alert file, delete the alerts that you don’t want to trigger, and make that copied file the active alert file. Then you have to remember to move the real alert file back into place once the maintenance has been performed. That works, but it is clunkier than it should be.

    • #62675
      Charlie Bursell
      Participant

      Pete:

      That has been requested

    • #62676
      Brian Goad
      Participant

      Charlie,

      I would like to have better documentation of the alerts, and maybe even some examples in the documentation when completed. That alone would remove a lot of the guesswork I have had to do in the past.

      My 2 cents,

      Brian

    • #62677
      John Hamilton
      Participant

      One more good one that was brought to light this last week.

      There needs to be a way to identify holidays or light days of activity, so you can schedule specific days during the year to be exempt.

    • #62678
      Charlie Bursell
      Participant

      John:

      I think the idea will be to schedule when you want to run rather than when you don’t want to run. If you add too many bells and whistles, it will take a year to learn to set it up.

      That’s why you get paid the big bucks 🙂  It certainly is within your expertise to change your scheduling this time of year. The scheduler already allows you to specify month, day of month, day of week, etc.

    • #62679
      John Hamilton
      Participant

      You did say wish list. I know it can be done. But remembering to do it is the hard part.  You know us old dogs tend to …… what was I talking about?

    • #62680
      Bill Marsch
      Participant

      Charlie, I know “the wish list for alerts” was only posted in early October, but I was wondering if it was possible to get a status of the wish list and the modifications we might expect, or options/documentation/a summary of what is to come. I know it is that time of the year when everyone loves to give and receive.

    • #62681
      Charlie Bursell
      Participant

      A little early, Bill. These are being compiled for the 5.7 release, and we haven’t released 5.6 yet (end of this year)

    • #62682
      Tom Patton
      Participant

      .. then maybe I can add a couple of thoughts:

      – I’d really like to see more info for the operators (Jim asked for this)

      – I think FTP processes need some special alert options

       We have a separate, “home grown” app for FTP monitoring that looks into the logs to determine if the file was processed, or if there were logon errors or other errors that prevented the transfer. There has to be a better way, but perhaps an alert if the word “normally” or “transferred” isn’t in the log.
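      A minimal sketch of the fallback idea above: flag a transfer when none of the expected success words appear in the log. The word list comes from the post; the function name and the case-insensitive match are assumptions.

```python
from typing import Iterable

SUCCESS_WORDS = ("normally", "transferred")


def ftp_transfer_failed(log_lines: Iterable[str],
                        success_words: Iterable[str] = SUCCESS_WORDS) -> bool:
    """True when none of the success markers appear anywhere in the log (case-insensitive)."""
    text = "\n".join(log_lines).lower()
    return not any(word.lower() in text for word in success_words)
```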

    • #62683
      Charlie Bursell
      Participant

      Tom:

      As for notifying operators, that will be up to you.  We provide the hooks, you provide the notification.  There is no way we could ever code a “one size fits all” notification.

      As for FTP alerts, you will be able to provide alerts for those threads like any other. If you need more, there will be a specialized Tcl alert which will enable you to build any alert. Again, it would be hard to build a “one size fits all” canned alert.

    • #62684
      Bill Marsch
      Participant

      Charlie, I know I asked this before, though that was a while ago. There appear to have been no postings since December 2007. I was wondering if it was possible to get a status of the wish list and the modifications we might expect, or options/documentation/a summary of what is to come and when. We have just recently implemented some alerts in our production area. We are running Cloverleaf 5.6. Thanks, Bill

