Alerts Wish List

  • Creator
    Topic
  • #49595
    Charlie Bursell
    Participant

      OK guys and gals, we are finally going to do something with the Alerts Engine 😀   As you are acutely aware, with the number of threads and sites that some of you are trying to monitor, it is all but impossible to do it visually.

      I have been tasked to work with R&D to come up with some ideas of how to enhance the Cloverleaf alerts.  I have some of my own ideas as those of you that have used my Alerts Package can attest to.  

      Now I would like your ideas.  I can’t promise that we will incorporate all but I can promise that we will consider all.

      Here is some of what I am looking at thus far:

      Alerts trigger on transitions, not absolutes.  E.g., trigger if a thread transitions from UP to OPENING, not just on the fact that the thread is OPENING.  It is normal when starting to go from DOWN to OPENING.

      Change disk values to absolutes instead of percentages.  If you have used 80% of a terabyte disk you still have a lot left.  No need to define the Cloverleaf file system in the alert.  The software should be smart enough to figure that out  😉

      Configure e-mail and other means of notification from Alert GUI

      Separate the Alert and MonitorD daemons so one does not hose the other.  You old-timers remember it used to be that way.

      Alert on number of messages in Error Database or if MonitorD/Lock Manager is down

      Easier to configure alerts

      An Alerts log that will indicate which alerts have triggered, which have been fixed, which are still pending, fixed by who, etc.
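To illustrate the first idea in the list above (trigger on transitions, not absolutes), here is a minimal Python sketch. It is not how the Alerts Engine works; the class name `TransitionWatcher`, the rule set and the thread name are hypothetical and only show the logic of firing on UP to OPENING while ignoring DOWN to OPENING.

```python
# Hypothetical rule set: fire only when a thread goes from UP to OPENING.
ALERT_TRANSITIONS = {("UP", "OPENING")}


class TransitionWatcher:
    """Remembers the last state seen per thread and reports watched transitions."""

    def __init__(self, transitions=ALERT_TRANSITIONS):
        self.transitions = set(transitions)
        self.last_state = {}  # thread name -> last observed state

    def observe(self, thread, state):
        """Record the new state; return True if the change is a watched transition."""
        previous = self.last_state.get(thread)
        self.last_state[thread] = state
        return (previous, state) in self.transitions


watcher = TransitionWatcher()
for state in ["DOWN", "OPENING", "UP", "OPENING"]:
    # Only the final UP -> OPENING step fires; the startup DOWN -> OPENING does not.
    print(state, watcher.observe("adt_in", state))
```

Because the watcher compares the previous and current state, a thread that simply stays in OPENING does not fire again.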

      I have more but this is all I will list for now.  Let me hear from you.  If you like you can respond here to let others know what you are thinking or send me e-mail:  charlie.bursell@quovadx.com

      I need your input within the next week or so.

      Charlie

    Viewing 34 reply threads
    • Author
      Replies
      • #62650
        Glenn Friedenreich
        Participant

          Hi Charlie – One capability that we’d like to see is alerting on inbound thread inactivity, configurable by inactivity duration, day of week and time range (possibly configurable in a table).  

          For example, during weekday hours, 10 minutes of inactivity on our inbound ADT thread would be abnormal and should trigger an alert.  But if it’s between midnight and 04:00 Sunday, we’d like the alert to trigger after 20 minutes of inactivity on the inbound thread.

          – Glenn
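The table-driven inactivity rule described in this post could look roughly like the Python sketch below. The rule table layout and the function names `idle_limit` and `is_inactive` are assumptions made for illustration, and "weekday hours" is taken here to mean all day Monday through Friday; none of this reflects an existing Cloverleaf feature.

```python
from datetime import datetime, timedelta

# Hypothetical rule table: (weekdays, start hour, end hour, max idle minutes).
# Python weekdays run Monday=0 .. Sunday=6. The first matching rule wins.
RULES = [
    ({6}, 0, 4, 20),               # Sunday 00:00-04:00: allow 20 idle minutes
    ({0, 1, 2, 3, 4}, 0, 24, 10),  # weekdays, all day: allow 10 idle minutes
]


def idle_limit(now):
    """Return the allowed idle time at `now`, or None if no rule applies."""
    for days, start, end, minutes in RULES:
        if now.weekday() in days and start <= now.hour < end:
            return timedelta(minutes=minutes)
    return None


def is_inactive(last_read, now):
    """True when the inbound thread has been idle at least as long as the rule allows."""
    limit = idle_limit(now)
    return limit is not None and now - last_read >= limit


# Monday 2007-10-08, 15 minutes of silence on a weekday: alert.
print(is_inactive(datetime(2007, 10, 8, 8, 45), datetime(2007, 10, 8, 9, 0)))
# Sunday 2007-10-07 at 02:00, 15 minutes of silence: still within the 20-minute window.
print(is_inactive(datetime(2007, 10, 7, 1, 45), datetime(2007, 10, 7, 2, 0)))
```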

        • #62651
          Keith McLeod
          Participant

            Charlie,

            1) flowStatus alert.  If an outbound queue has a queue depth, I want to know whether the messages are flowing or not.  I have this working to a certain extent with a modification to your protoStatus alert.  It is not perfect, but it leans toward telling you when messages are not flowing.  The trouble I have is that I send these alerts to HP Openview and they want a correction alert.  If the queue drains quickly, the correction alert will never be triggered.  I also bounce the thread if I have alerted 5 times that messages are not flowing. This is arbitrary and still in beta phase.  I compare the current time minus the last read time to a specified wait time.  This allows me to individualize the wait time for those annoying threads where ACKs can take a long time to be returned by an application.  Another scenario is not receiving ACKs from the destination system by design (not mine).  I am trying to have this one corrected by the vendor. I send the NOT FLOWING alert as often as it is triggered if it meets the criteria and send the Openview correction as often as triggered, but restrict email to 1 at the most so as not to bury email. This alert definitely helps.

            2) Can an alert fire off multiple times without resetting all alert counters? Say, by touching default.alert.

            3) I think I have seen something in the works where you will alert on Process being down and skip the individual thread alerts…

            4) A mechanism to turn off all alerts or maybe some during maintenance.

            5) …

          • #62652
            Bob Richardson
            Participant

              Greetings,

              Charlie, how about an alert to monitor the inbound queue depth?

              We have situations where messages get stuck on the inbound side due to our current design of (say) too many threads in a cluster with the xlate cmd thread overloaded. (We now know that we can tweak the process in the NetConfig to balance the percentage of time spent in the xlt cmd thread.)  This would include message states 1 through 7.  Also, as an adjunct to the alerts, please repair the pxqd (pending) count logic in the msi; we discovered that it is broken when trying to alert on a connection via the msi and interpreting this count.

              Also: capability to do outbound queue depth alerts on https connections when using the UPOC protocol driver option.

              Thanks for spearheading this effort!

            • #62653
              James Cobane
              Participant

                Glenn,

                This capability exists within the current Alert configurator; you just need to configure two alerts (one for last read >= 10 min, and another for last read >= 20 min) and set up the respective schedule for each in the schedule window.

                Jim Cobane

                Henry Ford Health



              • #62654
                James Cobane
                Participant

                  Charlie,

                  One thing I would like to see would be an option to configure additional text to display when the alert triggers; i.e. information that could be used to tell Operators what to do when that alert triggers, such as “Call the Vendor at (555)333-4455 and talk to Joe Schmoe”.

                  Jim Cobane

                  Henry Ford Health

                • #62655
                  Charlie Bursell
                  Participant

                    Glenn:  

                    As Jim says, that capability does exist but, hopefully, we can make it easier.

                    Keith:  

                    1. I’m not sure I really understand point 1.  Are you saying that you want to alert if the queue is draining but too slowly?

                    2. This is a PRIMARY goal, plus escalation, say to some management level.

                    3 and 4.  This is already part of Alerts package and will certainly be part of my recommendation

                    Bob:

                    Not sure I understand.  We already have a queue depth alert.

                    Yes, I have recommended that we *NEVER* reset send or receive times, just the counts.

                    When using UPoC, part of the deal is to provide your own.  It is a simple matter to set up an OB UPoC as both read and write.  During read (timer) check how long it has been

                    Good stuff guys, keep it coming

                  • #62656
                    Michael Hertel
                    Participant

                      I’d like to see a native way of shutting down a thread, bouncing a thread (with a max number of bounces then shutdown), and maybe putting an outbound thread on hold. Many of us use a “file changed” alert that senses a “touched” file to execute a shutdown. One example is getting too many AR/AE responses.

                    • #62657
                      John Hamilton
                      Participant

                        I want to add my two cents.

                        Right now %A gives you the display text. We need a way to send just the thread and process names.  I have scripts where I parse them out of the display text; being able to just pass the names would make life easier for those.

                        The second would be a way to tell the alerts to keep triggering every x number of minutes until the condition is cleared, then a way to send an all clear.

                        The third, based on the previous, would be a way to say "yes, I know the remote system is down, now leave me alone": a way to turn off alerts that we just configured to keep calling us until they are fixed.

                      • #62658
                        Charlie Bursell
                        Participant

                          John:

                          Thanks for the input

                          1: Alerts will have a name you will be able to access with %N.  That should help.

                          2 and 3 are already on my list

                          Don’t be too hard on Viken at the Level 3 class next week  😛

                        • #62659
                          Mark Thompson
                          Participant

                            Charlie,

                            It would be really helpful if “last message received time” could persist through a thread bounce.  Right now, if you bounce an inactive inbound thread to try to reestablish a connection, you lose the last receive time and get “never” instead.

                            - Mark Thompson
                            HealthPartners

                          • #62660
                            Charlie Bursell
                            Participant

                              Mark:

                              I have been trying to make that happen.  I could not agree more

                            • #62661
                              Mark Thompson
                              Participant

                                Charlie,

                                Another nice to have alert is >N messages in the recovery or error database.  We use cron jobs for this — it would be great to make it native.

                                - Mark Thompson
                                HealthPartners

                              • #62662
                                Charlie Bursell
                                Participant

                                  Mark:  I have asked for this.  The Alerts Package does this

                                • #62663
                                  Abe Rastkar
                                  Participant

                                    Regarding alerts. It would be good if the alert server could repeat the alert message per a given period if the condition of the alert is not changed. For example if a thread is in opening state, an alert is sent when it is detected. But only once. This proposed change would optionally allow the alert to go off periodically (per configuration) until it is fixed or cancelled.

                                  • #62664
                                    Charlie Bursell
                                    Participant

                                      Abe:

                                      It is on the list and will be done

                                    • #62665
                                      Greg Eriksen
                                      Participant

                                        It would be nice if the amount of time between alert repeats was configurable per alert, and perhaps a cap on the total number of repeats while the condition remains true.
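A per-alert repeat interval with a cap on repeats, as requested in the last few posts, might be sketched in Python like this. The class `RepeatingAlert` and its parameters are hypothetical names used only to show the bookkeeping; the cap here counts repeats after the first notification, which is one possible reading of the request.

```python
class RepeatingAlert:
    """Re-send a notification every `interval_s` seconds while a condition holds,
    up to `max_repeats` repeats after the first notification."""

    def __init__(self, interval_s, max_repeats):
        self.interval_s = interval_s
        self.max_repeats = max_repeats
        self.sent = 0          # notifications sent for the current incident
        self.last_sent = None  # time of the most recent notification

    def poll(self, condition_active, now):
        """Return True when a notification should go out at time `now` (seconds)."""
        if not condition_active:
            # Condition cleared: reset so the next incident starts fresh.
            self.sent = 0
            self.last_sent = None
            return False
        if self.sent >= 1 + self.max_repeats:
            return False  # cap reached, stay quiet until the condition clears
        if self.last_sent is not None and now - self.last_sent < self.interval_s:
            return False  # too soon since the last notification
        self.sent += 1
        self.last_sent = now
        return True


alert = RepeatingAlert(interval_s=300, max_repeats=2)
for t in (0, 100, 300, 600, 900):
    print(t, alert.poll(True, t))
```

With these settings the alert fires at 0, 300 and 600 seconds, then stays silent until the condition clears.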

                                      • #62666
                                        Pete Gilbert
                                        Participant

                                          I’d like to see more flexibility in the message that gets displayed when an alert triggers. Why am I limited to plain text in 2007? Let me give you a blob of HTML that gets rendered in a web browser, or allow me to specify a URL for the message (i.e., a link to support information that is maintained elsewhere, for example in a wiki).

                                          I’d also like to be able to reference the site, process and thread names in the text of the message as variables (for example, @sitename, @processname or @threadname). I have to hard code these in the message now.

                                          We’d also like to see better integration with other monitoring tools. We use HP Openview here, and I have to call scripts with parameters to get Cloverleaf alerts to show up in Openview.

                                          Provide some ability to preview the alerts. As it stands now, I code it and don’t know that I got the syntax wrong until the operators tell me that the condition occurred but they did not receive an alert.

                                          Also, whatever you do should be documented, with good examples.
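The variable substitution asked for in this post (@sitename, @processname, @threadname) can be pictured with the short Python sketch below. The function `render_message` and the sample values are hypothetical; this only shows the idea of filling placeholders at trigger time, not any existing Cloverleaf syntax.

```python
import re


def render_message(template, values):
    """Replace @name placeholders with entries from `values`.
    Placeholders with no matching entry are left untouched."""

    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return re.sub(r"@(\w+)", substitute, template)


# Sample values are made up for the example.
context = {"sitename": "prod_site", "processname": "adt_proc", "threadname": "adt_in"}
print(render_message("Thread @threadname in @processname on @sitename is down.", context))
```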

                                        • #62667
                                          Pete Gilbert
                                          Participant

                                            oh….one more thing:

                                            The command entry for an alert should be more than a single line of text that allows me to see thirty characters. Being able to see only a portion of the command makes it difficult to create or debug.

                                            See the screenshot example.

                                          • #62668
                                            Charlie Bursell
                                            Participant

                                              Pete:

                                              There will be provisions for you to define your own.  A new alert called “Tcl” will be used and the message it returns will be the alert message.  You will be able to logically AND or OR this with other alerts to create some pretty sophisticated alerts.  You will also be able to name alerts and you will be passed the name of the alert.

                                              As for more than 30 characters, perhaps we need an edit option for exec like we have now for Tcl?

                                            • #62669
                                              Bill Marsch
                                              Participant

                                                Create an alert based on a dynamic set of characters, words, phrases or information found in the “view command and engine output” for a given process.  This would allow one to initiate an alert by having a Tcl proc echo a set of characters, words, phrases or information based on a condition or set of conditions in the Tcl.  The Tcl could even be conditioned in an Xlate by the IF statement.  This would give the user great flexibility and simplicity when testing or when processing in production.

                                              • #62670
                                                Charlie Bursell
                                                Participant

                                                  Bill:

                                                  Not sure I understand all of that  😀   But you will be able to define any alert via Tcl.

                                                  Charlie

                                                • #62671
                                                  Bill Marsch
                                                  Participant

                                                    Charlie,

                                                    I did send you an email but thought maybe I could post here as well, as an attachment.

                                                  • #62672
                                                    Robert Milfajt
                                                    Participant

                                                      I second the flow status alert.  Alone, the queue depth or last sent alerts may produce false positives.

                                                      For example, sometimes we get a flood of messages from an inbound system, far more than the receiving system can handle resulting in messages piling up and the queue depth alert firing.  This is not a problem, and represents a false positive.

                                                      Similarly, the last sent alert may not be valid because there may be a lag in messages to send to this queue.

                                                      However, the combination of queue depth > 0 and last sent hitting a threshold always represents a problem: you have messages to send and the receiving system is not processing the queue.

                                                      Robert Milfajt
                                                      Northwestern Medicine
                                                      Chicago, IL
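The combined condition described in this post (queue depth > 0 AND last sent older than a threshold) is small enough to sketch in Python. The function name `not_flowing` and the sample numbers are hypothetical; the point is only that neither test alone fires the alert.

```python
def not_flowing(queue_depth, last_sent, now, max_idle_s):
    """True only when messages are queued AND nothing has been sent for
    at least `max_idle_s` seconds. Times are epoch seconds."""
    return queue_depth > 0 and (now - last_sent) >= max_idle_s


# A flood of 500 queued messages that is still draining: no alert.
print(not_flowing(queue_depth=500, last_sent=1000, now=1010, max_idle_s=600))
# An empty queue with nothing sent for hours (a lull in traffic): no alert.
print(not_flowing(queue_depth=0, last_sent=0, now=10000, max_idle_s=600))
# Three queued messages and nothing sent for 700 seconds: alert.
print(not_flowing(queue_depth=3, last_sent=1000, now=1700, max_idle_s=600))
```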

                                                    • #62673
                                                      Charlie Bursell
                                                      Participant

                                                        Bob:

                                                        You will be able to “AND” alerts to create a single alert.  Some alerts can be defined with no action at all, just to be used as part of an “ANDed” alert.

                                                      • #62674
                                                        Pete Gilbert
                                                        Participant

                                                          Another thing that would be nice is a simple interface to temporarily disable an alert that you know is going to fire during some maintenance procedure, something along the lines of a check box to disable the alert. What we get told now is to create a copy of the alert file, delete the alerts that you don’t want to trigger, and make that copied file the active alert file. Then you have to remember to move the real alert file back into place once the maintenance has been performed. That works, but it is clunkier than it should be.

                                                        • #62675
                                                          Charlie Bursell
                                                          Participant

                                                            Pete:

                                                            That has been requested

                                                          • #62676
                                                            Brian Goad
                                                            Participant

                                                              Charlie,

                                                              I would like to have better documentation of the alerts and maybe even some examples in the documentation when completed. That alone would remove a lot of the guesswork I have had to do in the past.

                                                              My 2 cents,

                                                              Brian

                                                            • #62677
                                                              John Hamilton
                                                              Participant

                                                                One more good one that was brought to light this last week.

                                                                There needs to be a way to identify holidays or light days of activity,

                                                                so you can schedule specific days during the year to be exempt.

                                                              • #62678
                                                                Charlie Bursell
                                                                Participant

                                                                  John:

                                                                  I think the idea will be to schedule when you want to run rather than when you don’t want to run.  If you add too many bells and whistles it will take a year to learn to set it up.

                                                                  That is why you get paid the big bucks  🙂   It certainly is within your expertise to change your scheduling this time of year.  The scheduler already allows you to specify month, day of month, day of week, etc.

                                                                • #62679
                                                                  John Hamilton
                                                                  Participant

                                                                    You did say wish list. I know it can be done. But remembering to do it is the hard part.  You know us old dogs tend to …… what was I talking about?

                                                                  • #62680
                                                                    Bill Marsch
                                                                    Participant

                                                                      Charlie, I know “the wish list for alerts” was only put out in early October, but I was wondering if it was possible to get a status of the wish list and the modifications that we might expect, or options/documentation/summary of what is to come.  I know it is that time of the year when everyone loves to give and receive.

                                                                    • #62681
                                                                      Charlie Bursell
                                                                      Participant

                                                                        A little early, Bill.  These are being compiled for the 5.7 release and we haven’t released 5.6 yet (end of this year).

                                                                      • #62682
                                                                        Tom Patton
                                                                        Participant

                                                                          .. then maybe I can add a couple of thoughts:

                                                                          – I’d really like to see more info for the operators (Jim asked for this)

                                                                          – I think FTP processes need some special alert options

                                                                           We have a separate, “home grown” app for FTP monitoring that looks into the logs to determine if the file was processed, or if there were logon errors, or other errors that prevented the transfer. There has to be a better way, but perhaps an alert if the word “normally” or “transferred” isn’t in the log.
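The log check suggested here could be sketched in Python as follows. The function `transfer_looks_ok`, the marker words taken from the post, and the sample log lines are illustrative assumptions; real FTP log wording varies by server and client, and this reading alerts only when neither word appears.

```python
def transfer_looks_ok(log_text, markers=("normally", "transferred")):
    """Return True if any success marker appears in the log text (case-insensitive)."""
    lowered = log_text.lower()
    return any(marker in lowered for marker in markers)


# Made-up log excerpts for illustration only.
good_log = "connected\nfile adt_batch.dat transferred\nsession ended normally\n"
bad_log = "connected\nlogin incorrect\nsession closed\n"

for name, text in (("good", good_log), ("bad", bad_log)):
    if not transfer_looks_ok(text):
        print(f"ALERT: no success marker found in {name} log")
```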

                                                                        • #62683
                                                                          Charlie Bursell
                                                                          Participant

                                                                            Tom:

                                                                            As for notifying operators, that will be up to you.  We provide the hooks, you provide the notification.  There is no way we could ever code a “one size fits all” notification.

                                                                            As for FTP alerts, you will be able to provide alerts for those threads like any other.  If you need more, there will be a specialized Tcl alert which will enable you to build any alert.  Again, it would be hard to build a “one size fits all” canned alert.

                                                                          • #62684
                                                                            Bill Marsch
                                                                            Participant

                                                                              Charlie, I know I asked this before, though that was a while ago.  There appear to have been no postings since December 2007.  I was wondering if it was possible to get a status of the wish list and the modifications that we might expect, or options/documentation/summary of what is to come and when.  We have just recently implemented some alerts in our production area.  We are running under 5.6 of Cloverleaf.  Thanks, Bill

                                                                          • The forum ‘Cloverleaf’ is closed to new topics and replies.