Disabling alerts for planned downtimes


  • #120171
    Ken Lambert
    Participant

      I’m just wondering what other people do to disable alerts during planned downtimes. We applied OS security patches last night and got several hundred alerts during that process. Can you simply back up the default.alrt file for each site and copy a blank one over it?

      • #120172
        James Cobane
        Participant

          Hi Ken,

          We have a ‘minimal.alrt’ configuration that we invoke for planned downtimes. It contains only some basic system-level alert types (e.g. user%, disk%).  For Cloverleaf server downtimes, we back up the current default.alrt and then copy ‘minimal.alrt’ to default.alrt, so that when the system comes back up it invokes the “minimal” version (the monitor daemon invokes default.alrt when it starts).  For our Epic downtimes, we simply invoke minimal.alrt via the command line for each site:

          hcisitectl -r m -d 1 -A "m=-cl minimal.alrt"

          Since the majority of our connections are between Epic and other external systems, we invoke this during the Epic downtimes to minimize false positives.

          Our “minimal.alrt” also contains an alert that sends an e-mail to us if it has been “in play” for more than 4 hours, just in case we forgot to re-invoke the original default.alrt.

          Hope this helps.

          Jim Cobane

          Henry Ford Health

        • #120174
          Ken Lambert
          Participant

            Jim,

            That is extremely helpful. Thank you very much!

            Ken Lambert

            • #120182
              Jay Hammond
              Participant

                Jim,

                I’m interested in the alert you have set up for the minimal.alrt.  How is that set up? Are you using a file change alert?  If so, what criteria do you have in the alert?

                I have a script that we can run with options to either enable or disable all alerts in all sites.  When disable is chosen, it renames the default.alrt and creates a new default.alrt file with nothing in it except a prologue.  When enable is chosen, it renames the default.alrt (which is the deactivated one) with a timestamp in the name and renames the real file back to default.alrt.

                I’d love to be able to alert our team if we don’t re-enable the default.alrt as you mentioned.

                If anyone wants to see my chicken scratch script, I’ll be happy to share.
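
                The enable/disable behaviour described in this post could be sketched roughly as below. This is not Jay's script, only an illustration under stated assumptions: the caller passes the directory that holds a site's alert files, the blank file (prologue only) is prepared in advance and passed in as a template, and the names default.alrt.real and default.alrt.disabled.<timestamp> are hypothetical choices for this example.

```shell
# Sketch only: park the real default.alrt during a downtime and restore it after.
# Assumptions (not from the thread): the file names default.alrt.real and
# default.alrt.disabled.<timestamp>, and a pre-made blank template file.

disable_alerts() {
    alert_dir=$1        # directory holding the site's default.alrt
    blank_template=$2   # file containing nothing but the prologue
    # Refuse to run twice, so the parked real file is never overwritten.
    if [ -e "$alert_dir/default.alrt.real" ]; then
        echo "alerts already disabled in $alert_dir" >&2
        return 1
    fi
    mv "$alert_dir/default.alrt" "$alert_dir/default.alrt.real" &&
        cp "$blank_template" "$alert_dir/default.alrt"
}

enable_alerts() {
    alert_dir=$1
    if [ ! -e "$alert_dir/default.alrt.real" ]; then
        echo "nothing to re-enable in $alert_dir" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d%H%M%S)
    # Keep the deactivated file under a timestamped name, then put the real one back.
    mv "$alert_dir/default.alrt" "$alert_dir/default.alrt.disabled.$stamp" &&
        mv "$alert_dir/default.alrt.real" "$alert_dir/default.alrt"
}
```

                Passing minimal.alrt as the template instead of a blank file would give the variant Jim describes. Either way, the swap only takes effect when the monitor daemon next reads default.alrt.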

            • #120179
              Paul Stein
              Participant

                We have a lot of sites, so I use a Tcl script to iterate over all sites and ‘disable’ each alert that is part of an alert group named ‘maint’ (for maintenance). An alert group can be defined for each alert. After the downtime, I run the script again, changing the word disable to enable in the last if statement.

                I actually refactored this from a script that Jim has shared before on this forum, so many thanks, Jim. Goes to show how valuable this forum is.

                I plan to make this executable and able to be scheduled from CRON or advanced scheduler thread when we upgrade to 20.1.

                 

                For now however, I open a hcitcl prompt and execute this:

                #########################################
                # Disable every alert in the 'maint' group at each site.
                # Run from an hcitcl prompt (echo is a TclX command).

                set hciRoot $env(HCIROOT)
                #set hciRoot $hcr

                set siteList {}

                # A directory directly under HCIROOT is a site if it holds a siteInfo file.
                set fileList [glob -nocomplain [file join $hciRoot * siteInfo]]

                foreach fileName $fileList {
                    # The site name is the directory that contains siteInfo.
                    lappend siteList [file tail [file dirname $fileName]]
                }

                foreach site $siteList {
                    # Skip the siteProto template site.
                    if {$site ne {siteProto}} {
                        echo $site
                        # Change 'disable' to 'enable' to turn the alerts back on.
                        if {[catch {exec hcicmd -s $site -t d -c {disable group maint}} output]} {
                            echo "ERROR:$output"
                        }
                    }
                }

                 

              • #120190
                Ken Lambert
                Participant

                  Just to contribute some additional code to the cause, here is what I use in shell scripts to get our list of sites from server.ini instead of looking for siteInfo files:

                  mySites=$(grep environs /cloverleaf/cis19.1/integrator/server/server.ini | awk 'BEGIN{FS="/";RS=";";ORS=" "} { print $NF }')
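
                  As a rough illustration of how that site list could drive the group disable from the Tcl script earlier in the thread, here is a shell sketch. It assumes server.ini has a line of the form environs=<path>;<path>;... (which is what the awk separators above imply), prints one site per line rather than a space-separated string, and uses a hypothetical HCICMD variable so the loop can be dry-run without Cloverleaf. The hcicmd arguments are copied from the Tcl script, not verified independently.

```shell
# Sketch only. Assumption: server.ini contains "environs=/path/siteA;/path/siteB".

list_sites() {
    server_ini=$1
    # Take the value after "=", split it on ";", keep the last path component.
    grep '^environs' "$server_ini" | cut -d= -f2- | tr ';' '\n' |
        awk -F/ 'NF { print $NF }'
}

disable_maint_group() {
    server_ini=$1
    for site in $(list_sites "$server_ini"); do
        # Skip the siteProto template site, as the Tcl script does.
        if [ "$site" = "siteProto" ]; then
            continue
        fi
        echo "$site"
        # HCICMD is a hypothetical override for dry runs; it defaults to hcicmd.
        "${HCICMD:-hcicmd}" -s "$site" -t d -c 'disable group maint' ||
            echo "ERROR: command failed for $site" >&2
    done
}
```

                  A call such as disable_maint_group /cloverleaf/cis19.1/integrator/server/server.ini would walk every listed site; setting HCICMD=true first only prints the site names.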

                • #120191
                  James Cobane
                  Participant

                    @Jay Hammond – The condition we have in the minimal.alrt to send an e-mail when it has been active for more than 4 hours is simply the ‘system % CPU >= 0’ (since that condition will always be true) for a duration of 240 minutes.  So, after 240 minutes, the alert triggers.

                    Hope that makes sense.

                    Jim Cobane

                    Henry Ford Health

                    • #120194
                      Jay Hammond
                      Participant

                        I’ll be.  It’s so simple.  Thank you very much!
