Disabling alerts for planned downtimes


  • Creator
    Topic
  • #120171
    Ken Lambert
    Participant

    I’m just wondering what other people do to disable alerts during planned downtimes. We applied OS security patches last night and got several hundred alerts during that process. Can you simply back up the default.alrt file for each site and copy a blank one over it?

  • Author
    Replies
    • #120172
      James Cobane
      Participant

      Hi Ken,

      We have a ‘minimal.alrt’ configuration that we invoke for planned downtimes; it contains just some basic system-level alert types (e.g., user%, disk%, etc.). For Cloverleaf server downtimes, we back up the current default.alrt and then copy ‘minimal.alrt’ to default.alrt, so that when the system comes back up it invokes the “minimal” version (the monitor daemon invokes default.alrt when it starts). For our Epic downtimes, we simply invoke minimal.alrt via the command line for each site:

      hcisitectl -r m -d 1 -A "m=-cl minimal.alrt"

      Since the majority of our connections are between Epic and other external systems, we invoke this during the Epic downtimes to minimize false positives.

      Our “minimal.alrt” also contains an alert that sends an e-mail to us if it has been “in play” for more than 4 hours, just in case we forgot to re-invoke the original default.alrt.
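The backup-and-copy step described above could be sketched as a pair of shell functions. This is only an illustration, not the script used at Henry Ford Health: `alert_dir` stands for whichever directory holds default.alrt and minimal.alrt for a site, and the `.pre_downtime` suffix is a made-up naming choice.

```shell
#!/bin/sh
# Sketch only. alert_dir is the directory that holds default.alrt and
# minimal.alrt for one site; the .pre_downtime suffix is hypothetical.

swap_in_minimal() {
    alert_dir=$1
    # Refuse to overwrite an existing backup, so running this twice
    # cannot replace the real configuration with the minimal one.
    if [ -e "$alert_dir/default.alrt.pre_downtime" ]; then
        echo "backup already exists in $alert_dir" >&2
        return 1
    fi
    cp -p "$alert_dir/default.alrt" "$alert_dir/default.alrt.pre_downtime" &&
        cp "$alert_dir/minimal.alrt" "$alert_dir/default.alrt"
}

restore_default() {
    alert_dir=$1
    if [ ! -e "$alert_dir/default.alrt.pre_downtime" ]; then
        echo "no backup found in $alert_dir" >&2
        return 1
    fi
    mv "$alert_dir/default.alrt.pre_downtime" "$alert_dir/default.alrt"
}
```

Since the monitor daemon reads default.alrt when it starts, copying the file alone should only take effect at the next start; for a running site, the hcisitectl command above is the way to switch.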

      Hope this helps.

      Jim Cobane

      Henry Ford Health

    • #120174
      Ken Lambert
      Participant

      Jim,

      That is extremely helpful. Thank you very much!

      Ken Lambert

      • #120182
        Jay Hammond
        Participant

        Jim,

        I’m interested in the alert you have set up for the minimal.alrt.  How is that set up? Are you using a file change alert?  If so, what criteria do you have in the alert?

        I have a script that we can run with options to either enable or disable all alerts in all sites. When disable is chosen, it renames the default.alrt and creates a new default.alrt file with nothing in it except a prologue. When enable is chosen, it renames the current default.alrt (the deactivated one) with a timestamp in the name and renames the real file back to default.alrt.

        I’d love to be able to alert our team if we don’t re-enable the default.alrt as you mentioned.

        If anyone wants to see my chicken scratch script, I’ll be happy to share.
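A rough shell sketch of the enable/disable toggle described above, for one site at a time. It is not Jay's script. It assumes the alert file starts with a prologue block that ends on a line beginning with `end_prologue`, and the `.real` and `.disabled.<timestamp>` names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes default.alrt begins with a prologue block whose last
# line starts with "end_prologue"; the .real and .disabled.* file names are
# hypothetical choices.

disable_alerts() {
    alert_dir=$1
    if [ -e "$alert_dir/default.alrt.real" ]; then
        echo "alerts already disabled in $alert_dir" >&2
        return 1
    fi
    # Without a recognizable prologue the copy below would keep every alert.
    if ! grep -q '^end_prologue' "$alert_dir/default.alrt"; then
        echo "no end_prologue line found in $alert_dir/default.alrt" >&2
        return 1
    fi
    mv "$alert_dir/default.alrt" "$alert_dir/default.alrt.real" || return 1
    # Keep everything up to and including the end_prologue line, drop the rest.
    sed '/^end_prologue/q' "$alert_dir/default.alrt.real" > "$alert_dir/default.alrt"
}

enable_alerts() {
    alert_dir=$1
    if [ ! -e "$alert_dir/default.alrt.real" ]; then
        echo "nothing to re-enable in $alert_dir" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d%H%M%S)
    # Archive the prologue-only file, then put the real file back.
    mv "$alert_dir/default.alrt" "$alert_dir/default.alrt.disabled.$stamp" &&
        mv "$alert_dir/default.alrt.real" "$alert_dir/default.alrt"
}
```

Calling these from a loop over all sites gives the all-sites behavior; the guards are there so that running disable twice in a row cannot lose the real file.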

    • #120179
      Paul Stein
      Participant

      We have a lot of sites, so I use a Tcl script to iterate over all sites and ‘disable’ each alert that belongs to an alert group named ‘maint’ (for maintenance). An alert group can be defined for each alert. After the downtime, I run the script again with the word disable changed to enable in the last if statement.

      I actually refactored this from a script that Jim has shared before on this forum, so many thanks, Jim. It goes to show how valuable this forum is.

      I plan to make this executable and able to be scheduled from cron or an Advanced Scheduler thread when we upgrade to 20.1.

       

      For now however, I open a hcitcl prompt and execute this:

      #########################################

      global hciRoot
      set hciRoot $env(HCIROOT)
      #set hciRoot $hcr

      set siteList ""

      # Collect the site names: each site directory contains a siteInfo file,
      # so the directory that holds it is the site name.

      set fileList [glob -nocomplain ${hciRoot}/**/siteInfo]

      foreach fileName $fileList {
          set lindexvar [expr {[llength [file split $fileName]] - 2}]
          set siteName [lindex [file split $fileName] $lindexvar]
          lappend siteList $siteName
      }

      # Disable the 'maint' alert group in every site except siteProto.
      foreach site $siteList {
          if {$site ne {siteProto}} {
              echo $site
              if {[catch {exec hcicmd -s $site -t d -c {disable group maint}} output]} {
                  echo "ERROR:$output"
              }
          }
      }
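For anyone who would rather schedule this from cron than paste it into hcitcl, the same loop can be sketched in shell with the action passed as an argument instead of edited in place. The function name and argument layout are invented for illustration; the hcicmd invocation is the one from the Tcl script above.

```shell
#!/bin/sh
# Sketch only: set_maint_alerts is a hypothetical wrapper name.
# Usage: set_maint_alerts enable|disable site [site ...]

set_maint_alerts() {
    action=$1
    shift
    case $action in
        enable|disable) ;;
        *)
            echo "usage: set_maint_alerts enable|disable site..." >&2
            return 2
            ;;
    esac
    failed=0
    for site in "$@"; do
        # Skip the prototype site, as the Tcl version does.
        [ "$site" = siteProto ] && continue
        echo "$site"
        if ! output=$(hcicmd -s "$site" -t d -c "$action group maint" 2>&1); then
            echo "ERROR:$output"
            failed=1
        fi
    done
    return $failed
}
```

A call such as `set_maint_alerts disable site_a site_b` (site names made up) runs before the downtime, and the same call with `enable` runs afterwards. The non-zero return on any failure is there so a scheduler can flag a site that did not switch.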

       

    • #120190
      Ken Lambert
      Participant

      Just to contribute some additional code to the cause, here is what I use in shell scripts to get our list of sites from server.ini instead of looking for siteInfo files:

      mySites=$(grep environs /cloverleaf/cis19.1/integrator/server/server.ini | awk 'BEGIN{FS="/";RS=";";ORS=" "} { print $NF }')
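To make the parsing step easier to follow, here is a slightly longer sketch of the same idea as a function that prints one site name per line. It assumes what the one-liner above implies, namely that the environs line is a semicolon-separated list of site directory paths; the `key=value` layout and the function name are assumptions.

```shell
#!/bin/sh
# Sketch only: list_sites is a hypothetical helper name.
# Assumes the environs line holds site directories separated by ';'.

list_sites() {
    server_ini=$1
    grep environs "$server_ini" |
        cut -d= -f2- |               # drop the "environs=" key, if present
        tr ';' '\n' |                # one site path per line
        awk -F/ 'NF { print $NF }'   # keep only the last path component
}
```

A loop such as `for site in $(list_sites /cloverleaf/cis19.1/integrator/server/server.ini); do ...; done` then visits each site in turn.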

    • #120191
      James Cobane
      Participant

      @Jay Hammond – The condition we have in the minimal.alrt to send an e-mail when it has been active for more than 4 hours is simply the ‘system % CPU >= 0’ (since that condition will always be true) for a duration of 240 minutes.  So, after 240 minutes, the alert triggers.
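For sites that swap files by script rather than loading a minimal.alrt, a different technique can give a similar reminder: a marker file that the downtime script touches when alerts are disabled and removes when they are restored, plus a scheduled check of the marker's age. This is a swapped-in alternative, not the Cloverleaf alert described above, and the marker file and function name are hypothetical.

```shell
#!/bin/sh
# Sketch only: an alternative to the in-Cloverleaf alert, based on the age
# of a hypothetical marker file created when alerts are disabled.
# Relies on find's -mmin test, which GNU and BSD find both provide.

downtime_overdue() {
    marker=$1
    max_minutes=${2:-240}
    # find prints the path only if the marker exists and was last modified
    # more than max_minutes ago; a missing marker yields no output.
    [ -n "$(find "$marker" -mmin +"$max_minutes" 2>/dev/null)" ]
}
```

A cron entry could run `downtime_overdue /some/marker && echo "alerts still disabled"` and pass the message to whatever notification command the team already uses.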

      Hope that makes sense.

      Jim Cobane

      Henry Ford Health

      • #120194
        Jay Hammond
        Participant

        I’ll be.  It’s so simple.  Thank you very much!
