Verifying Alerts

  • Creator
    Topic
  • #51184
    Tim Jipson
    Participant

    How can I tell if my alerts are being monitored without causing an error condition?

    Thanks,

    Tim J.

Viewing 4 reply threads
  • Author
    Replies
    • #69087
      Russ Ross
      Participant

      Under AIX we are able to list the processes and determine if alerts are running for a given site.

      I have written some utility scripts to help me with doing that as follows:

      – list_sites.ksh (lists all my Cloverleaf sites)

      – turn_on_alerts.ksh (loads monitord default.alrt file)

      – turn_off_alerts.ksh (loads monitord off.alrt file)

      – ps_md.ksh (lists all the monitord processes, which can be viewed or grepped for “off” or a site name)

      Here is my ps_md.ksh I wrote:

      Code:

      #!/usr/bin/ksh

      # For each site, report the alert file argument of its running hcimonitord.
      for site in `list_sites.ksh`; do
         if [ -f "$HCIROOT/$site/exec/hcimonitord/pid" ]; then
             monitord_pid=`cat "$HCIROOT/$site/exec/hcimonitord/pid"`
             # Third word of the command line, e.g. "hcimonitord -cl off.alrt"
             cmd_args=`ps -p $monitord_pid -o "%a" | tail -1 | awk '{print $3}'`
             print "\n Site ( $site ) hcimonitord pid ( $monitord_pid ) command/args ( $cmd_args )"
         fi
      done

      echo ""

      Here is a sample of what is displayed to the xterm display:

      NOTE – I turned off the alerts for the first site listed so you can see how the off.alrt argument shows up.

      Code:


      Site ( p_bldbnk ) hcimonitord pid ( 590000 ) command/args ( off.alrt )

      Site ( p_cs_allergy ) hcimonitord pid ( 872638 ) command/args ( default.alrt )

      Site ( p_flat_adt_mock ) hcimonitord pid ( 1114344 ) command/args ( default.alrt )

      Site ( p_g2_adt_out ) hcimonitord pid ( 770074 ) command/args ( default.alrt )

      Site ( p_g2_adt_out_mock ) hcimonitord pid ( 1503290 ) command/args ( off.alrt )

      Site ( p_g_adt_out ) hcimonitord pid ( 671828 ) command/args ( default.alrt )

      Site ( p_g_adt_out_mock ) hcimonitord pid ( 1511572 ) command/args ( default.alrt )

      Site ( p_golive_bedtrk ) hcimonitord pid ( 831678 ) command/args ( default.alrt )

      Site ( p_golive_maxsys ) hcimonitord pid ( 1020144 ) command/args ( default.alrt )

      Site ( p_lis_mock ) hcimonitord pid ( 483436 ) command/args ( default.alrt )

      Site ( p_sched_out ) hcimonitord pid ( 1343616 ) command/args ( default.alrt )

      Site ( p_sched_out2 ) hcimonitord pid ( 782350 ) command/args ( default.alrt )

      Site ( p_sched_out_mock ) hcimonitord pid ( 1663160 ) command/args ( default.alrt )

      Site ( p_sms_23_adt_mock ) hcimonitord pid ( 598138 ) command/args ( default.alrt )

      Site ( prod_cbord ) hcimonitord pid ( 1646624 ) command/args ( default.alrt )

      Site ( prod_cbord_mock ) hcimonitord pid ( 1638618 ) command/args ( default.alrt )

      Site ( prod_emr ) hcimonitord pid ( 762106 ) command/args ( default.alrt )

      Site ( prod_flat_adt ) hcimonitord pid ( 974980 ) command/args ( default.alrt )

      Site ( prod_genie ) hcimonitord pid ( 905256 ) command/args ( default.alrt )

      Site ( prod_global_adt ) hcimonitord pid ( 876720 ) command/args ( default.alrt )

      Site ( prod_lis ) hcimonitord pid ( 401442 ) command/args ( default.alrt )

      Site ( prod_mymda ) hcimonitord pid ( 491752 ) command/args ( default.alrt )

      Site ( prod_pharm ) hcimonitord pid ( 1384496 ) command/args ( default.alrt )

      Site ( prod_sms_21_adt ) hcimonitord pid ( 1232976 ) command/args ( default.alrt )

      Site ( prod_sms_22_adt ) hcimonitord pid ( 475176 ) command/args ( default.alrt )

      Site ( prod_sms_23_adt ) hcimonitord pid ( 901146 ) command/args ( default.alrt )

      Site ( prod_sms_order ) hcimonitord pid ( 1556656 ) command/args ( default.alrt )

      Site ( prod_sms_sched ) hcimonitord pid ( 1454188 ) command/args ( default.alrt )

      Site ( prod_super_adt ) hcimonitord pid ( 413880 ) command/args ( default.alrt )

      Here is my turn_on_alerts.ksh script I wrote:

      Code:

      #!/usr/bin/ksh

      kill_hcinetmonitor.ksh

      hcisitectl -k m -s m -A "a=-cl 'default.alrt'"
      echo ""
      ps -ef | head -1
      ps -ef | grep hcimonitord | grep -v grep

      Here is my turn_off_alerts.ksh script I wrote:

      Code:

      #!/usr/bin/ksh

      kill_hcinetmonitor.ksh

      if [[ ! -f $HCISITEDIR/Alerts/off.alrt ]]; then
         touch $HCISITEDIR/Alerts/off.alrt
      fi

      hcisitectl -k m -s m -A "a=-cl 'off.alrt'"
      echo ""
      ps -ef | head -1
      ps -ef | grep hcimonitord | grep -v grep

      Here is my list_sites.ksh script I wrote:

      Code:

      #!/usr/bin/ksh

      # Begin Module Header ==============================================================================
      #
      #------
      # Name:
      #------
      #
      # list_sites.ksh
      #
      #---------
      # Purpose:
      #---------
      #
      # List all the sites for the current $HCIROOT
      #
      #--------
      # Inputs:
      #--------
      #
      # none
      #
      #-------
      # Notes:
      #-------
      #
      # This script assumes the proper environment is set when it is called
      # and that every site directory under $HCIROOT contains a siteInfo file
      #
      #---------
      # History:
      #---------
      #
      # 2001.02.22 Russ Ross
      #          - wrote initial version.
      #
      # 2001.07.25 Russ Ross
      #          - modified to look for $HCIROOT/*/siteInfo files to determine the list of sites,
      #            previously had looked for symbolic links in the $HCIROOT directory which only
      #            works if everyone follows MDACC conventions
      #
      # 2003.11.17 Russ Ross
      #          - modified to exclude siteProto from the list of sites
      #
      # End of Module Header =============================================================================

      #------------------------------------------------------------------------------------------
      # get a list of all the sites for the current $HCIROOT
      # (Note:  a site is any directory at the $HCIROOT level that contains a siteInfo file)
      #------------------------------------------------------------------------------------------

      (cd $HCIROOT; ls ./*/siteInfo 2>/dev/null) | awk -F/ '{print $2}' | grep -v siteProto | sort

      Another useful but simple script I wrote is called psgrep.ksh, as seen here:

      Code:

      #!/usr/bin/ksh

      # Show the ps header line followed by every process matching the pattern in $1.
      echo ""
      ps -ef | head -1
      ps -ef | grep "$1" | grep -v grep
      echo ""

      Which allows me to also do the following

      psgrep.ksh hcimonitord

      to get a quick overview of what site alerts are turned off or on as seen in the sample output below:

      Code:

          hci  401442       1   0 07:06:10      -  0:02 hcimonitord -cl default.alrt -S prod_lis
          hci  413880       1   0 07:18:11      -  0:00 hcimonitord -cl default.alrt -S prod_super_adt
          hci  475176       1   0 07:13:11      -  0:00 hcimonitord -cl default.alrt -S prod_sms_22_adt
          hci  483436       1   0   Aug 31      - 25:21 hcimonitord -cl default.alrt -S p_lis_mock
          hci  491752       1   0 07:09:11      -  0:01 hcimonitord -cl default.alrt -S prod_mymda
          hci  590020       1   0 07:26:10      -  0:00 hcimonitord -cl default.alrt -S p_bldbnk
          hci  598138       1   1   Aug 31      - 25:30 hcimonitord -cl default.alrt -S p_sms_23_adt_mock
          hci  671828       1   0 07:10:10      -  0:00 hcimonitord -cl default.alrt -S p_g_adt_out
          hci  762106       1   0 07:02:11      -  0:01 hcimonitord -cl default.alrt -S prod_emr
          hci  770074       1   0 07:19:10      -  0:00 hcimonitord -cl default.alrt -S p_g2_adt_out
          hci  782350       1   0   Aug 31      - 20:04 hcimonitord -cl default.alrt -S p_sched_out2
          hci  831678       1   1   Aug 31      - 20:41 hcimonitord -cl default.alrt -S p_golive_bedtrk
          hci  872638       1   0 07:05:10      -  0:01 hcimonitord -cl default.alrt -S p_cs_allergy
          hci  876720       1   0 07:04:11      -  0:01 hcimonitord -cl default.alrt -S prod_global_adt
          hci  901146       1   0 07:14:10      -  0:00 hcimonitord -cl default.alrt -S prod_sms_23_adt
          hci  905256       1   0 07:07:11      -  0:01 hcimonitord -cl default.alrt -S prod_genie
          hci  974980       1   0 07:03:10      -  0:01 hcimonitord -cl default.alrt -S prod_flat_adt
          hci 1020144       1   0   Aug 31      - 23:16 hcimonitord -cl default.alrt -S p_golive_maxsys
          hci 1114344       1   0   Aug 31      - 20:11 hcimonitord -cl default.alrt -S p_flat_adt_mock
          hci 1232976       1   0 07:12:10      -  0:00 hcimonitord -cl default.alrt -S prod_sms_21_adt
          hci 1343616       1   0 07:17:10      -  0:00 hcimonitord -cl default.alrt -S p_sched_out
          hci 1384496       1   0 07:08:10      -  0:01 hcimonitord -cl default.alrt -S prod_pharm
          hci 1454188       1   1 07:16:10      -  0:00 hcimonitord -cl default.alrt -S prod_sms_sched
          hci 1503290       1   0   Sep 08      - 11:08 hcimonitord -cl off.alrt -S p_g2_adt_out_mock
          hci 1511572       1   0   Aug 31      - 23:38 hcimonitord -cl default.alrt -S p_g_adt_out_mock
          hci 1556656       1   0 07:15:11      -  0:00 hcimonitord -cl default.alrt -S prod_sms_order
          hci 1638618       1   0   Aug 31      - 20:38 hcimonitord -cl default.alrt -S prod_cbord_mock
          hci 1646624       1   0 07:00:11      -  0:01 hcimonitord -cl default.alrt -S prod_cbord
          hci 1663160       1   0   Aug 31      - 23:13 hcimonitord -cl default.alrt -S p_sched_out_mock

      I see the p_g2_adt_out_mock site has the alerts turned off at this time.
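
      Picking the "off" sites out of a long listing by eye gets tedious, so here is a rough sketch of a filter that does it. The function name sites_with_alerts_off is made up for illustration, and it assumes the command line has the shape shown in the sample above (hcimonitord -cl <file> -S <site>). It is written in portable shell, so it should also run under ksh.

```shell
# Hypothetical helper (not part of Cloverleaf): reads "ps -ef" style lines on
# stdin and prints the site name of every hcimonitord started with off.alrt.
# ASSUMPTION: the command line looks like "hcimonitord -cl <file> -S <site>".
sites_with_alerts_off() {
    awk '
    /hcimonitord/ {
        alrt = ""; site = ""
        # Scan the fields for the -cl and -S options and take the word after each.
        for (i = 1; i < NF; i++) {
            if ($i == "-cl") alrt = $(i + 1)
            if ($i == "-S")  site = $(i + 1)
        }
        if (alrt == "off.alrt" && site != "") print site
    }'
}

# Example use:
#   ps -ef | grep hcimonitord | grep -v grep | sites_with_alerts_off
```

      Note that a monitord whose command line shows no -cl argument at all is simply skipped by this filter.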

      Russ Ross
      RussRoss318@gmail.com

    • #69088
      Tim Jipson
      Participant

      That is extremely helpful, thank you!!!

    • #69089
      Bob Richardson
      Participant

      Greetings Russ and company:

      I tried your illustrated command “ps -p monitord_pid -o %a” to get the args and no luck.  We are running AIX5.3 TL8 SP7 and CIS5.6.2.

      For example, executing

      ps -p 307570 -o %a

      yields:

      COMMAND
      hcimonitord -S allina_prod3

      Maybe an AIX version “feature”?  (we recently went from SP3 to SP7 to migrate to IBM supported version of our disk hardware).

      Just for thought here.

      BobR

    • #69090
      Russ Ross
      Participant

      I hope what you described isn’t an SP7 issue.

      What you described will also happen when you start the monitord by launching the netmonitor from the IDE instead of doing something like

      hcisitectl -k m -s m -A "a=-cl 'default.alrt'"

      By default, starting the monitord by launching the netmonitor from the IDE loads the default.alrt file even though it doesn’t show up in the args, which I’m not crazy about but have learned to live with.

      When I don’t see any args I’ve been able to assume that the default.alrt file is loaded.
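
      That rule of thumb can be written down as a small shell function. This is only a sketch: the name effective_alert_file is invented here, and the fallback to default.alrt encodes the assumption described above rather than anything the monitord reports itself.

```shell
# Hypothetical helper: given a monitord command line as one string, print the
# alert file named after -cl, or "default.alrt" when no -cl option is shown.
# ASSUMPTION: a monitord with no visible -cl argument loaded default.alrt.
effective_alert_file() {
    # Deliberately unquoted so the command line is split into words.
    set -- $1
    while [ $# -gt 0 ]; do
        if [ "$1" = "-cl" ] && [ $# -ge 2 ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    echo "default.alrt"
}

# Example use:
#   effective_alert_file "hcimonitord -cl off.alrt -S p_bldbnk"
#   effective_alert_file "hcimonitord -S allina_prod3"
```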

      I like to see the args so when this comes to my attention I simply run the turn_on_alerts.ksh script I posted.

      I’m still able to

      ps_md.ksh | grep off

      to see if any sites have their alerts turned off despite this inconsistency.

      Another thing to note is that the interfaces in a site will remain running if the monitord is killed without stopping them, but I don’t believe the alerts will continue to run.

      You could check for this condition if you wanted by looking for any process pid files and not finding

      $HCISITEDIR/exec/hcimonitord/pid
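
      A minimal sketch of that check follows. The function name monitord_missing is made up, and the location of the process pid files (exec/processes/<name>/pid under the site directory) is an assumption on my part; only the hcimonitord pid path comes from this thread. A pid file can also be left behind by a process that has died, so treat a hit as a hint to look closer rather than as proof.

```shell
# Hypothetical helper: returns 0 when a site directory has at least one
# process pid file but no hcimonitord pid file, and 1 otherwise.
# ASSUMPTION: process pid files live in <site>/exec/processes/<name>/pid.
monitord_missing() {
    site_dir=$1
    # A monitord pid file is present, so nothing to report.
    [ -f "$site_dir/exec/hcimonitord/pid" ] && return 1
    for pid_file in "$site_dir"/exec/processes/*/pid; do
        # If the glob matched nothing, the literal pattern fails this test.
        [ -f "$pid_file" ] && return 0
    done
    return 1
}

# Example use:
#   if monitord_missing "$HCISITEDIR"; then
#       echo "processes have pid files but hcimonitord does not"
#   fi
```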

      Russ Ross
      RussRoss318@gmail.com

    • #69091
      Bob Richardson
      Participant

      Russ:

      You must be correct in your assumption:  our scripts stop/start the monitor and lock managers using the vanilla command syntax

      >hcisitectl -K; hcisitectl -S

      I am aware of the fuller syntax but our shop has not gone that far at this point.

      Just a caveat that unless other folks use that full syntax to start up their monitor daemons, it appears the “%a” option will not yield what alrt file was loaded.

      See you at the conference?
