Verifying Alerts

Clovertech Forums > Read Only Archives > Cloverleaf

  • #51184
    Tim Jipson
    Participant

      How can I tell if my alerts are being monitored without causing an error condition?

      Thanks,

      Tim J.

    Viewing 4 reply threads
      • #69087
        Russ Ross
        Participant

          Under AIX we are able to list the processes and determine if alerts are running for a given site.

          I have written some utility scripts to help with that:

          – list_sites.ksh (lists all my cloverleaf sites)

          – turn_on_alerts.ksh (loads monitord default.alrt file)

          – turn_off_alerts.ksh (loads monitord off.alrt file)

          – ps_md.ksh (lists all the monitord processes, which can be viewed or grepped for “off” or a site name)

          Here is the ps_md.ksh script I wrote:

          Code:

          #!/usr/bin/ksh

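          for site in $(list_sites.ksh); do
             # only sites with a monitord pid file have a monitor daemon to inspect
             if [ -f "$HCIROOT/$site/exec/hcimonitord/pid" ]; then
                 monitord_pid=$(cat "$HCIROOT/$site/exec/hcimonitord/pid")
                 # 3rd word of "hcimonitord -cl <file>.alrt -S <site>" is the alert file
                 # (only when monitord was started with -cl)
                 cmd_args=$(ps -p "$monitord_pid" -o "%a" | tail -1 | awk '{print $3}')
                 printf '\n Site ( %s ) hcimonitord pid ( %s ) command/args ( %s )\n' "$site" "$monitord_pid" "$cmd_args"
             fi
          done

          echo ""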

          Here is a sample of what ps_md.ksh displays in the xterm:

          NOTE: I turned off the alerts for the first site listed so you can see how the off.alrt argument shows up.

          Code:


          Site ( p_bldbnk ) hcimonitord pid ( 590000 ) command/args ( off.alrt )

          Site ( p_cs_allergy ) hcimonitord pid ( 872638 ) command/args ( default.alrt )

          Site ( p_flat_adt_mock ) hcimonitord pid ( 1114344 ) command/args ( default.alrt )

          Site ( p_g2_adt_out ) hcimonitord pid ( 770074 ) command/args ( default.alrt )

          Site ( p_g2_adt_out_mock ) hcimonitord pid ( 1503290 ) command/args ( off.alrt )

          Site ( p_g_adt_out ) hcimonitord pid ( 671828 ) command/args ( default.alrt )

          Site ( p_g_adt_out_mock ) hcimonitord pid ( 1511572 ) command/args ( default.alrt )

          Site ( p_golive_bedtrk ) hcimonitord pid ( 831678 ) command/args ( default.alrt )

          Site ( p_golive_maxsys ) hcimonitord pid ( 1020144 ) command/args ( default.alrt )

          Site ( p_lis_mock ) hcimonitord pid ( 483436 ) command/args ( default.alrt )

          Site ( p_sched_out ) hcimonitord pid ( 1343616 ) command/args ( default.alrt )

          Site ( p_sched_out2 ) hcimonitord pid ( 782350 ) command/args ( default.alrt )

          Site ( p_sched_out_mock ) hcimonitord pid ( 1663160 ) command/args ( default.alrt )

          Site ( p_sms_23_adt_mock ) hcimonitord pid ( 598138 ) command/args ( default.alrt )

          Site ( prod_cbord ) hcimonitord pid ( 1646624 ) command/args ( default.alrt )

          Site ( prod_cbord_mock ) hcimonitord pid ( 1638618 ) command/args ( default.alrt )

          Site ( prod_emr ) hcimonitord pid ( 762106 ) command/args ( default.alrt )

          Site ( prod_flat_adt ) hcimonitord pid ( 974980 ) command/args ( default.alrt )

          Site ( prod_genie ) hcimonitord pid ( 905256 ) command/args ( default.alrt )

          Site ( prod_global_adt ) hcimonitord pid ( 876720 ) command/args ( default.alrt )

          Site ( prod_lis ) hcimonitord pid ( 401442 ) command/args ( default.alrt )

          Site ( prod_mymda ) hcimonitord pid ( 491752 ) command/args ( default.alrt )

          Site ( prod_pharm ) hcimonitord pid ( 1384496 ) command/args ( default.alrt )

          Site ( prod_sms_21_adt ) hcimonitord pid ( 1232976 ) command/args ( default.alrt )

          Site ( prod_sms_22_adt ) hcimonitord pid ( 475176 ) command/args ( default.alrt )

          Site ( prod_sms_23_adt ) hcimonitord pid ( 901146 ) command/args ( default.alrt )

          Site ( prod_sms_order ) hcimonitord pid ( 1556656 ) command/args ( default.alrt )

          Site ( prod_sms_sched ) hcimonitord pid ( 1454188 ) command/args ( default.alrt )

          Site ( prod_super_adt ) hcimonitord pid ( 413880 ) command/args ( default.alrt )

          Here is my turn_on_alerts.ksh script I wrote:

          Code:

          #!/usr/bin/ksh

          kill_hcinetmonitor.ksh

          hcisitectl -k m -s m -A "a=-cl 'default.alrt'"
          echo “”
          ps -ef | head -1
          ps -ef | grep hcimonitord | grep -v grep

          Here is my turn_off_alerts.ksh script I wrote:

          Code:

          #!/usr/bin/ksh

          kill_hcinetmonitor.ksh

          if [[ ! -f $HCISITEDIR/Alerts/off.alrt ]]; then
             # create an empty off.alrt (no alert definitions) if it does not exist yet
             touch "$HCISITEDIR/Alerts/off.alrt"
          fi

          hcisitectl -k m -s m -A "a=-cl 'off.alrt'"
          echo “”
          ps -ef | head -1
          ps -ef | grep hcimonitord | grep -v grep

          Here is my list_sites.ksh script I wrote:

          Code:

          #!/usr/bin/ksh

          # Begin Module Header ==============================================================================
          #
          #------
          # Name:
          #------
          #
          # list_sites.ksh
          #
          #---------
          # Purpose:
          #---------
          #
          # List all the sites for the current $HCIROOT
          #
          #--------
          # Inputs:
          #--------
          #
          # none
          #
          #-------
          # Notes:
          #-------
          #
          # This script assumes the proper environment is set when it is called
          # and that every site directory under $HCIROOT contains a siteInfo file
          #
          #---------
          # History:
          #---------
          #
          # 2001.02.22 Russ Ross
          #          - wrote initial version.
          #
          # 2001.07.25 Russ Ross
          #          - modified to look for $HCIROOT/*/siteInfo files to determine the list of sites,
          #            previously had looked for symbolic links in the $HCIROOT directory which only
          #            works if everyone follows MDACC conventions
          #
          # 2003.11.17 Russ Ross
          #          - modified to exclude siteProto from the list of sites
          #
          # End of Module Header =============================================================================

          #-----------------------------------------------------------------------------------------
          # get a list of all the sites for the current $HCIROOT
          # (a site is any directory directly under $HCIROOT that contains a siteInfo file)
          #-----------------------------------------------------------------------------------------

          (cd "$HCIROOT" && ls ./*/siteInfo 2>/dev/null) | awk -F/ '{print $2}' | grep -v siteProto | sort

          Another simple but useful script I wrote is called psgrep.ksh, as seen here:

          Code:

          #!/usr/bin/ksh

          echo ""
          ps -ef | head -1
          ps -ef | grep "$1" | grep -v grep
          echo ""

          This also lets me run

          psgrep.ksh hcimonitord

          to get a quick overview of which sites have their alerts turned on or off, as seen in the sample output below:

          Code:

              hci  401442       1   0 07:06:10      -  0:02 hcimonitord -cl default.alrt -S prod_lis
              hci  413880       1   0 07:18:11      -  0:00 hcimonitord -cl default.alrt -S prod_super_adt
              hci  475176       1   0 07:13:11      -  0:00 hcimonitord -cl default.alrt -S prod_sms_22_adt
              hci  483436       1   0   Aug 31      - 25:21 hcimonitord -cl default.alrt -S p_lis_mock
              hci  491752       1   0 07:09:11      -  0:01 hcimonitord -cl default.alrt -S prod_mymda
              hci  590020       1   0 07:26:10      -  0:00 hcimonitord -cl default.alrt -S p_bldbnk
              hci  598138       1   1   Aug 31      - 25:30 hcimonitord -cl default.alrt -S p_sms_23_adt_mock
              hci  671828       1   0 07:10:10      -  0:00 hcimonitord -cl default.alrt -S p_g_adt_out
              hci  762106       1   0 07:02:11      -  0:01 hcimonitord -cl default.alrt -S prod_emr
              hci  770074       1   0 07:19:10      -  0:00 hcimonitord -cl default.alrt -S p_g2_adt_out
              hci  782350       1   0   Aug 31      - 20:04 hcimonitord -cl default.alrt -S p_sched_out2
              hci  831678       1   1   Aug 31      - 20:41 hcimonitord -cl default.alrt -S p_golive_bedtrk
              hci  872638       1   0 07:05:10      -  0:01 hcimonitord -cl default.alrt -S p_cs_allergy
              hci  876720       1   0 07:04:11      -  0:01 hcimonitord -cl default.alrt -S prod_global_adt
              hci  901146       1   0 07:14:10      -  0:00 hcimonitord -cl default.alrt -S prod_sms_23_adt
              hci  905256       1   0 07:07:11      -  0:01 hcimonitord -cl default.alrt -S prod_genie
              hci  974980       1   0 07:03:10      -  0:01 hcimonitord -cl default.alrt -S prod_flat_adt
              hci 1020144       1   0   Aug 31      - 23:16 hcimonitord -cl default.alrt -S p_golive_maxsys
              hci 1114344       1   0   Aug 31      - 20:11 hcimonitord -cl default.alrt -S p_flat_adt_mock
              hci 1232976       1   0 07:12:10      -  0:00 hcimonitord -cl default.alrt -S prod_sms_21_adt
              hci 1343616       1   0 07:17:10      -  0:00 hcimonitord -cl default.alrt -S p_sched_out
              hci 1384496       1   0 07:08:10      -  0:01 hcimonitord -cl default.alrt -S prod_pharm
              hci 1454188       1   1 07:16:10      -  0:00 hcimonitord -cl default.alrt -S prod_sms_sched
              hci 1503290       1   0   Sep 08      - 11:08 hcimonitord -cl off.alrt -S p_g2_adt_out_mock
              hci 1511572       1   0   Aug 31      - 23:38 hcimonitord -cl default.alrt -S p_g_adt_out_mock
              hci 1556656       1   0 07:15:11      -  0:00 hcimonitord -cl default.alrt -S prod_sms_order
              hci 1638618       1   0   Aug 31      - 20:38 hcimonitord -cl default.alrt -S prod_cbord_mock
              hci 1646624       1   0 07:00:11      -  0:01 hcimonitord -cl default.alrt -S prod_cbord
              hci 1663160       1   0   Aug 31      - 23:13 hcimonitord -cl default.alrt -S p_sched_out_mock

          I see the p_g2_adt_out_mock site has the alerts turned off at this time.
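Rather than scanning the psgrep.ksh output by eye, or grepping for "off" (which would also match any site whose name happens to contain "off"), the check can be narrowed to the word that follows -cl. Below is a minimal sketch in POSIX shell syntax (so it also runs under ksh), assuming the `hcimonitord -cl <file>.alrt -S <site>` argument layout seen in the sample output above. The function name sites_with_alerts_off is hypothetical.

```shell
# sites_with_alerts_off (hypothetical helper): read "ps -ef" style lines on
# stdin and print, sorted, the site (word after -S) of every hcimonitord
# process whose alert file (word after -cl) is off.alrt.
# Assumes the "hcimonitord -cl <file>.alrt -S <site>" layout shown above.
sites_with_alerts_off() {
    awk '
    {
        is_md = 0; alrt = ""; site = ""
        for (i = 1; i <= NF; i++) {
            # match the command itself, bare or with a full path
            if ($i == "hcimonitord" || $i ~ /\/hcimonitord$/) is_md = 1
            if ($i == "-cl" && i < NF) alrt = $(i + 1)
            if ($i == "-S" && i < NF) site = $(i + 1)
        }
        if (is_md && alrt == "off.alrt" && site != "") print site
    }' | sort
}

# Usage: ps -ef | sites_with_alerts_off
```

Fed the process list shown above, this would print only p_g2_adt_out_mock. A monitord started without -cl is not reported here, because there is no alert file name on its command line to compare against.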

          Russ Ross
          RussRoss318@gmail.com

        • #69088
          Tim Jipson
          Participant

            That is extremely helpful, thank you!!!

          • #69089
            Bob Richardson
            Participant

              Greetings Russ and company:

              I tried your command “ps -p monitord_pid -o %a” to get the args, with no luck. We are running AIX 5.3 TL8 SP7 and CIS 5.6.2.

              For example, executing

              >ps -p 307570 -o %a

              yields:

              COMMAND
              hcimonitord -S allina_prod3

              Maybe an AIX version “feature”? (We recently went from SP3 to SP7 to move to an IBM-supported version of our disk hardware.)

              Just food for thought.

              BobR

            • #69090
              Russ Ross
              Participant

                I hope what you described isn’t an SP7 issue.

                The same thing happens when you start the monitord by launching the netmonitor from the IDE instead of running something like

                hcisitectl -k m -s m -A "a=-cl 'default.alrt'"

                When the monitord is started by launching the netmonitor from the IDE, it loads the default.alrt file even though the file doesn’t show up in the args. I’m not crazy about that either, but I’ve learned to live with it.

                When I don’t see any args, I’ve been able to assume that the default.alrt file is loaded.

                I like to see the args, so when this comes to my attention I simply run the turn_on_alerts.ksh script I posted.

                I’m still able to

                ps_md | grep off

                to see if any sites have their alerts turned off despite this inconsistency.
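The missing-args case also affects ps_md.ksh itself: its `awk '{print $3}'` returns whatever the third word happens to be, so for a monitord started without -cl, such as `hcimonitord -S allina_prod3` above, it would report the site name where the alert file should be. Below is a minimal sketch in POSIX shell syntax, assuming -cl is the only place the alert file name appears on the command line. It looks for the word after -cl and otherwise falls back to the "no args means default.alrt" rule of thumb described above. The function name alrt_file_from_args is hypothetical.

```shell
# alrt_file_from_args (hypothetical helper): given an hcimonitord command
# line, print the .alrt file named after -cl. With no -cl argument (monitord
# started from the IDE or with a plain "hcisitectl -S"), print
# "default.alrt (assumed)", following the rule of thumb in this thread.
#
# Possible use inside ps_md.ksh in place of the awk '{print $3}' call:
#   cmd_args=$(alrt_file_from_args "$(ps -p "$monitord_pid" -o "%a" | tail -1)")
alrt_file_from_args() {
    # re-split all arguments into words, then scan for -cl
    set -- $*
    while [ $# -gt 0 ]; do
        if [ "$1" = "-cl" ] && [ $# -gt 1 ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    echo "default.alrt (assumed)"
}
```

Because sites with alerts off still report off.alrt explicitly, a `grep off` over the output keeps working, while IDE-started daemons are labeled as assumed rather than misreported.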

                Another thing to note: the interfaces in a site will keep running if the monitord is killed without stopping them first, but I don’t believe the alerts will continue to run.

                You could check for this condition, if you wanted, by finding process pid files for the site but not finding

                $HCISITEDIR/exec/hcimonitord/pid
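That check could be sketched as a small function run against one site directory, in POSIX shell syntax. The process pid file location used here ($site_dir/exec/processes/<process>/pid) is an assumption, not something stated in this thread, so verify it on your own installation; the monitord pid file path is the one ps_md.ksh uses. The sketch also flags a monitord pid file whose process no longer exists, using `kill -0`, which only succeeds when run as the same user that owns the daemon. The function name check_orphaned_interfaces is hypothetical.

```shell
# check_orphaned_interfaces (hypothetical helper): warn when a site has
# process pid files but its monitor daemon is not running.
# ASSUMPTION: process pid files live in $site_dir/exec/processes/<process>/pid.
# Returns 1 (and prints a warning) on a problem, 0 otherwise.
check_orphaned_interfaces() {
    site_dir=$1
    md_pid_file="$site_dir/exec/hcimonitord/pid"

    # Is there at least one process pid file? (unmatched glob stays literal)
    have_procs=no
    for f in "$site_dir"/exec/processes/*/pid; do
        if [ -f "$f" ]; then
            have_procs=yes
            break
        fi
    done
    [ "$have_procs" = yes ] || return 0

    if [ ! -f "$md_pid_file" ]; then
        echo "WARNING: $site_dir has process pid files but no hcimonitord pid file"
        return 1
    fi

    # kill -0 sends no signal; it only tests that the pid exists and that
    # we are allowed to signal it
    if ! kill -0 "$(cat "$md_pid_file")" 2>/dev/null; then
        echo "WARNING: $site_dir hcimonitord pid file is stale"
        return 1
    fi
    return 0
}

# Usage: for site in $(list_sites.ksh); do check_orphaned_interfaces "$HCIROOT/$site"; done
```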

                Russ Ross
                RussRoss318@gmail.com

              • #69091
                Bob Richardson
                Participant

                  Russ:

                  You must be correct in your assumption: our scripts stop and start the monitor and lock managers using the vanilla command syntax

                  >hcisitectl -K; hcisitectl -S

                  I am aware of the fuller syntax, but our shop has not gone that far at this point.

                  Just a caveat: unless other folks use that full syntax to start their monitor daemons, it appears the “%a” option will not show which .alrt file was loaded.

                  See you at the conference?

              • The forum ‘Cloverleaf’ is closed to new topics and replies.