Alert questions

  • #122440
    Jason Russell
    Participant

      So we’re finally digging into alerts hard. However, there are some questions we have. We’re on 2022.09 and on RHEL 8.10.

      First, global vs local alerts. It seems there are a few ‘global’ alerts that can be set. Most of these are around system processes (disk space, CPU, etc.). It also seems that you can alert on processes globally as well. Is this accurate? When selecting the ‘source’ for process status, you get a list of ALL processes. Does this actually function globally?

      With that said, I have questions about a few variables and interactions. When calling a Tcl script, according to the documentation and the sample script (which makes NO sense), there is a ‘return’ value. Is this returned into %A? I.e., if I run a Tcl script that does some additional checking and returns a value, does that value get put into %A? It’s really unclear what happens there.

      %P and %T vs %p and %t. According to the documentation, %P and %T return a LIST of triggering variables (comma separated because reasons). However, for %p and %t, the documentation says: ‘If T3 is “dead”, then the alert is triggering again. %T is replaced with T2 and T3. %t is one of these two threads, if %t is replaced with T3, and %p is replaced with P3.’

      So if two threads trigger, %T will contain thread2,thread3, but the documentation says %t is ‘one of these two threads’. How does it know which one? Does it trigger multiple times, once for each individual thread?

      The last one is more of a confirmation, and it concerns %R, the repetition value. Say a thread is down and I pass in %R: it will increment until the thread is up. Once it’s up and the alert goes off again, does the counter reset?

      • #122510
        Jason Russell
        Participant

          We’ve figured out most of the above questions. However, I’ve run into another interesting issue. We’re probably doing this ‘wrong’, but I’m trying to keep as much of the alerting in one place as possible. We’re checking the status of the alerts via master (which can see all sites); however, you can’t do anything directly within a site from master (e.g. restarting a process). My intention was to run setsite and then hcienginerun. Initially the script couldn’t find setsite (not sure why), so I switched to a manual call to hcisetenv, but now it’s not finding hcienginerun (which is in the same folder, with the same permissions): invalid command name "/opt/cloverleaf/cis2022.09/integrator/bin/hcienginerun"

          -rwxrwxr-x. 1 hci staff 12969 Oct 11 2024 hcienginerun
          -rwxrwxr-x. 1 hci staff 37153 Oct 11 2024 hcisetenv

          I’m calling hcisetenv first and it seems to work just fine. Has anyone seen this?

        • #122512
          David Barr
          Participant

            You may be calling hcisetenv incorrectly. Here’s an example of a script that works:

            #!/usr/bin/ksh
            export CL_INSTALL_DIR=/opt/cloverleaf/cis2022.09
            export HCIVERSION=2022.09
            export HCIROOT=$CL_INSTALL_DIR/integrator
            export FPATH=$HCIROOT/kshlib
            setroot $HCIROOT
            setsite live_ord_res
            hcicmd -p cprof -c "cprof_out pstop"
            hcicmd -p cprof -c "cprof_out_qa pstop"

          • #122516
            Jason Russell
            Participant

              For some reason, the Tcl script won’t call setsite. Maybe because of setroot. Let me modify and try that.

            • #122517
              Jason Russell
              Participant

                I was a bit rushed yesterday and only just now looked at your script. It’s a ksh script (which is fine), but we were doing this via Tcl through the alerts section, which is probably why some of these calls are not working. Still interesting; I’m going to try changing tactics a bit.

              • #122525
                Jason Russell
                Participant

                  With all of that said, these are the things I’ve tried, and none seem to work. This is called from the master site’s alerts window, calling a Tcl proc. It doesn’t throw errors like it did before, but it just doesn’t seem to do anything. No engine logs in any of the sites, as far as I could tell.

                  set callPath "$::env(HCIROOT)/bin"
                  #exec "$callPath/hcisetenv -site $::site; $callPath/hcienginerun -p $process"
                  #exec $::env(HCIROOT)/kshlib/setroot
                  #exec $::env(HCIROOT)/kshlib/setsite $::site
                  #exec $::env(HCIROOT)/bin/hcienginerun -p $process
                  set setroot "$::env(HCIROOT)/kshlib/setroot"
                  set setsite "$::env(HCIROOT)/kshlib/setsite $::site"
                  set engRun "$::env(HCIROOT)/bin/hcienginerun -p $process"
                  exec "$setroot; $setsite; $engRun"
                  #setsite $::site
                  #hcienginerun -p $process
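
A possible explanation for the attempts above, offered as a sketch and not a confirmed diagnosis. Tcl's exec starts a program directly without going through a shell, so a single quoted string containing semicolons is handed over as one program name instead of three commands. Separately, the ksh example earlier in the thread exports FPATH before calling setroot and setsite, which suggests (an assumption, not verified here) that they are ksh functions that change the environment of the shell they run in; running each one in its own child process would then leave nothing behind for hcienginerun. The plain sh snippet below illustrates both effects without any Cloverleaf commands. DEMO_SITE is a made-up variable used only for the demonstration.

```shell
#!/bin/sh
# Illustration only: plain sh, no Cloverleaf commands involved.
# DEMO_SITE is a hypothetical variable standing in for site settings.

# 1. A whole command line passed as ONE word is looked up as a single
#    program name. No program has that name, so the lookup fails
#    (POSIX shells report exit status 127 for "command not found").
joined='echo one; echo two'
rc_quoted=0
"$joined" 2>/dev/null || rc_quoted=$?

# 2. Handing the same string to a shell lets the shell split on ';'
#    and run both commands.
out_shell=$(sh -c "$joined")

# 3. Environment set in one child process is gone by the time the next
#    child starts, so separate calls cannot pass settings along.
unset DEMO_SITE
separate=$(sh -c 'DEMO_SITE=live; export DEMO_SITE'; sh -c 'echo "site=${DEMO_SITE:-unset}"')

# 4. The same steps inside a single shell keep the setting.
combined=$(sh -c 'DEMO_SITE=live; export DEMO_SITE; echo "site=${DEMO_SITE:-unset}"')

echo "rc_quoted=$rc_quoted"
echo "out_shell=$out_shell"
echo "separate=$separate"
echo "combined=$combined"
```

If that reasoning holds, the Tcl proc would need to hand the whole sequence to one ksh process, for example with something like exec /usr/bin/ksh /path/to/wrapper.ksh $::site $process, where wrapper.ksh is a hypothetical script modeled on the ksh example earlier in the thread. That call is untested here and the path is a placeholder.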
