re: Alerts for Recovery DB


  • Creator
    Topic
  • #48721
    Mark Gathers
    Participant

      I’m looking for an alert that will trigger when too many messages are stuck in the Recovery Database.  Any help/ideas would be appreciated.

      Mark Gathers

      WVUH Hospitals

    • Author
      Replies
      • #59481
        Anonymous
        Participant

          Mark,

          Assuming that you are using the alert tool within the QDX IDE, you can set this up by adding a new alert row to the existing ones.

          Click the Append button.

          In the Append dialog box, for Alert type, select forward_count in the drop-down list.

          Select the thread_name for the source list.

          In the field adjacent to the == comparison, type the number of messages for the recovery database.

          In the Action tab, select the alert method you want: either exec or notify.

          I prefer exec, with the following in the command field:

          mailx -s '%A' yourname@name.com < /dev/null

          Save it and test. Hope this helps.

          Reggie

        • #59482
          Mark Gathers
          Participant

            Thanks Reggie,

            I wanted something that checks the whole Recovery Database rather than each thread.  I decided to create a UNIX script that checks the message count in the Recovery DB using the following command:

            mshcnt=`hcidbdump -r | wc -l | tr -d " "`

            If mshcnt is over 1000, the script sends us a warning alert via paging and email.  Works pretty well.

            I'm doing this because we neglected to cycle a process after deleting a route.  Since the destination thread was deleted, messages continued to build up in the Recovery DB without anyone knowing.

            Mark
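A minimal sketch of the kind of check Mark describes, not his actual script. The names THRESHOLD, ALERT_TO, count_lines and over_threshold are hypothetical, and the address is the placeholder used earlier in the thread. It treats the line count of `hcidbdump -r` as the message count, exactly as the post does, and it only runs the dump when the command is on the PATH. Replies later in the thread raise concerns about running hcidbdump against a busy database, so treat this as an illustration of the threshold logic only.

```shell
#!/bin/sh
# Hypothetical settings; override them from the environment if needed.
THRESHOLD=${THRESHOLD:-1000}
ALERT_TO=${ALERT_TO:-yourname@name.com}

# Count the lines on stdin and strip the padding some wc versions add.
count_lines() {
    wc -l | tr -d ' '
}

# Exit 0 when the count ($1) is strictly greater than the limit ($2).
over_threshold() {
    [ "$1" -gt "$2" ]
}

# Run the check only where hcidbdump is actually installed.
if command -v hcidbdump >/dev/null 2>&1; then
    msgcnt=$(hcidbdump -r | count_lines)
    if over_threshold "$msgcnt" "$THRESHOLD"; then
        # Empty body; the subject line carries the alert text.
        mailx -s "Recovery DB count is $msgcnt" "$ALERT_TO" < /dev/null
    fi
fi
```

Run from cron at whatever interval suits the site, this sends one mail per run while the count stays above the limit.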

          • #59483
            Anonymous
            Participant

              Hi Mark,

              I agree with you.

              Here is the problem with using the hcidbdump command:

              It talks to the daemon and also checks the database.

              If for some reason the command does not execute correctly, there is a higher chance of the database being corrupted.

              That's the reason I use the alert tool for monitoring it.

              For instance, hcidbdump -f | wc -l

              Similarly, hcidbdump -d | wc -l

              In the alert tool you can add alerts for each thread. In my case, I monitor for a count of 50 on each thread. If a count reaches 50, I know that the destination thread is not processing the data.

              Thanks

              Reggie

            • #59484
              Richard Hart
              Participant

                Guys.

                We modified the 'hciconnstatus' script a few years ago to display the message count for outbound threads.   This grabs the queued-message information from shared memory.

                We actually use this script within our 'monitoring' to alert on thread down/disconnect and queue information.

                e.g.

                Process      Connection           State Proto Status Count  Started
                top_prod_rp  ahs_prod_rp_adt_out  up    up           0      27/04/06 15:39:06
                top_prod_rp  ccm_prod_rp_adt_out  up    up           0      27/04/06 15:39:07
                top_prod_rp  cdc_prod_rp_adt_out  up    up           0      27/04/06 15:51:10
                top_prod_rp  cdr_prod_rp_adt_out  up    up           0      27/04/06 15:39:08
                top_prod_rp  cwb_prod_rp_adt_out  up    up           0      27/04/06 15:39:09
                top_prod_rp  eds_prod_rp_adt_out  up    up           0      27/04/06 15:39:11
                top_prod_rp  har_prod_rp_adt_out  up    up           0      27/04/06 15:39:12
                top_prod_rp  hmh_prod_rp_adt_out  up    up           0      14/06/06 15:50:00
                top_prod_rp  ris_prod_rp_adt_out  up    up           0      27/04/06 15:39:15
                top_prod_rp  sol_prod_rp_adt_out  up    up           0      27/04/06 15:39:16
                top_prod_rp  sud_prod_rp_adt_out  up    up           0      27/04/06 15:39:21
                top_prod_rp  top_prod_rp_adt_rcv  up    up           0      27/04/06 15:39:17
                top_prod_rp  top_prod_rp_adt_snd  up    up           0      27/04/06 15:39:19
                top_prod_rp  ult_prod_rp_adt_out  up    up           0      25/05/06 12:52:52

              • #59485
                Daniel Lee
                Participant

                  Since the last update on this topic was posted in 2006, I thought I'd bump this to see if 5.8 has a solution for this.

                  I'm trying to figure out if I can put some type of alert on the recovery DB so that an alert is triggered when the DB exceeds a given count of messages.  Please note, I do not want an alert for each thread; I just want one for the whole recovery DB, which would indicate that our whole engine is slowing down (or extremely busy).

                • #59486
                  Rob Lindsey
                  Participant

                    There is the "outbound queue depth" alert available that you could set up for each of the interfaces you want to monitor.  Of course this could get to be a huge job depending on how many sites and threads you have.

                    Since we monitor almost 800 threads, I wrote a shell program that loops through each site and calls a Tcl program that goes through the MSI area of the site, pulls out the number of messages waiting, and then creates a report that is emailed.  Of course I only report those interfaces that have data waiting that is more than 2 hours old.
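The filtering step Rob describes (report only interfaces with data waiting for more than 2 hours) might look something like the sketch below. Rob's real script is not shown in the thread, so the input layout here is purely an assumption: one line per thread with the hypothetical columns site, thread, pending count, and age in seconds of the oldest pending message. MAX_AGE and stale_threads are likewise made-up names.

```shell
#!/bin/sh
# Hypothetical cutoff: 7200 seconds = 2 hours, as in the post.
MAX_AGE=${MAX_AGE:-7200}

# Read "site thread pending oldest_age_seconds" rows on stdin (assumed
# layout) and print only threads with pending messages older than MAX_AGE.
stale_threads() {
    awk -v max="$MAX_AGE" '$3 > 0 && $4 > max { print $1, $2, $3 }'
}
```

The filtered rows could then be piped into mailx as the body of the report.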

                  • #59487
                    Daniel Lee
                    Participant

                      I'm looking more for a collective count across all threads instead of individual threads.  We do not have an operations staff monitoring the interfaces, so we rely on the alerts.  I do not want to get a page in the middle of the night if I have one interface with 100 messages backed up, but I do want to get a page if the total of all messages in the recovery database is over 2000.  To me this would be the equivalent of an operator calling me in the night to say "the NetMonitor is lit up like a Christmas Tree, please help". 🙂  I've thought about writing a script, but after researching it, it seems unsafe to write a script that runs a dbdump on the recovery database.
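One way to get a single engine-wide number without touching the recovery database is to total the per-thread counts from a status report. The sketch below assumes a report in the column layout Richard posted earlier in the thread, where the count is the fifth whitespace-separated field, fed in on stdin. The names total_queued, page_needed and LIMIT are hypothetical, and 2000 is the figure from the post above.

```shell
#!/bin/sh
# Hypothetical paging limit for the whole engine.
LIMIT=${LIMIT:-2000}

# Sum the Count column (field 5) of the report on stdin. The header row
# and blank lines are skipped because their fifth field is not a number.
total_queued() {
    awk 'NF >= 5 && $5 ~ /^[0-9]+$/ { sum += $5 } END { print sum + 0 }'
}

# Exit 0 when the total ($1) is strictly greater than the limit ($2).
page_needed() {
    [ "$1" -gt "$2" ]
}
```

Piping the report into total_queued and passing the result to page_needed gives one yes/no answer for the whole engine, so a single thread with 100 messages backed up would not page anyone.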

                    • #59488
                      Eric Fortenberry
                      Participant

                        I had written a script to pull counts from the recovery database using hcidbdump.  I never saw any corruption, but it did cause an I/O bottleneck whenever the number of messages in the recovery database got high.

                        As a solution, I ended up using msiAttach to pull the statistics of pending messages.  Here is a slightly modified version of that script that should meet your needs:  https://gist.github.com/2322016

                        Let me know what you think.

                        Thanks,

                        Eric
