re: Alerts for Recovery DB

This topic has 8 replies, 6 voices, and was last updated 13 years, 3 months ago by Eric Fortenberry.

Creator

Topic
August 21, 2006 at 8:14 pm #48721
Mark Gathers
Participant
I’m looking for an alert that will trigger when too many messages are stuck in the Recovery Database. Any help/ideas would be appreciated.

Mark Gathers

WVUH Hospitals
Viewing 7 reply threads

Author

Replies
- August 22, 2006 at 5:00 pm #59481
  Anonymous
  Participant
  Mark,
  
  Assuming that you are utilizing the alert tool within the QDX IDE, you can set this up by adding a new alert row to the existing ones.
  
  Click on the append button.
  
  In the append dialog box, for Alert type, select forward_count in the drop-down list.
  
  Select the thread_name for the source list.
  
  In the field adjacent to the == comparison, type the number of messages for the recovery database.
  
  In the action tab, select the alert method you want: either exec or notify.
  
  I prefer using exec, with the following in the command field:
  
  mailx -s '%A' yourname@name.com < /dev/null
  
  Save it and test. Hope this helps.
  
  Reggie
- August 22, 2006 at 5:37 pm #59482
  Mark Gathers
  Participant
  Thanks Reggie,
  
  I wanted something that checks the whole Recovery Database and not each thread separately. I decided to create a UNIX script that checks the message count in the Recovery DB using the following command:
  
  mshcnt=`hcidbdump -r | wc -l | tr -d " "`
  
  If mshcnt is over 1000, the script sends a warning alert message to us via paging and email. Works pretty well.
  
  I'm doing this because we neglected to cycle a process after deleting a route. Since the destination thread was deleted, the messages continued to build up in the Recovery DB without anyone knowing.
  
  Mark
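A minimal sketch of the kind of threshold check described above, not Mark's actual script. The names `THRESHOLD`, `ALERT_TO`, `COUNT_CMD`, `recovery_count`, and `over_threshold` are illustrative assumptions; only the `hcidbdump -r | wc -l | tr -d " "` pipeline and the 1000 limit come from the post. Note that the pipeline counts lines of dump output, which may not equal the number of messages.

```shell
#!/bin/sh
# Sketch: warn when the Recovery DB dump grows past a threshold.
# THRESHOLD, ALERT_TO, and COUNT_CMD are hypothetical names, not Cloverleaf settings.
THRESHOLD="${THRESHOLD:-1000}"
ALERT_TO="${ALERT_TO:-yourname@name.com}"
COUNT_CMD="${COUNT_CMD:-hcidbdump -r}"

# Count the lines printed by the dump command; tr strips the
# padding that some wc implementations put around the number.
recovery_count() {
    $COUNT_CMD | wc -l | tr -d ' '
}

# Print a warning and return 0 when the given count is above THRESHOLD;
# return 1 (and print nothing) otherwise.
over_threshold() {
    count="$1"
    if [ "$count" -gt "$THRESHOLD" ]; then
        echo "WARNING: Recovery DB count $count exceeds $THRESHOLD"
        return 0
    fi
    return 1
}

# Example wiring, left commented out so the sketch has no side effects:
# msg=$(over_threshold "$(recovery_count)") && \
#     echo "$msg" | mailx -s "Recovery DB alert" "$ALERT_TO"
```

Run from cron at whatever interval suits the site. Keeping the count command in a variable makes it easy to swap in a different source of counts without touching the threshold logic.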
- August 22, 2006 at 5:45 pm #59483
  Anonymous
  Participant
  Hi Mark,
  
  I agree with you.
  
  Here is the problem with using the hcidbdump command:
  
  It talks to the daemon and then also checks the database.
  
  If for some reason the command does not execute correctly, there is a higher chance of the database being corrupted.
  
  That's the reason I use the alert tool for monitoring it.
  
  For instance, hcidbdump -f | wc -l
  
  Similarly, hcidbdump -d | wc -l
  
  In the alert tool you can add alerts for each thread. In my case, I monitor for a count of 50 on each thread. If a count reaches 50, I know that the destination thread is not processing the data.
  
  Thanks
  
  Reggie
- August 23, 2006 at 12:29 am #59484
  Richard Hart
  Participant
  Guys.
  
  We modified the ‘hciconnstatus’ script a few years ago to display the message count for outbound threads. This grabs the message queued information from the shared memory.
  
  We actually use this script within our ‘monitoring’ to alert on thread down/disconnect and queue information.
  
  e.g.
  
  | Process | Connection | State | Proto Status | Count | Started |
  |---|---|---|---|---|---|
  | top_prod_rp | ahs_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:06 |
  | top_prod_rp | ccm_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:07 |
  | top_prod_rp | cdc_prod_rp_adt_out | up | up | 0 | 27/04/06 15:51:10 |
  | top_prod_rp | cdr_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:08 |
  | top_prod_rp | cwb_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:09 |
  | top_prod_rp | eds_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:11 |
  | top_prod_rp | har_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:12 |
  | top_prod_rp | hmh_prod_rp_adt_out | up | up | 0 | 14/06/06 15:50:00 |
  | top_prod_rp | ris_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:15 |
  | top_prod_rp | sol_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:16 |
  | top_prod_rp | sud_prod_rp_adt_out | up | up | 0 | 27/04/06 15:39:21 |
  | top_prod_rp | top_prod_rp_adt_rcv | up | up | 0 | 27/04/06 15:39:17 |
  | top_prod_rp | top_prod_rp_adt_snd | up | up | 0 | 27/04/06 15:39:19 |
  | top_prod_rp | ult_prod_rp_adt_out | up | up | 0 | 25/05/06 12:52:52 |
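A status listing like the one above could feed a single site-wide check. The sketch below is an assumption-laden illustration, not Richard's script: it assumes the raw output is whitespace-separated with one header line, State in column 3, Proto Status in column 4, and Count in column 5. The function names `total_queued` and `not_up` are hypothetical.

```shell
#!/bin/sh
# Sketch: summarize a connection status listing read from stdin.
# Assumed layout per data row (whitespace-separated):
#   Process Connection State ProtoStatus Count StartDate StartTime

# Sum the Count column (field 5) over all data rows, skipping the header.
total_queued() {
    awk 'NR > 1 && NF >= 5 { total += $5 } END { print total + 0 }'
}

# Print the connection name (field 2) of every row whose State or
# Proto Status is anything other than "up".
not_up() {
    awk 'NR > 1 && NF >= 5 && ($3 != "up" || $4 != "up") { print $2 }'
}
```

Piping the modified status script's output into `total_queued` would give one number to compare against a limit, while `not_up` lists the threads that need attention. Stock `hciconnstatus` output may not include a Count column, so the field positions would need checking against the real output first.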
- April 5, 2012 at 6:04 pm #59485
  Daniel Lee
  Participant
  Since the last update on this topic was posted in 2006 I thought I’d bump this to see if 5.8 has a solution for this.
  
  I’m trying to figure out if I can put some type of alert on the recovery DB so that an alert is triggered if it exceeds a given count of messages. Please note, I do not want an alert for each thread; I just want one for the whole recovery DB, which would indicate that our whole engine is slowing down (or extremely busy).
- April 6, 2012 at 12:35 pm #59486
  Rob Lindsey
  Participant
  There is the “outbound queue depth” alert available that you could set up for each of the interfaces you want to monitor. Of course, this could get to be a huge job depending on how many sites and threads you have.
  
  Since we monitor almost 800 threads, I wrote a shell program that loops through each site and calls a Tcl program that goes through the MSI area of the site, pulls out the number of messages waiting, and creates a report that is emailed. Of course, I only report those interfaces that have data waiting that is more than 2 hours old.
- April 6, 2012 at 1:06 pm #59487
  Daniel Lee
  Participant
  I’m looking more for a collective count across all threads instead of individual threads. We do not have an operations staff monitoring the interfaces and rely on the alerts. I do not want to get a page in the middle of the night if I have one interface with 100 messages backed up, but I do want to get a page if the total of all messages in the recovery database is over 2000. To me this would be the equivalent of an operator calling me in the night to say “the NetMonitor is lit up like a Christmas Tree, please help”. 🙂 I’ve thought about writing a script, but after researching it, it seems unsafe to write a script that runs a dbdump on the recovery database.
- April 6, 2012 at 6:51 pm #59488
  Eric Fortenberry
  Participant
  I had written a script to pull counts from the recovery database using hcidbdump. I never saw any corruption, but it did cause an I/O bottleneck whenever the number of messages in the recovery database got high.
  
  As a solution, I ended up using msiAttach to pull the statistics of pending messages. Here is a slightly modified version of that script that should meet your needs: https://gist.github.com/2322016
  
  Let me know what you think.
  
  Thanks,
  
  Eric
The forum ‘Cloverleaf’ is closed to new topics and replies.