How big is your Recovery / Error Db?

This topic has 6 replies, 4 voices, and was last updated 15 years, 8 months ago by Chris Williams.

Creator

Topic
October 29, 2009 at 4:39 pm #51292
Troy Morton
Participant
Hi everyone,

I was just wondering what everyone’s standard is for monitoring the size of their Recovery and Error Db files in $HCISITEDIR/exec/databases. I know that it’s good to reinitialize your databases periodically, but how do you determine when to do this?

Here is a script that I wrote which will echo the size (in MB) of the Recovery and Error Db for every site in the current $HCIROOT. It reads a list of sites to check from the file $HCIROOT/prodsites so that you can choose to exclude non-active sites and SiteProto.

Code:

    #!/usr/bin/ksh

    function showrecoverydbsizes
    {
        echo "$1"
        echo "\n Recovery/Error Database sizes\n `date`\n `hostname`\n"
        echo "Site RecoveryDb ErrorDb" | awk '{ printf ("%-10s %10s %9s\n",$1,$2,$3) }'
        echo "-------- ---------- ---------" | awk '{ printf ("%-10s %10s %9s\n",$1,$2,$3) }'

        for mysite in `cat $HCIROOT/prodsites`
        do
            # echo and setsite to check
            #echo "Checking site: $mysite"
            setsite $mysite

            # Verify the setsite command was successful
            sitecheck=`showroot | grep "HCI site" | awk '{print $4}'`
            if [[ $sitecheck != $mysite ]]
            then
                echo "\nSetsite Command was unsuccessful. Aborting Script.\n"
                return
            fi

            # Size in bytes (field 5 of ls -al) of each .dat file
            rdbsize1=`ls -al $HCISITEDIR/exec/databases/rlogMid.dat | awk '{ print $5 }'`
            rdbsize2=`ls -al $HCISITEDIR/exec/databases/rlogM2k.dat | awk '{ print $5 }'`
            edbsize1=`ls -al $HCISITEDIR/exec/databases/elogMid.dat | awk '{ print $5 }'`
            edbsize2=`ls -al $HCISITEDIR/exec/databases/elogM2k.dat | awk '{ print $5 }'`

            # Integer division by 2000000: the average of the two files, in MB
            rdbsize=$((($rdbsize1+$rdbsize2)/2000000))
            edbsize=$((($edbsize1+$edbsize2)/2000000))

            echo "$mysite $rdbsize $edbsize" | awk '{ printf ("%-10s %7d MB %6d MB\n",$1,$2,$3) }'
        done

        echo "\nDone.\n"
    }

    # execute main function
    showrecoverydbsizes

    # clean up: -f is needed to unset a function rather than a variable
    unset -f showrecoverydbsizes
    unset mysite sitecheck rdbsize1 rdbsize2 edbsize1 edbsize2 rdbsize edbsize

Sample output:

Code:

    hsprim1b>ShowDbSizes.sh

     Recovery/Error Database sizes
     Thu Oct 29 11:41:17 CDT 2009
     hsprim1b

    Site       RecoveryDb   ErrorDb
    --------   ---------- ---------
    tds              0 MB      0 MB
    lab              6 MB      1 MB
    shec             9 MB      0 MB
    seb             23 MB      0 MB
    sns              5 MB      0 MB
    sms              2 MB      1 MB
    sjcf             3 MB      0 MB
    sjb              1 MB      0 MB
    smgb             6 MB      0 MB
    svgb             0 MB      0 MB
    sjh              1 MB      0 MB
    msae            29 MB      1 MB
    mctyaie         17 MB      0 MB
    mctydcfh        60 MB     16 MB
    mctykbg         18 MB      0 MB
    mctymjl         21 MB      0 MB
    msmd            42 MB      1 MB
    pem             41 MB      0 MB
    msjsf1           3 MB      0 MB
    msjsf2           8 MB      0 MB
    msjst1          53 MB      0 MB
    msjst2          59 MB      0 MB
    msfl            11 MB      0 MB

    Done.
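
One way to turn a report like this into a decision is to flag any site whose Recovery Db is above a chosen limit. The sketch below is only an illustration: the function name `flag_large_dbs` and the 50 MB default are hypothetical, not Cloverleaf settings or a recommendation, and it assumes the data rows look like `site N MB N MB` as in the sample output.

```shell
# Hypothetical helper: read ShowDbSizes.sh report rows on stdin and print
# the sites whose Recovery Db size (field 2, in MB) exceeds a limit.
# The 50 MB default is an arbitrary example value.
flag_large_dbs() {
    limit=${1:-50}
    # Data rows have five fields: site, size, "MB", size, "MB".
    # Header, separator, and blank lines do not match and are skipped.
    awk -v limit="$limit" '
        NF == 5 && $3 == "MB" && $5 == "MB" && $2 + 0 > limit + 0 {
            printf("%s: RecoveryDb %d MB exceeds %d MB\n", $1, $2, limit)
        }'
}
```

For example, `ShowDbSizes.sh | flag_large_dbs 50` would list the three sites in the sample output that report more than 50 MB (mctydcfh, msjst1, msjst2). What limit is sensible for a given engine is exactly the open question of this thread.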

Viewing 5 reply threads

Author

Replies
- October 29, 2009 at 5:48 pm #69566
  Max Drown (Infor)
  Keymaster
  Troy, we (at Boone) keep our recovery (and error) databases empty. That is, we resolve any issues ASAP that cause messages to stay in the recovery (or error) databases. I have a cron script that runs every so often and reports the existence of messages in the databases.
  
  -- Max Drown (Infor)
- October 29, 2009 at 6:23 pm #69567
  Troy Morton
  Participant
  We don’t keep any messages in there either, but even if you delete them, the database files get bigger since the records are only “marked” deleted.
  
  Sorry I didn’t really make that clear.
  
  Eventually the files grow, and because the engine has to seek through more of the file to write or remove records, processing slows down. From what I’ve heard, files that get too large can even make your processes panic.
  
  Here is an example of a freshly init-ed database:
  
  Code:

      -rw-rw-r-- 1 hci staff 1024 Oct 29 13:25 elogCtx.dat
      -rw-rw-r-- 1 hci staff 2048 Oct 29 13:25 elogCtx.key
      -rw-rw-r-- 1 hci staff 1024 Oct 29 13:25 elogM2k.dat
      -rw-rw-r-- 1 hci staff 1024 Oct 29 13:25 elogMid.dat
      -rw-rw-r-- 1 hci staff 2048 Oct 29 13:25 elogMid.key
      -rw-rw-r-- 1 hci staff 1024 Oct 29 13:25 iclThreads.dat
      -rw-rw-r-- 1 hci staff 2048 Oct 29 13:25 iclThreads.key
      -rw-rw-r-- 1 hci staff 2048 Oct 29 13:25 rlogIdent.dat
      -rw-rw-r-- 1 hci staff 1024 Oct 29 13:25 rlogM2k.dat
      -rw-rw-r-- 1 hci staff 1024 Oct 29 13:25 rlogMid.dat
      -rw-rw-r-- 1 hci staff 2048 Oct 29 13:25 rlogMid.key
  
  Here is a database from a site that’s been running without a re-init for about a year or so:
  
  Code:

      -rw-rw-r-- 1 hci staff  2704384 Oct 23 13:50 elogCtx.dat
      -rw-rw-r-- 1 hci staff   193536 Oct 23 13:50 elogCtx.key
      -rw-rw-r-- 1 hci staff 15407104 Oct 23 13:50 elogM2k.dat
      -rw-rw-r-- 1 hci staff 17752064 Oct 23 13:50 elogMid.dat
      -rw-rw-r-- 1 hci staff   193536 Oct 23 13:50 elogMid.key
      -rw-rw-r-- 1 hci staff    91136 Oct 27 09:23 iclThreads.dat
      -rw-rw-r-- 1 hci staff    10240 Oct 27 09:23 iclThreads.key
      -rw-rw-r-- 1 hci staff     2048 Dec  1  2005 rlogIdent.dat
      -rw-rw-r-- 1 hci staff 47828992 Oct 29 13:25 rlogM2k.dat
      -rw-rw-r-- 1 hci staff 73894912 Oct 29 13:25 rlogMid.dat
      -rw-rw-r-- 1 hci staff  1203200 Oct 29 13:25 rlogMid.key
- October 29, 2009 at 6:32 pm #69568
  Max Drown (Infor)
  Keymaster
  Well shoot, I see that is the case on my server, too. Thanks for bringing it up.
  
  -- Max Drown (Infor)
- October 29, 2009 at 7:51 pm #69569
  Sam Craig
  Participant
  Well, that is interesting…. 😯
  
  Also, to reinitialize the Recovery databases, you have to be very careful.
  
  Right?
  
  Seems messy and makes for more work and undesired downtime.
- October 29, 2009 at 7:59 pm #69570
  Troy Morton
  Participant
  Yes.
- October 29, 2009 at 9:35 pm #69571
  Chris Williams
  Participant
  The recovery/error databases will reuse the space that was taken by messages long since gone, but the databases themselves will not contract. They only expand. To rein them in, you initialize them. The db size is going to be somewhat proportional to the maximum quantity of messages in play at any given moment. Where ours take a hit is when we lose a connection to a destination system, and a large quantity of messages queue up in the engine.
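
As a cross-check on the numbers in this thread, the arithmetic from the script in the first post can be applied to the byte counts in the year-old `ls` listing above. This is a sketch only, and the function name `db_size_mb` is hypothetical. Note that dividing the sum of the two .dat files by 2000000 gives the average of the two files in decimal MB, so the combined size on disk is roughly double the reported figure.

```shell
# Hypothetical helper using the same integer arithmetic as the script:
# (Mid.dat bytes + M2k.dat bytes) / 2000000
db_size_mb() {
    echo $(( ($1 + $2) / 2000000 ))
}

# Byte counts copied from the year-old listing in this thread.
db_size_mb 73894912 47828992    # rlogMid.dat + rlogM2k.dat -> 60
db_size_mb 17752064 15407104    # elogMid.dat + elogM2k.dat -> 16

# Combined size of the two recovery .dat files in decimal MB.
echo $(( (73894912 + 47828992) / 1000000 ))    # -> 121
```

The results of 60 and 16 match the mctydcfh row in the sample output, which suggests the listing came from that site, although the thread does not say so.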
