How big is your Recovery / Error Db?


  • Creator
    Topic
  • #51292
    Troy Morton
    Participant

      Hi everyone,

      I was just wondering what everyone’s standard is for monitoring the size of their Recovery and Error Db files in $HCISITEDIR/exec/databases.  I know that it’s good to reinitialize your databases periodically, but how do you determine when to do this?

      Here is a script that I wrote which will echo the size (in MB) of the Recovery and Error Db for every site in the current $HCIROOT.  It reads a list of sites to check from the file $HCIROOT/prodsites so that you can choose to exclude non-active sites and SiteProto.

      Code:


      #!/usr/bin/ksh

      function showrecoverydbsizes {

      echo "$1"

      echo "\n Recovery/Error Database sizes\n  `date`\n            `hostname`\n"
      echo "Site RecoveryDb ErrorDb" | awk '{ printf ("%-10s %10s %9s\n",$1,$2,$3) }'
      echo "-------- ---------- ---------" | awk '{ printf ("%-10s %10s %9s\n",$1,$2,$3) }'

      for mysite in `cat $HCIROOT/prodsites`
      do
       
       # echo and setsite to check
       #echo "Checking site: $mysite"
       setsite $mysite
       
       # Verify the setsite command was successful
       sitecheck=`showroot | grep "HCI site" | awk '{print $4}'`
       if [[ "$sitecheck" != "$mysite" ]]
       then
        echo "\nSetsite command was unsuccessful. Aborting script.\n"
          return
       fi

       # File size in bytes is field 5 of the ls -l output
       rdbsize1=`ls -al $HCISITEDIR/exec/databases/rlogMid.dat | awk '{ print $5 }'`
       rdbsize2=`ls -al $HCISITEDIR/exec/databases/rlogM2k.dat | awk '{ print $5 }'`
       
       edbsize1=`ls -al $HCISITEDIR/exec/databases/elogMid.dat | awk '{ print $5 }'`
       edbsize2=`ls -al $HCISITEDIR/exec/databases/elogM2k.dat | awk '{ print $5 }'`
       
       # Note: dividing the sum by 2000000 reports the average of the two
       # files in MB (half the combined size); use 1000000 for the total.
       rdbsize=$(( (rdbsize1 + rdbsize2) / 2000000 ))
       edbsize=$(( (edbsize1 + edbsize2) / 2000000 ))
       
       echo "$mysite $rdbsize  $edbsize" | awk '{ printf ("%-10s %7d MB %6d MB\n",$1,$2,$3) }'

      done

      echo "\nDone.\n"

      }

      # execute main function
      showrecoverydbsizes
      # clean up the function and variables (only matters if the script is sourced)
      unset -f showrecoverydbsizes
      unset mysite sitecheck rdbsize1 rdbsize2 edbsize1 edbsize2 rdbsize edbsize

      Sample output:

      Code:


      hsprim1b>ShowDbSizes.sh

      Recovery/Error Database sizes
       Thu Oct 29 11:41:17 CDT 2009
                 hsprim1b

      Site       RecoveryDb   ErrorDb
      --------   ---------- ---------
      tds              0 MB      0 MB
      lab              6 MB      1 MB
      shec             9 MB      0 MB
      seb             23 MB      0 MB
      sns              5 MB      0 MB
      sms              2 MB      1 MB
      sjcf             3 MB      0 MB
      sjb              1 MB      0 MB
      smgb             6 MB      0 MB
      svgb             0 MB      0 MB
      sjh              1 MB      0 MB
      msae            29 MB      1 MB
      mctyaie         17 MB      0 MB
      mctydcfh        60 MB     16 MB
      mctykbg         18 MB      0 MB
      mctymjl         21 MB      0 MB
      msmd            42 MB      1 MB
      pem             41 MB      0 MB
      msjsf1           3 MB      0 MB
      msjsf2           8 MB      0 MB
      msjst1          53 MB      0 MB
      msjst2          59 MB      0 MB
      msfl            11 MB      0 MB

      Done.

    Viewing 5 reply threads
    • Author
      Replies
      • #69566

        Troy, we (at Boone) keep our recovery (and error) databases empty. That is, we resolve any issues ASAP that cause messages to stay in the recovery (or error) databases. I have a cron script that runs every so often and reports the existence of messages in the databases.

        -- Max Drown (Infor)

      • #69567
        Troy Morton
        Participant

          We don’t keep any messages in there either, but even if you delete them, the database files get bigger since the records are only “marked” deleted.

          Sorry I didn’t really make that clear.

          Eventually the files grow, and because the engine has to seek through more of the file to write and remove records, processing slows down. From what I’ve heard, it can even make your processes panic if the files get too large.

          Here is an example of a freshly initialized database:

          Code:

          -rw-rw-r--   1 hci      staff          1024 Oct 29 13:25 elogCtx.dat
          -rw-rw-r--   1 hci      staff          2048 Oct 29 13:25 elogCtx.key
          -rw-rw-r--   1 hci      staff          1024 Oct 29 13:25 elogM2k.dat
          -rw-rw-r--   1 hci      staff          1024 Oct 29 13:25 elogMid.dat
          -rw-rw-r--   1 hci      staff          2048 Oct 29 13:25 elogMid.key
          -rw-rw-r--   1 hci      staff          1024 Oct 29 13:25 iclThreads.dat
          -rw-rw-r--   1 hci      staff          2048 Oct 29 13:25 iclThreads.key
          -rw-rw-r--   1 hci      staff          2048 Oct 29 13:25 rlogIdent.dat
          -rw-rw-r--   1 hci      staff          1024 Oct 29 13:25 rlogM2k.dat
          -rw-rw-r--   1 hci      staff          1024 Oct 29 13:25 rlogMid.dat
          -rw-rw-r--   1 hci      staff          2048 Oct 29 13:25 rlogMid.key

          Here is a database from a site that’s been running without a re-init for about a year or so:

          Code:

          -rw-rw-r--   1 hci      staff       2704384 Oct 23 13:50 elogCtx.dat
          -rw-rw-r--   1 hci      staff        193536 Oct 23 13:50 elogCtx.key
          -rw-rw-r--   1 hci      staff      15407104 Oct 23 13:50 elogM2k.dat
          -rw-rw-r--   1 hci      staff      17752064 Oct 23 13:50 elogMid.dat
          -rw-rw-r--   1 hci      staff        193536 Oct 23 13:50 elogMid.key
          -rw-rw-r--   1 hci      staff         91136 Oct 27 09:23 iclThreads.dat
          -rw-rw-r--   1 hci      staff         10240 Oct 27 09:23 iclThreads.key
          -rw-rw-r--   1 hci      staff          2048 Dec  1 2005  rlogIdent.dat
          -rw-rw-r--   1 hci      staff      47828992 Oct 29 13:25 rlogM2k.dat
          -rw-rw-r--   1 hci      staff      73894912 Oct 29 13:25 rlogMid.dat
          -rw-rw-r--   1 hci      staff       1203200 Oct 29 13:25 rlogMid.key
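
          To make the arithmetic behind these listings concrete, here is a minimal sketch that adds up file sizes and converts the total to megabytes. The function names `db_bytes` and `bytes_to_mb` are made up for this illustration, and 1 MB is assumed to mean 1,000,000 bytes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Cloverleaf.

# Print the combined size in bytes of the files given as arguments.
# Files that do not exist are counted as 0 bytes.
db_bytes() {
    total=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            # wc -c reading from stdin prints only the byte count
            total=$((total + $(wc -c < "$f")))
        fi
    done
    echo "$total"
}

# Convert a byte count to whole megabytes (assumes 1 MB = 1,000,000 bytes,
# truncating any remainder).
bytes_to_mb() {
    echo $(( $1 / 1000000 ))
}
```

          For the year-old listing above, rlogMid.dat and rlogM2k.dat together come to 73894912 + 47828992 = 121723904 bytes, so `bytes_to_mb 121723904` prints 121. The script in the first post divides the same sum by 2000000 and would therefore print 60 for those two files.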

        • #69568

          Well shoot, I see that is the case on my server, too. Thanks for bringing it up.

          -- Max Drown (Infor)

        • #69569
          Sam Craig
          Participant

            Well, that is interesting…. 😯

            Also, to reinitialize the Recovery databases, you have to be very careful.

            Right?

            Seems messy and makes for more work and undesired downtime.

          • #69570
            Troy Morton
            Participant

              Yes.

            • #69571
              Chris Williams
              Participant

                The recovery/error databases will reuse the space that was taken by messages long since gone, but the databases themselves will not contract. They only expand. To rein them in, you initialize them. The db size is going to be somewhat proportional to the maximum quantity of messages in play at any given moment. Where ours take a hit is when we lose a connection to a destination system, and a large quantity of messages queue up in the engine.
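
                Since the files only grow, one way to decide when to initialize is to compare their combined size against a limit. The following is a rough sketch of that idea, not an established practice from this thread: the 50 MB default in `LIMIT_MB` and the function names `dir_dat_bytes` and `over_limit` are assumptions chosen for illustration.

```shell
#!/bin/sh
# LIMIT_MB is a hypothetical threshold; nobody in the thread recommends a value.
LIMIT_MB=${LIMIT_MB:-50}

# Print the combined size in bytes of the *.dat files in the given directory.
dir_dat_bytes() {
    total=0
    for f in "$1"/*.dat; do
        # Skips the unexpanded pattern when the directory has no .dat files
        if [ -f "$f" ]; then
            total=$((total + $(wc -c < "$f")))
        fi
    done
    echo "$total"
}

# Return success (exit status 0) when the .dat files in directory $1
# exceed $2 megabytes (assumes 1 MB = 1,000,000 bytes).
over_limit() {
    [ "$(dir_dat_bytes "$1")" -gt $(( $2 * 1000000 )) ]
}

# Example: check the current site's databases, but only if HCISITEDIR is set.
if [ -n "${HCISITEDIR:-}" ] && over_limit "$HCISITEDIR/exec/databases" "$LIMIT_MB"; then
    echo "Databases exceed ${LIMIT_MB} MB; consider scheduling a re-initialization"
fi
```

                Run from cron after a `setsite`, a check like this only reports; the initialization itself still has to be planned around downtime, as discussed above.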
