Unable to run process even after running hcidbinit

    Ivan Ng

    I get the following error when starting any of the processes on my site.  I suspect that the database was corrupted during a power loss.  I tried hcidbinit -A -C, but it is still not working.  Also, what else can be done to recover the information in the database before running hcidbinit?

    Thank you in advance.

    [prod:prod:INFO/0:  STARTUP_TID] Started at Wed Nov 29 15:14:11 2006

    [prod:prod:INFO/0:  STARTUP_TID] Engine process is 4590 on host clover1

    11/29/2006 15:14:21

    [cmd :cmd :ERR /0:csmicb_o1_cmd] Initialization of DBI failed

    11/29/2006 15:14:21

    [prod:prod:ERR /0:csmicb_o1_cmd] Unable to initialize the Command Thread.

    PANIC: "0"

    PANIC: Calling "pti" for thread csmicb_o1_cmd

    Scheduler State

    Thread Events     State      Priority Runnable  PT Msgs

      0      0   SCHED_IDLE         0       0       0,0,0

    Thread 0

    ti: 0x40160928

       tid           :    0

       HostPthreadId : 0x00000001

       EventList     : 0x40076ac8

       PolledEvents  : 0x40076b10

       PthreadEvent  : 0x4016a898

       ReadyEvents   : 0x40076b28

       CtrlMsgs      : 0x40076b40

       UserCtrlMsgs  : 0x40076b58

       UserDataMsgs  : 0x40076b70

       StartArgs     : 0x00000000

       SchedState    : SCHED_IDLE

       SchedPriority : 0

       Killed        : 0

    Registered Events

    el: 0x40076ac8

       elCount : 1

       elHead: 0x40080c70

       elTail: 0x40080c70

    ele: 0x40080c70

       event: 0x4016a898

       prev : 0x0

       next : 0x0

    ev: 0x4016a898

        evType     : PTHREADS

        evStrDesc    :

        evSocket     : 0

        evMsgQue     : 0

        evTid        : 0

        evState      : 0

        evPtMsg      : 0x0

        evUserData   : 0x0

        evCallBack   : 0x0

        evCbShutdown : 0x0

        evRecurFreq  :

    Polled Events

    el: 0x40076b10

       elCount : 0

       elHead: 0x00000000

       elTail: 0x00000000

    Ready Events

    el: 0x40076b28

       elCount : 0

       elHead: 0x00000000

       elTail: 0x00000000

    Outstanding Pthread Ctrl Msgs

    pmq: 0x40076b40

    Count   : 0

    Head    : 0x00000000

    Tail    : 0x00000000

    Outstanding Pthread User Ctrl Msgs

    pmq: 0x40076b58

    Count   : 0

    Head    : 0x00000000

    Tail    : 0x00000000

    Outstanding Pthread User Data Msgs

    pmq: 0x40076b70

    Count   : 0

    Head    : 0x00000000

    Tail    : 0x00000000

    PANIC: Process panic--engine going down

    PANIC: assertion '0' failed at main.cpp/236

      Russ Ross

      I don’t want to get your hopes up because a corrupted recovery database due to a power loss is not something I know how to recover from without message loss.

      If someone knows the answer please share!

      However, I would take a look in the databases directory

      cd $HCISITEDIR/databases; ls -al

      to see if there is a leftover database lock flag, and get rid of it if there is.

      I highlighted the lock flag I’m referring to in the attached screenshot.

      Note: be sure to do “ls -al” and not “ls -l”.
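
      The reason for "ls -al" is that the -a option also lists entries whose names start with a dot, which plain "ls -l" hides. Assuming the lock flag is such a hidden file (the thread does not give its name), a minimal sketch for listing only the hidden entries of a directory could look like this. The function name list_hidden is made up for illustration.

```shell
#!/bin/sh
# list_hidden DIR
# Print the names of hidden entries (names starting with a dot) in DIR.
# "ls -A" includes dotfiles but leaves out "." and "..".
# Returns non-zero when DIR has no hidden entries.
list_hidden() {
    ls -A "$1" | grep '^\.'
}
```

      For example, list_hidden "$HCISITEDIR/databases" would print any dotfiles in the directory Russ mentions. Inspect what it prints before removing anything.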

      Russ Ross

      Robert Kersemakers

      Hi Ivan,

      Here’s what we do when we have to restart Cloverleaf:

      * Stop all processes

      * Stop all site daemons (hcisitectl -f -K)

      * Remove the monitorShmemfile (rm $HCISITEDIR/exec/monitorShmemfile)

      * Clean up memory region (hcimsiutil -R)

      * Remove vista.taf (rm $HCISITEDIR/exec/databases/vista.taf)

      * Clean up site (hcisitecleanup)

      * Initialise database (hcidbinit -iC)

      * If necessary (when having DB_vista error), rebuild logs:

         – keybuild rlog

         – dchain rlog

         – keybuild elog

         – dchain elog

      * Start site daemons (hcisitectl -S)

      * Start all processes

      If this fails, we reboot the server. If it still fails, we call support…
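
      The checklist above can be sketched as a dry-run script that only prints the commands in order, so the sequence can be reviewed before anything is run by hand. The commands and options are the ones listed above; the function name restart_plan, the yes/no argument, and the placeholder path are assumptions for illustration. Stopping and starting the processes is left as a comment because the list does not name a command for it.

```shell
#!/bin/sh
# Dry run only: every step is printed, nothing is executed.
# HCISITEDIR normally comes from the Cloverleaf environment; the
# fallback below is a placeholder so the sketch runs anywhere.
HCISITEDIR="${HCISITEDIR:-/path/to/site}"

# restart_plan [yes|no]
# Pass "yes" to include the rlog/elog rebuild steps, which the
# checklist applies only after a DB_vista error.
restart_plan() {
    rebuild_logs="${1:-no}"
    echo "# stop all processes first"
    echo "hcisitectl -f -K"
    echo "rm $HCISITEDIR/exec/monitorShmemfile"
    echo "hcimsiutil -R"
    echo "rm $HCISITEDIR/exec/databases/vista.taf"
    echo "hcisitecleanup"
    echo "hcidbinit -iC"
    if [ "$rebuild_logs" = "yes" ]; then
        for db in rlog elog; do
            echo "keybuild $db"
            echo "dchain $db"
        done
    fi
    echo "hcisitectl -S"
    echo "# start all processes again"
}

restart_plan "${1:-no}"
```

      Printing instead of executing is deliberate: hcidbinit -iC reinitialises the database, so it is worth reading the plan before running any step.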

      Hope this helps.

      Zuyderland Medisch Centrum; Heerlen/Sittard; The Netherlands

      Ivan Ng

      Robert, thanks.  While that did not get the problem site working, your list is very useful for future issues.

      Nathan Martin

      Just a thought.  Make sure you’re running as the right user, with the correct permissions on all the directories, and with the correct setroot/showroot settings when you do the dbinit and start the process.
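
      A rough sketch of that kind of pre-flight check is below. It only compares the current user with an expected one and confirms that a directory exists and is writable; it does not check setroot/showroot. The function name preflight and both arguments are hypothetical, and the expected user has to be whatever account owns your installation.

```shell
#!/bin/sh
# preflight EXPECTED_USER SITE_DIR
# Prints "ok" when the current user matches EXPECTED_USER and SITE_DIR
# is an existing, writable directory; otherwise prints the first
# problem found and returns non-zero.
preflight() {
    expected_user="$1"
    site_dir="$2"
    current_user="$(id -un)"
    if [ "$current_user" != "$expected_user" ]; then
        echo "wrong user: $current_user"
        return 1
    fi
    if [ ! -d "$site_dir" ] || [ ! -w "$site_dir" ]; then
        echo "not writable: $site_dir"
        return 1
    fi
    echo "ok"
}
```

      It could be called as preflight "<owner account>" "$HCISITEDIR" before running the dbinit.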

      In addition to what Robert said, there is a command to clean up after crashes:  hcisitecleanup

      Also, call the Quovadx support number if you haven’t.

