Database Errors – need steps to purge Error db externally


  • Creator
    Topic
  • #54040
    Lawrence Nelson
    Participant

    I was processing a massive VRL-to-VRL file (350,000 rows) via an XLATE on 5.8.

    The process hung.

    Lawrence Nelson
    System Architect - MaineHealth IT

Viewing 19 reply threads
  • Author
    Replies
    • #79943
      Lawrence Nelson
      Participant

      This is the current state –

      Command Issued: hcidbdump -r

      Command status:0

      Command output:

      Unable to initialize DBI, err = 2

                              C
                              l T
                              a y F
                              s p w
      Created  Message Id     s e d Prio State Length Source          Dest
                              - - - ----

      [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
      [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
      [0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921
      DBUSERID is already being used
      C errno = 0: Success"

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79944
      David Coffey
      Participant

      hcidbdump is a powerful command, be warned!  You could try something like this:

      hcidbdump -e -L -c -D -F > ./log_file.txt

      ‘E’rror database, ‘L’ong format with ‘c’ontext, ‘D’elete after processing and ‘F’orce the delete action

      The output is redirected to a file.
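      A more cautious variant of the command above, as a sketch only: it saves a non-deleting dump first and repeats the command with the delete flags only if that dump looks healthy. It uses just the flags quoted in this post. The function name `purge_error_db` and the backup file naming are hypothetical, and the site environment is assumed to be set already.

```shell
# Sketch: back up the error database listing, then purge it.
# purge_error_db and the file names are illustrative, not Cloverleaf tools.
purge_error_db() {
    backup_dir="${1:-.}"
    backup_file="$backup_dir/errordb_$(date +%Y%m%d_%H%M%S).txt"

    # Step 1: dump only (no -D), so nothing is deleted yet.
    hcidbdump -e -L -c > "$backup_file" 2>&1

    # This thread shows exit status 0 even on failure, so check the text too.
    if [ ! -s "$backup_file" ] || grep -q 'Unable to initialize DBI' "$backup_file"; then
        echo "dump failed or empty; nothing was deleted" >&2
        return 1
    fi

    # Step 2: repeat with -D (delete) and -F (force), as in the post above.
    hcidbdump -e -L -c -D -F > "$backup_file.purge.log" 2>&1
    echo "$backup_file"
}
```

      Calling `purge_error_db /some/backup/dir` prints the path of the saved dump on success. If the dump contains the "Unable to initialize DBI" text seen earlier in this thread, the delete step is never run.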

    • #79945
      Henry Bauer
      Participant

      If you are trying to fix the error database, use:

      keybuild elog

      dchain elog

      Sorry, this was my mistake in not distinguishing the commands for the recovery and error databases.

    • #79946
      Lawrence Nelson
      Participant

      Thank you all!

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79947
      Lawrence Nelson
      Participant

      Still getting this on the Error DB side –

      Command Issued: hcidbdump -e

      Command status:0

      Command output:

      Unable to initialize DBI, err = 2

                              C
                              l T
                              a y F
                              s p w
      Created  Message Id     s e d Prio State Length Source          Dest
                              - - - ----

      [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
      [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
      [0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921
      DBUSERID is already being used
      C errno = 0: Success"

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79948
      Henry Bauer
      Participant

      Found this information; hope it helps:

      UNIX

      lmclear -u TEST -mp

      On Unix it is easier to recover from this error using this command.  No need to stop or shut down anything.

      WINDOWS

      This error is less forgiving on Windows systems.  To clear you must shut down all sites and stop the QDX 5.x service in the control panel.  Once the service is down you should perform a site cleanup.

    • #79949
      Lawrence Nelson
      Participant

      We log on with a single user ID.

      When I run this command I get this error

      [hci@nordx-clotest ~]$ lmclear -u XXXX -mp

      Lock Manager Clear Utility

      RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com

      Copyright (c) 1992-2008 Birdstep Technology, Inc.  All Rights Reserved.

      User XXXX does not exist in lock manager.

      How do I determine the user name that the command is looking for in the lock manager?

      Also, a fine point: we’re on Linux (not specifically Unix/AIX).

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79950
      Henry Bauer
      Participant

      the user is just TEST

    • #79951
      Lawrence Nelson
      Participant

      [hci@nordx-clotest databases]$ lmclear -u TEST -mp

      Lock Manager Clear Utility

      RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com

      Copyright (c) 1992-2008 Birdstep Technology, Inc.  All Rights Reserved.

      User TEST does not exist in lock manager.

      [hci@nordx-clotest databases]$

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79952
      Lawrence Nelson
      Participant

      In the end I’ve cheated –

      I went to another – old empty site – and stole these files to replace the error db content –

      elog.dbd

      elogCtx.key

      elogCtx.dat

      elogM2k.dat

      elogMid.key

      elogMid.dat

      I have gone this route to eliminate my frustration, where some other step has left the GUI lock manager in a Red state and I am unable to start it from the GUI or command line.

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79953
      Terry Kellum
      Participant

      If you don’t need any of the messages in the SITE, you can use hcidbinit -AC to destroy the database and rebuild it from scratch.

      It warns you that it’s dangerous, but in essence it clears out all messages that are in-flight and in the error db.  You must make sure that this is really what you want to do.
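      Because the command is destructive, a small guard can help on hosts with several sites. The sketch below asks the operator to type the current site name before running it. `confirm_dbinit` is a hypothetical helper; only `hcidbinit -AC` comes from this post, and `$HCISITE` is assumed to hold the active site name.

```shell
# Sketch: require the operator to type the site name before wiping its databases.
# confirm_dbinit is a hypothetical helper; only `hcidbinit -AC` is from the post.
confirm_dbinit() {
    if [ -z "$HCISITE" ]; then
        echo "HCISITE is not set; refusing to continue" >&2
        return 1
    fi
    printf 'Type the site name (%s) to wipe its databases: ' "$HCISITE" >&2
    read -r answer
    if [ "$answer" != "$HCISITE" ]; then
        echo "confirmation did not match; nothing was changed" >&2
        return 1
    fi
    # Destroys in-flight and error messages for the current site.
    hcidbinit -AC
}
```

      The prompt goes to stderr so the helper can still be used when stdout is redirected.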

    • #79954
      Lawrence Nelson
      Participant

      Thanks for all your help. The problems on the DB have been solved, but I’m unable to resolve this issue:

      The GUI lock manager is in a Red state, and I am unable to start it from the GUI or command line.

      The extra confusion on this for me is that I have 3 sites –

      site_test_nap

      site_test_orders

      site_test_results

      The lock managers on orders and results are fine –

      the site_test_nap is the issue.

      The problem I seem to be having is how to run lmclear against just the one site, while ignoring the other 2.


      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79955
      Terry Kellum
      Participant

      If there are actually 3 “SITES”, then there are 3 Monitor Daemons and 3 Lock Daemons.  This means that you pick “Server Change” in the GUI menu to switch between the three sites.

      If there truly are 3 “Cloverleaf Sites”, then you can rebuild the database for one without affecting the other 2.

      If you are referring to “Sites” as thread pairs within the same Cloverleaf Engine (Same Net Monitor screen…) then you will need to turn off your inbound and ensure that there are no messages “in flight” before doing an hcidbinit -AC.  You may want to discuss this with support if you have ANY question about this.  -AC replaces your database with a brand new copy with nuthin’ in it.  It’s a drastic solution for a drastic problem.

    • #79956
      Lawrence Nelson
      Participant

      Thank you for your patience.

      The db issues are resolved.

      My problem is that with the turning on and off of the lock manager – something has gone awry and I’m not able to get the lock manager to turn back on – in a single site (of the 3 previously described).

      I probably should have started a new ticket, but this occurred while I was running the instructions provided – so I thought I’d ask.

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79957
      Russ Ross
      Participant

      The problem with start/stop lock manager you described would cause me to check whether either of these 2 conditions exists:

      1) the lock manager pid file exists but the lock manager is no longer running (in this case probably need to remove the pid file)

      Code:


      ls -l $HCISITEDIR/exec/hcilockmgr/pid
      ps -ef | grep `cat $HCISITEDIR/exec/hcilockmgr/pid`

      2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case you probably have to use kill -9 on the rogue process, and if really screwy there might even be more than one rogue instance running)

      Code:


      ls -l $HCISITEDIR/exec/hcilockmgr/pid
      ps -ef | grep $HCISITE | grep " lm "

      Since the GUI is stuck or confused, you might run these from the command prompt to see whether they give any information about the problem.

      Code:


      # kill lock manager
      hcisitectl -k l

      # start lock manager
      hcisitectl -s l
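      The two conditions above can be told apart with a small check, sketched here. `check_lockmgr_pid` is a hypothetical helper, the pid file path is the one used in this post, and the check assumes it runs as the same user that owns the lock manager (otherwise `kill -0` can fail for a live process).

```shell
# Sketch: classify the lock manager pid file for the current site.
# check_lockmgr_pid is a hypothetical helper; the default path is from the post.
check_lockmgr_pid() {
    pidfile="${1:-$HCISITEDIR/exec/hcilockmgr/pid}"
    if [ ! -f "$pidfile" ]; then
        echo "missing"      # no pid file: look for a rogue lm process (case 2)
        return 0
    fi
    pid=$(tr -d '[:space:]' < "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"   # pid file left behind by a dead process (case 1)
    fi
}
```

      A result of "stale" corresponds to condition 1 and "missing" is the starting point for condition 2. "running" only says that some process has that pid, so it is still worth confirming with ps that it is the lm process for the right site.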

      Russ Ross
      RussRoss318@gmail.com

    • #79958
      Lawrence Nelson
      Participant

      Thanks for the response –

      I have 3 sites – each has a lock manager folder –

      For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808

      When I run this – from the same folder

      [hci@nordx-clotest hcilockmgr]$ ps -ef | grep $HCISITE | grep " lm "

      I get this returned  –

      hci      24829     1  0 Feb11 ?        00:00:02 lm -mp -u 500 -a lm_cis5.8_site_test_orders -z /opt/healthvision/cis5.8/integrator/site_test_orders/exec/databases/

      Which looks to be in an entirely separate site folder.

      The other commands returned this

      [hci@nordx-clotest hcilockmgr]$ hcisitectl -k l

      Warning: lock manager should not be killed prior to monitord

      Lockmgr    is running on pid 24829

      hcimonitord is running on pid 22568

      [hci@nordx-clotest hcilockmgr]$ hcisitectl -s l

      Lockmgr    is running on pid 24829

      hcimonitord is running on pid 22568
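      With several sites on one host, it can help to list every lm process together with the site it serves instead of relying on $HCISITE. The sketch below reads `ps -ef` output and takes the site name from the `-z` database path, in the form shown in the ps line above. `list_lockmgr_sites` is a hypothetical helper, and the path layout (.../<site>/exec/databases/) is assumed from that single line.

```shell
# Sketch: print "<pid> <site>" for each lock manager process in `ps -ef` output.
# list_lockmgr_sites is a hypothetical helper; it expects the PID in column 2.
list_lockmgr_sites() {
    awk '
    {
        is_lm = 0; dbdir = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "lm") is_lm = 1
            if ($i == "-z" && i < NF) dbdir = $(i + 1)
        }
        if (is_lm && dbdir != "") {
            sub(/\/exec\/databases\/?$/, "", dbdir)   # drop the trailing path
            n = split(dbdir, parts, "/")
            print $2, parts[n]                        # pid and site name
        }
    }'
}
```

      Usage: `ps -ef | list_lockmgr_sites`. For the ps line shown above this prints `24829 site_test_orders`, which makes it clear that the process belongs to the orders site rather than the site being debugged.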

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79959
      Russ Ross
      Participant

      Okay, now that you see that the lock manager for the trouble site (site_test_orders) is running as process 6808, do you have the corresponding pid file?

      Code:

      ls -l /opt/healthvision/cis5.8/integrator/site_test_orders/exec/hcilockmgr/pid

      If not then this is situation 2) described in my earlier post.

      Code:

      2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case you probably have to use kill -9 on the rogue process, and if really screwy there might even be more than one rogue instance running)

      If the pid file exists, then the contents of the pid file need to match the process ID that is running, and your screen output shows a different number, 24829.

      Hold on, the grep also shows a pid of 24829, so it is obvious to me now that we aren’t communicating effectively, because you think the pid is 6808.

      Code:

      For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808

      Okay, let me add this: you have to set the site environment to the site of interest; changing into a directory with cd is not what sets $HCISITEDIR.

      If I’m the one confused, please ignore and forgive me. If you want to call me offline, I will be available after 10:30 AM CST.

      If both the pid file and lock manager exist and match then no need to investigate this possibility.
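      To make the point about the site environment concrete, here is a sketch of a guard that refuses to continue unless the environment points at the intended site. `require_site` is a hypothetical helper. It assumes that `$HCISITE` holds the site name and that `$HCISITEDIR` ends in that name, as the paths in this thread suggest. How the environment gets switched (many installs provide a `setsite` command) is likewise an assumption about your setup.

```shell
# Sketch: fail unless the shell environment points at the expected site.
# require_site is a hypothetical helper, not part of Cloverleaf.
require_site() {
    wanted="$1"
    if [ "$HCISITE" != "$wanted" ]; then
        echo "HCISITE is '${HCISITE:-unset}', expected '$wanted'" >&2
        return 1
    fi
    # Assumes HCISITEDIR is the site root, e.g. .../integrator/<site>.
    case "${HCISITEDIR%/}" in
        */"$wanted") ;;
        *)
            echo "HCISITEDIR '$HCISITEDIR' does not end in '$wanted'" >&2
            return 1
            ;;
    esac
}
```

      For example, `require_site site_test_nap && hcisitectl -s l` starts the lock manager only when the environment really is set to site_test_nap, regardless of the current directory.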

      Russ Ross
      RussRoss318@gmail.com

    • #79960
      Lawrence Nelson
      Participant

      Ok – My Lock Manager is fixed.

      I am not the creator of these sites. To the best of my knowledge, a previous co-worker split the orders site in two; I’m not sure what steps they took to do this.

      The results site had no issues with its LM.

      I took a copy of the PID file out of the Orders site and dropped it on the ‘NAP’ site and it immediately resolved the issue.

      Again – I thank you for your time on this – much appreciated.

      Lawrence Nelson
      System Architect - MaineHealth IT

    • #79961
      Russ Ross
      Participant

      Well that’s a new one on me.

      Having two identical pid files in different sites sounds like something to avoid, but not knowing how the site was split leaves me baffled, too.

      I might even find myself wondering if I should blow it away and start from scratch to get to a known and trusted situation.

      Russ Ross
      RussRoss318@gmail.com

    • #79962
      Terry Kellum
      Participant

      My standard procedure when it gets this bad is to consider the site to be ‘hosed’.  I shut everything down, get rid of all pid and semaphore files, run hcidbinit -AC and then bring everything back up.  That winds up being a lot quicker than figuring out whether I have the correct pid file and so on.
