Database Errors – need steps to purge Error db externally
The process hung
Lawrence Nelson
System Architect - MaineHealth IT
This is the current state –
Command Issued: hcidbdump -r
Command status:0
Command output:
Unable to initialize DBI, err = 2
Created Message Id Class Type Fwd Prio State Length Source Dest
[0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
'
[0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
'
[0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921
DBUSERID is already being used
C errno = 0: Success"
'
Lawrence Nelson
System Architect - MaineHealth IT
hcidbdump is a powerful command, be warned! You could try something like this:
hcidbdump -e -L -c -D -F > ./log_file.txt
That is: -e for the error database, -L for long format, -c to include context, -D to delete after processing, and -F to force the delete action.
The output is redirected to a file.
If you are trying to repair the error database, use:
keybuild elog
dchain elog
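Because -D and -F delete as they go, one cautious pattern is to dump first and delete only if the dump looks sane. The sketch below is just an illustration under stated assumptions: that hcidbdump -e -L -c without -D -F only reads the database, that the site environment is already set, and that a failed run may still report status 0 while printing "Unable to initialize DBI" (as in the output earlier in this thread). The function name dump_then_purge_errors is hypothetical.

```shell
# dump_then_purge_errors OUTFILE
# Hypothetical helper: dumps the error database to OUTFILE, and only if that
# dump looks sane runs the destructive command quoted in the post above.
dump_then_purge_errors() (
    out="$1"
    if [ -z "$out" ] || [ -e "$out" ]; then
        echo "need a new output file name; refusing to overwrite" >&2
        exit 1
    fi

    # Step 1 (assumed read-only): long format with context, no delete flags.
    hcidbdump -e -L -c > "$out" 2>&1

    # The exit status is not trusted here; look for the DBI failure text instead.
    if [ ! -s "$out" ] || grep -q "Unable to initialize DBI" "$out"; then
        echo "dump failed or is empty; nothing was deleted" >&2
        exit 1
    fi

    # Step 2 (destructive): the exact flags from the post above.
    hcidbdump -e -L -c -D -F > "$out.purged" 2>&1
)
```

It would be called as dump_then_purge_errors ./log_file.txt, leaving the read-only dump in log_file.txt and the output of the deleting run in log_file.txt.purged.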
Sorry, this was my mistake for not specifying the commands for both the recovery and error databases.
Thank you all!
Lawrence Nelson
System Architect - MaineHealth IT
Still getting this on the Error DB side –
Command Issued: hcidbdump -e
Command status:0
Command output:
Unable to initialize DBI, err = 2
Created Message Id Class Type Fwd Prio State Length Source Dest
[0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
'
[0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
'
[0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921
DBUSERID is already being used
C errno = 0: Success"
'
Lawrence Nelson
System Architect - MaineHealth IT
Found this information hope it helps:
UNIX
lmclear -u TEST -mp
On Unix it is easier to recover from this error using this command. No need to stop or shut down anything.
WINDOWS
This error is less forgiving on Windows systems. To clear you must shut down all sites and stop the QDX 5.x service in the control panel. Once the service is down you should perform a site cleanup.
We log on with a single user ID.
When I run this command I get this error
[hci@nordx-clotest ~]$ lmclear -u XXXX -mp
Lock Manager Clear Utility
RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com
Copyright (c) 1992-2008 Birdstep Technology, Inc. All Rights Reserved.
User XXXX does not exist in lock manager.
How do I determine the user name that the command is looking for in the lock manager?
Also, a fine point: we're on Linux (not specifically Unix/AIX).
Lawrence Nelson
System Architect - MaineHealth IT
the user is just TEST
[hci@nordx-clotest databases]$ lmclear -u TEST -mp
Lock Manager Clear Utility
RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com
Copyright (c) 1992-2008 Birdstep Technology, Inc. All Rights Reserved.
User TEST does not exist in lock manager.
[hci@nordx-clotest databases]$
Lawrence Nelson
System Architect - MaineHealth IT
In the end I’ve cheated –
I went to another – old empty site – and stole these files to replace the error db content –
elog.dbd
elogCtx.key
elogCtx.dat
elogM2k.dat
elogMid.key
elogMid.dat
I went this route to eliminate my frustration: some other step has left the GUI lock manager in a Red state, and I am unable to start it from the GUI or the command line.
Lawrence Nelson
System Architect - MaineHealth IT
If you don’t need any of the messages in the SITE, you can use an
hcidbinit -AC to destroy the database and rebuild it from scratch.
It warns you that it’s dangerous, but in essence it clears out all messages that are in-flight and in the error db. You must make sure that this is really what you want to do.
Thanks for all your help. The problems on the DB have been solved, but I'm unable to resolve this issue:
GUI lock manager in a Red state
and I am unable to start it from the GUI or command line.
The extra confusion on this for me is that I have 3 sites –
site_test_nap
site_test_orders
site_test_results
The lock managers on orders and results are fine –
the site_test_nap is the issue.
The problem I seem to be having is how to run lmclear against that one site specifically, while leaving the other 2 alone.
Lawrence Nelson
System Architect - MaineHealth IT
If there are actually 3 “SITES”, then there are 3 Monitor Daemons and 3 Lock Daemons. This means that you pick “Server Change” in the GUI menu to switch between the three sites.
If there truly are 3 “Cloverleaf Sites”, then you can rebuild the database for one without affecting the other 2.
If you are referring to “Sites” as thread pairs within the same Cloverleaf Engine (Same Net Monitor screen…) then you will need to turn off your inbound and ensure that there are no messages “in flight” before doing an hcidbinit -AC. You may want to discuss this with support if you have ANY question about this. -AC replaces your database with a brand new copy with nuthin’ in it. It’s a drastic solution for a drastic problem.
Thank you for your patience.
The db issues are resolved.
My problem is that with the turning on and off of the lock manager – something has gone awry and I’m not able to get the lock manager to turn back on – in a single site (of the 3 previously described).
I probably should have started a new ticket, but this occurred while I was running the instructions provided – so I thought I’d ask.
Lawrence Nelson
System Architect - MaineHealth IT
The problem with start/stop lock manager you described would cause me to check if either of these 2 conditions exist
1) the lock manager pid file exists but the lock manager is no longer running (in this case probably need to remove the pid file)
ls -l $HCISITEDIR/exec/hcilockmgr/pid
ps -ef | grep `cat $HCISITEDIR/exec/hcilockmgr/pid`
2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case you probably have to use kill -9 on the rogue process, and if things are really screwy there might even be more than one rogue instance running)
ls -l $HCISITEDIR/exec/hcilockmgr/pid
ps -ef | grep $HCISITE | grep " lm "
Since the GUI is stuck or confused, you might run these from the command prompt to see if they give any information about the problem:
# kill lock manager
hcisitectl -k l
# start lock manager
hcisitectl -s l
Russ Ross
RussRoss318@gmail.com
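The pid file check in condition 1 above can be sketched as a small shell function. This is a minimal illustration, not a Cloverleaf utility: check_lockmgr_pid is a made-up name, it assumes the pid file holds only the process number, and it uses kill -0, which only tests that some process with that PID exists and is signalable by the current user. It does not confirm that the process is actually the lock manager, so the ps check is still needed.

```shell
# check_lockmgr_pid SITE_DIR
# Hypothetical helper. Prints one of:
#   missing      no pid file under SITE_DIR/exec/hcilockmgr
#   invalid      pid file does not contain a plain number
#   stale PID    pid file names a process that is not running (condition 1)
#   live PID     some process with that PID exists
check_lockmgr_pid() (
    pid_file="$1/exec/hcilockmgr/pid"
    if [ ! -f "$pid_file" ]; then
        echo "missing"
        exit 0
    fi

    # Strip the trailing newline and any stray whitespace.
    pid=$(tr -d '[:space:]' < "$pid_file")
    case "$pid" in
        ''|*[!0-9]*)
            echo "invalid"
            exit 0
            ;;
    esac

    # Signal 0 sends nothing; it only checks that the process can be signaled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "live $pid"
    else
        echo "stale $pid"
    fi
)
```

With the site environment set, it would be called as check_lockmgr_pid "$HCISITEDIR". A "missing" result while ps still shows an lm process for the site corresponds to condition 2.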
Thanks for the response –
I have 3 sites – each has a lock manager folder –
For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808
When I run this – from the same folder
[hci@nordx-clotest hcilockmgr]$ ps -ef | grep $HCISITE | grep " lm "
I get this returned –
hci 24829 1 0 Feb11 ? 00:00:02 lm -mp -u 500 -a lm_cis5.8_site_test_orders -z /opt/healthvision/cis5.8/integrator/site_test_orders/exec/databases/
Which looks to be in an entirely separate site folder.
The other commands returned this
[hci@nordx-clotest hcilockmgr]$ hcisitectl -k l
Warning: lock manager should not be killed prior to monitord
Lockmgr is running on pid 24829
hcimonitord is running on pid 22568
[hci@nordx-clotest hcilockmgr]$ hcisitectl -s l
Lockmgr is running on pid 24829
hcimonitord is running on pid 22568
Lawrence Nelson
System Architect - MaineHealth IT
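The ps line quoted above already says which site that lock manager belongs to: the -z argument is the site's exec/databases directory. A rough sketch for pulling the PID and site directory out of such lines follows. It assumes the lm command line always has the shape shown in this thread, and list_lm_sites is a hypothetical name.

```shell
# list_lm_sites
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints
# "PID SITE_DIR" for every lm process that carries a -z <databases dir> argument.
list_lm_sites() {
    awk '
    {
        is_lm = 0
        dbdir = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "lm") is_lm = 1
            if ($i == "-z" && i < NF) dbdir = $(i + 1)
        }
        if (is_lm && dbdir != "") {
            # Drop the trailing exec/databases part to get the site directory.
            sub(/\/exec\/databases\/?$/, "", dbdir)
            print $2, dbdir
        }
    }'
}
```

Run as ps -ef | list_lm_sites. For the line quoted above it prints 24829 followed by /opt/healthvision/cis5.8/integrator/site_test_orders, which shows that the process belongs to the orders site.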
Okay, now that you see that the lock manager for the trouble site (site_test_orders) is running as process 6808, do you have the corresponding pid file?
ls -l /opt/healthvision/cis5.8/integrator/site_test_orders/exec/hcilockmgr/pid
If not, then this is situation 2) described in my earlier post:
2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case you probably have to use kill -9 on the rogue process, and if things are really screwy there might even be more than one rogue instance running)
If the pid file exists, then its contents need to match the process ID that is running, and your screen output shows a different number, 24829.
Hold on, the grep also shows a pid of 24829, so it is now obvious to me that we aren’t effectively communicating, because you think the pid is 6808:
For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808
Okay, let me add this: you have to set the site environment to the site of interest. Changing into a directory with cd is not what gets $HCISITEDIR set.
If I’m the one who is confused, please ignore this and forgive me. If you want to call me offline, I will be available after 10:30 AM CST.
If both the pid file and the lock manager exist and match, then there is no need to investigate this possibility.
Russ Ross
RussRoss318@gmail.com
Ok – My Lock Manager is fixed.
I am not the creator of these sites. To the best of my knowledge, a previous co-worker split the orders site in 2; I’m not sure what steps they took to do this.
The results site had no issues with its LM.
I took a copy of the PID file out of the Orders site and dropped it on the ‘NAP’ site and it immediately resolved the issue.
Again – I thank you for your time on this – much appreciated.
Lawrence Nelson
System Architect - MaineHealth IT
Well that’s a new one on me.
Having two identical pid files in different sites sounds like something to avoid, but not knowing how the site was split leaves me baffled, too.
I might even find myself wondering if I should blow it away and start from scratch to get to a known and trusted situation.
Russ Ross
RussRoss318@gmail.com
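Two sites carrying the same PID in their pid files, as described above, can be spotted with a small scan. A rough sketch, assuming every site sits directly under one root directory (like the /opt/healthvision/cis5.8/integrator/ path shown earlier) and keeps its pid file at exec/hcilockmgr/pid. The name find_shared_lockmgr_pids is made up for illustration.

```shell
# find_shared_lockmgr_pids ROOT_DIR
# Hypothetical helper: prints "PID site1,site2,..." for every PID that appears
# in the lock manager pid file of more than one site under ROOT_DIR.
find_shared_lockmgr_pids() (
    root="$1"
    for f in "$root"/*/exec/hcilockmgr/pid; do
        [ -f "$f" ] || continue
        # Reduce ROOT_DIR/<site>/exec/hcilockmgr/pid to just <site>.
        site=${f#"$root"/}
        site=${site%%/*}
        printf '%s %s\n' "$(tr -d '[:space:]' < "$f")" "$site"
    done | sort | awk '
        {
            sites[$1] = (sites[$1] == "" ? $2 : sites[$1] "," $2)
            count[$1]++
        }
        END {
            for (p in count) if (count[p] > 1) print p, sites[p]
        }'
)
```

Called with the integrator directory as ROOT_DIR, it prints nothing when every site has its own PID. Any line it does print is a pair of sites worth checking against ps before trusting either pid file.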
My standard procedure when it gets this bad is to consider the site to be ‘hosed’. I shut everything down, get rid of all pid and semaphore files, run hcidbinit -AC, and then bring everything back up. That winds up being a lot quicker than figuring out whether I have the correct pid file and so on.