Database Errors – need steps to purge Error db externally

This topic has 20 replies, 5 voices, and was last updated 11 years, 6 months ago by Terry Kellum.

Creator

Topic
February 10, 2014 at 7:14 pm #54040
Lawrence Nelson
Participant
I was processing a massive VRL-to-VRL file (350,000 rows) via an XLATE on 5.8.

The process hung.

Lawrence Nelson
System Architect - MaineHealth IT

Viewing 19 reply threads

Author

Replies
- February 10, 2014 at 7:36 pm #79943
  Lawrence Nelson
  Participant
  This is the current state:

  Command issued: hcidbdump -r

  Command status: 0

  Command output:

  Unable to initialize DBI, err = 2

  (column header row: Created, Message Id, Class, Type, Fwd, Prio, State, Length, Source, Dest)

  [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_''

  [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_''

  [0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921

  DBUSERID is already being used

  C errno = 0: Success"'
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 10, 2014 at 7:38 pm #79944
  David Coffey
  Participant
  hcidbdump is a powerful command, be warned! You could try something like this:

  hcidbdump -e -L -c -D -F > ./log_file.txt

  That is: ‘e’rror database, ‘L’ong format with ‘c’ontext, ‘D’elete after processing, and ‘F’orce the delete action.

  The output is redirected to a file.
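
Because -D and -F delete without a second chance, one cautious pattern is to take a read-only dump first and purge only if that archive is non-empty. The following is a minimal sketch, not a vendor-supplied procedure. It assumes the flags quoted above also work without -D and -F; `purge_error_db` and `DRY_RUN` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: archive the error database, then purge it.
# Assumes hcidbdump accepts the flags quoted in this thread (-e -L -c -D -F)
# and that -e -L -c alone performs a read-only dump.
# purge_error_db and DRY_RUN are hypothetical names, not part of Cloverleaf.

purge_error_db() {
    out_file=$1
    # With DRY_RUN=1 (the default) the commands are printed, not executed.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hcidbdump -e -L -c > $out_file"
        echo "hcidbdump -e -L -c -D -F >> $out_file"
        return 0
    fi
    # Step 1: read-only dump, so a copy exists before anything is deleted.
    hcidbdump -e -L -c > "$out_file" || return 1
    # Refuse to purge if the archive came out empty.
    [ -s "$out_file" ] || return 1
    # Step 2: dump again with delete (-D) and force (-F), appended to the archive.
    hcidbdump -e -L -c -D -F >> "$out_file"
}
```

Calling `purge_error_db ./log_file.txt` prints the two commands; setting `DRY_RUN=0` first would actually run them, so check the printed commands against your own site before doing that.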
- February 10, 2014 at 7:45 pm #79945
  Henry Bauer
  Participant
  If you are trying to fix the error database, use:
  
  keybuild elog
  
  dchain elog
  
  Sorry this was my mistake in not defining the commands for recovery and error databases.
- February 10, 2014 at 9:39 pm #79946
  Lawrence Nelson
  Participant
  Thank you all!
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 10, 2014 at 10:04 pm #79947
  Lawrence Nelson
  Participant
  Still getting this on the Error DB side:

  Command issued: hcidbdump -e

  Command status: 0

  Command output:

  Unable to initialize DBI, err = 2

  (column header row: Created, Message Id, Class, Type, Fwd, Prio, State, Length, Source, Dest)

  [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_''

  [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_''

  [0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921

  DBUSERID is already being used

  C errno = 0: Success"'
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 11, 2014 at 12:41 pm #79948
  Henry Bauer
  Participant
  Found this information; hope it helps:
  
  UNIX
  
  lmclear -u TEST -mp
  
  On Unix it is easier to recover from this error. No need to stop or shut down anything.

  WINDOWS

  This error is less forgiving on Windows systems. To clear it, you must shut down all sites and stop the QDX 5.x service in the Control Panel. Once the service is down, you should perform a site cleanup.
- February 11, 2014 at 2:06 pm #79949
  Lawrence Nelson
  Participant
  We log on with a single user ID.
  
  When I run this command I get this error
  
  [hci@nordx-clotest ~]$ lmclear -u XXXX -mp
  
  Lock Manager Clear Utility
  
  RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com
  
  Copyright (c) 1992-2008 Birdstep Technology, Inc. All Rights Reserved.
  
  User XXXX does not exist in lock manager.
  
  How do I determine the user name that the command is looking for in the lock manager?

  Also, a fine point: we’re on Linux (not specifically Unix/AIX).
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 11, 2014 at 2:08 pm #79950
  Henry Bauer
  Participant
  The user is just TEST.
- February 11, 2014 at 2:29 pm #79951
  Lawrence Nelson
  Participant
  [hci@nordx-clotest databases]$ lmclear -u TEST -mp
  
  Lock Manager Clear Utility
  
  RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com
  
  Copyright (c) 1992-2008 Birdstep Technology, Inc. All Rights Reserved.
  
  User TEST does not exist in lock manager.
  
  [hci@nordx-clotest databases]$
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 11, 2014 at 3:12 pm #79952
  Lawrence Nelson
  Participant
  In the end I’ve cheated –
  
  I went to another – old empty site – and stole these files to replace the error db content –
  
  elog.dbd
  
  elogCtx.key
  
  elogCtx.dat
  
  elogM2k.dat
  
  elogMid.key
  
  elogMid.dat
  
  I went this route to eliminate my frustration: some other step has left the GUI lock manager in a Red state, and I am unable to start it from the GUI or the command line.
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 11, 2014 at 6:35 pm #79953
  Terry Kellum
  Participant
  If you don’t need any of the messages in the SITE, you can use hcidbinit -AC to destroy the database and rebuild it from scratch.
  
  It warns you that it’s dangerous, but in essence it clears out all messages that are in-flight and in the error db. You must make sure that this is really what you want to do.
- February 11, 2014 at 7:43 pm #79954
  Lawrence Nelson
  Participant
  Thanks for all your help. The problems on the DB have been solved, but I’m unable to resolve this issue: the GUI lock manager is in a Red state, and I am unable to start it from the GUI or the command line.
  
  The extra confusion on this for me is that I have 3 sites –
  
  site_test_nap
  
  site_test_orders
  
  site_test_results
  
  The lock managers on orders and results are fine –
  
  the site_test_nap is the issue.
  
  The problem I seem to be having is how to run the lmclear to process specifically on the one site – while ignoring the other 2.
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 11, 2014 at 7:55 pm #79955
  Terry Kellum
  Participant
  If there are actually 3 “SITES”, then there are 3 Monitor Daemons and 3 Lock Daemons. This means that you pick “Server Change” in the GUI menu to switch between the three sites.
  
  If there truly are 3 “Cloverleaf Sites”, then you can rebuild the database for one without affecting the other 2.
  
  If you are referring to “Sites” as thread pairs within the same Cloverleaf Engine (same Net Monitor screen…), then you will need to turn off your inbound and ensure that there are no messages “in flight” before doing an hcidbinit -AC. You may want to discuss this with support if you have ANY question about this. -AC replaces your database with a brand new copy with nuthin’ in it. It’s a drastic solution for a drastic problem.
- February 11, 2014 at 10:03 pm #79956
  Lawrence Nelson
  Participant
  Thank you for your patience.
  
  The db issues are resolved.
  
  My problem is that with the turning on and off of the lock manager – something has gone awry and I’m not able to get the lock manager to turn back on – in a single site (of the 3 previously described).
  
  I probably should have started a new ticket, but this occurred while I was running the instructions provided – so I thought I’d ask.
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 12, 2014 at 10:51 pm #79957
  Russ Ross
  Participant
  The problem with starting/stopping the lock manager that you described would make me check whether either of these 2 conditions exists:

  1) The lock manager pid file exists but the lock manager is no longer running (in this case you probably need to remove the pid file).

  Code:

  ls -l $HCISITEDIR/exec/hcilockmgr/pid

  ps -ef | grep `cat $HCISITEDIR/exec/hcilockmgr/pid`

  2) The lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case you probably have to use kill -9 on the rogue process, and if things are really screwy there might even be more than one rogue instance running).

  Code:

  ls -l $HCISITEDIR/exec/hcilockmgr/pid

  ps -ef | grep $HCISITE | grep " lm "

  Since the GUI is stuck or confused, you might run these from the command prompt to see if they give any information about the problem:

  Code:

  hcisitectl -k l   # kill lock manager

  hcisitectl -s l   # start lock manager
  
  Russ Ross
  RussRoss318@gmail.com
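
The pid file half of the two checks above can be folded into one small shell helper. This is a minimal sketch under stated assumptions: `check_pidfile` is a hypothetical name, not a Cloverleaf command, and it only tells you whether the pid recorded in the file is alive. It does not find a rogue lock manager when the file is missing; the `ps -ef | grep " lm "` check is still needed for that.

```shell
#!/bin/sh
# Sketch only: classify a lock manager pid file.
# check_pidfile is a hypothetical helper, not a Cloverleaf command.

check_pidfile() {
    pid_file=$1
    if [ ! -f "$pid_file" ]; then
        # No pid file: possible condition 2, look for a rogue process by hand.
        echo "missing"
        return 0
    fi
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it succeeds only if the process exists
    # and the current user is allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        # Condition 1: the pid file was left behind by a process that is gone
        # (or the process belongs to a user we cannot signal).
        echo "stale $pid"
    fi
}
```

Usage would look like `check_pidfile "$HCISITEDIR/exec/hcilockmgr/pid"`, run with the environment set to the site you are investigating.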
- February 13, 2014 at 1:51 pm #79958
  Lawrence Nelson
  Participant
  Thanks for the response –
  
  I have 3 sites – each has a lock manager folder –
  
  For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808
  
  When I run this – from the same folder
  
  [hci@nordx-clotest hcilockmgr]$ ps -ef | grep $HCISITE | grep " lm "
  
  I get this returned –
  
  hci 24829 1 0 Feb11 ? 00:00:02 lm -mp -u 500 -a lm_cis5.8_site_test_orders -z /opt/healthvision/cis5.8/integrator/site_test_orders/exec/databases/
  
  Which looks to be in an entirely separate site folder.
  
  The other commands returned this
  
  [hci@nordx-clotest hcilockmgr]$ hcisitectl -k l
  
  Warning: lock manager should not be killed prior to monitord
  
  Lockmgr is running on pid 24829
  
  hcimonitord is running on pid 22568
  
  [hci@nordx-clotest hcilockmgr]$ hcisitectl -s l
  
  Lockmgr is running on pid 24829
  
  hcimonitord is running on pid 22568
  
  Lawrence Nelson
  System Architect - MaineHealth IT
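
The ps line above already says which site the running lock manager belongs to: the path after -z sits under that site's directory. A minimal sketch for pulling that out of `ps -ef` output follows. It assumes every lm process carries a `-z <site dir>/exec/databases/` argument, which is inferred from the single line quoted in this thread; `lm_sites` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list "<pid> <site>" for each lock manager in ps -ef output.
# Assumes the -z argument has the form <site dir>/exec/databases/ as in the
# line quoted in this thread. lm_sites is a hypothetical helper name.

lm_sites() {
    awk '{
        is_lm = 0
        for (i = 1; i < NF; i++) {
            if ($i == "lm") is_lm = 1
            if (is_lm && $i == "-z") {
                path = $(i + 1)
                # Drop the trailing /exec/databases/ to get the site directory.
                sub("/exec/databases/?$", "", path)
                n = split(path, parts, "/")
                # Field 2 of ps -ef is the pid; the last path part is the site.
                print $2, parts[n]
            }
        }
    }'
}
```

Running `ps -ef | lm_sites` on the output quoted above would print `24829 site_test_orders`, which makes it easier to see that no lock manager is listed for the site you expected.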
- February 13, 2014 at 2:53 pm #79959
  Russ Ross
  Participant
  Okay, now that you see that the lock manager for the trouble site (site_test_orders) is running as process 6808, do you have the corresponding pid file?

  Code: ls -l /opt/healthvision/cis5.8/integrator/site_test_orders/exec/hcilockmgr/pid

  If not, then this is situation 2) described in my earlier post.

  Code: 2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case probably have to use kill -9 on the rogue process and if really screwy might even have more than one rogue instance running)

  If the pid file exists, then its contents need to match the process ID that is running, and your screen output shows a different number, 24829.

  Hold on, the grep also shows a pid of 24829, so it is now obvious to me that we aren’t communicating effectively, because you think the pid is 6808.

  Code: For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808

  Okay, let me add this: you have to set the site environment to the site of interest; cd’ing into a directory is not what gets $HCISITEDIR set.

  If I’m the one confused, please ignore and forgive me. If you want to call me offline, I will be available after 10:30 AM CST.

  If both the pid file and lock manager exist and match, then there is no need to investigate this possibility.
  
  Russ Ross
  RussRoss318@gmail.com
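
Since the current directory does not decide which site a command acts on, a guard that checks the environment before touching the lock manager can catch this kind of mix-up. This is a minimal sketch: it assumes $HCISITE holds the site name and that $HCISITEDIR ends in that same name, which matches the paths quoted in this thread but is not confirmed for every install. `require_site` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: refuse to continue unless the environment points at one site.
# Assumes HCISITE is the site name and HCISITEDIR ends in that name.
# require_site is a hypothetical helper, not a Cloverleaf command.

require_site() {
    want=$1
    if [ "${HCISITE:-}" != "$want" ]; then
        echo "HCISITE is '${HCISITE:-}', expected '$want'" >&2
        return 1
    fi
    # Compare the last path component of HCISITEDIR with the wanted site.
    if [ "$(basename "${HCISITEDIR:-/}")" != "$want" ]; then
        echo "HCISITEDIR is '${HCISITEDIR:-}', expected a path ending in '$want'" >&2
        return 1
    fi
    return 0
}
```

For example, `require_site site_test_nap && hcisitectl -s l` would only try to start the lock manager when both variables point at site_test_nap.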
- February 13, 2014 at 3:20 pm #79960
  Lawrence Nelson
  Participant
  Ok – My Lock Manager is fixed.
  
  I am not the creator of these sites. To the best of my knowledge, a previous co-worker split the orders site in 2; I’m not sure what steps they took to do this.

  The results site had no issues with its LM.

  I took a copy of the PID file out of the Orders site and dropped it on the ‘NAP’ site and it immediately resolved the issue.
  
  Again – I thank you for your time on this – much appreciated.
  
  Lawrence Nelson
  System Architect - MaineHealth IT
- February 13, 2014 at 5:32 pm #79961
  Russ Ross
  Participant
  Well that’s a new one on me.
  
  Having two identical pid files in different sites sounds like something to avoid, but not knowing how the site was split leaves me baffled, too.
  
  I might even find myself wondering if I should blow it away and start from scratch to get to a known and trusted situation.
  
  Russ Ross
  RussRoss318@gmail.com
- February 13, 2014 at 5:50 pm #79962
  Terry Kellum
  Participant
  My standard procedure when it gets this bad is to consider the site to be ‘hosed’. I shut everything down, get rid of all pid and semaphore files, run hcidbinit -AC, and then bring everything back up. That winds up being a lot quicker than figuring out whether I have the correct pid file, etc.

The forum ‘Cloverleaf’ is closed to new topics and replies.