Database Errors – need steps to purge Error db externally


  • Creator
    Topic
  • #54040
    Lawrence Nelson
    Participant

      I was processing a massive VRL-to-VRL file (350,000 rows) via an Xlate on Cloverleaf 5.8.

      The process hung.

      Lawrence Nelson
      System Architect - MaineHealth IT

    Viewing 19 reply threads
    • Author
      Replies
      • #79943
        Lawrence Nelson
        Participant

          This is the current state –

          Command Issued: hcidbdump -r

          Command status:0

          Command output:

          Unable to initialize DBI, err = 2

                                  C
                                  l T
                                  a y F
                                  s p w
          Created  Message Id     s e d Prio State Length Source          Dest
                                  - - - ----

          [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
          [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
          [0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921
          DBUSERID is already being used
          C errno = 0: Success"

          Lawrence Nelson
          System Architect - MaineHealth IT

        • #79944
          David Coffey
          Participant

            hcidbdump is a powerful command, be warned!  You could try something like this:

            hcidbdump -e -L -c -D -F > ./log_file.txt

            ‘e’rror database, ‘L’ong format with ‘c’ontext, ‘D’elete after processing, and ‘F’orce the delete action.

            The output is redirected to a file.
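A more cautious variant of the command above could take a read-only dump first and run the deleting form only if that dump looks healthy. This is a minimal sketch, not a vendor-documented procedure: the function name `purge_error_db`, the `DUMP_CMD` override, and the output file names are assumptions made for illustration, and only the flags quoted in this thread are used. Because the output earlier in this thread shows "Command status:0" alongside a DBI initialization error, the sketch checks the dump text rather than trusting the exit status alone.

```shell
#!/bin/sh
# Sketch only: keep a listing of the error database before deleting from it.
# DUMP_CMD is a hypothetical override so the function can be tried with a stub.
DUMP_CMD="${DUMP_CMD:-hcidbdump}"

purge_error_db() {
    outdir="${1:-.}"
    stamp=$(date +%Y%m%d_%H%M%S)
    backup="$outdir/errdb_before_purge_$stamp.txt"
    purge_log="$outdir/errdb_purge_$stamp.txt"

    # Step 1: dump without -D/-F (error db, long format, with context).
    "$DUMP_CMD" -e -L -c > "$backup" 2>&1 || return 1

    # The exit status may be 0 even when the dump failed, so inspect the text.
    if grep -q "Unable to initialize DBI" "$backup"; then
        echo "dump failed, not purging; see $backup" >&2
        return 1
    fi

    # Step 2: same dump, deleting after processing and forcing the delete.
    "$DUMP_CMD" -e -L -c -D -F > "$purge_log" 2>&1 || return 1

    # Report where the pre-purge listing was written.
    echo "$backup"
}
```

With the site environment already set, it would be called as `purge_error_db /some/log/dir`; the directory is whatever location you choose for the two output files.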

          • #79945
            Henry Bauer
            Participant

              If you are trying to fix the error database, use:

              keybuild elog

              dchain elog

              Sorry, it was my mistake not to give the commands for both the recovery and error databases.

            • #79946
              Lawrence Nelson
              Participant

                Thank you all!

                Lawrence Nelson
                System Architect - MaineHealth IT

              • #79947
                Lawrence Nelson
                Participant

                  Still getting this on the Error DB side –

                  Command Issued: hcidbdump -e

                  Command status:0

                  Command output:

                  Unable to initialize DBI, err = 2

                                          C
                                          l T
                                          a y F
                                          s p w
                  Created  Message Id     s e d Prio State Length Source          Dest
                                          - - - ----

                  [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
                  [0:_hcidbdump_] (-921) 'DB_VISTA user id check failed: '_hcidbdump_'
                  [0:_hcidbdump_] (-921) 'RDM Embedded DB error: "SYSTEM/OS error: -921
                  DBUSERID is already being used
                  C errno = 0: Success"

                  Lawrence Nelson
                  System Architect - MaineHealth IT

                • #79948
                  Henry Bauer
                  Participant

                    Found this information hope it helps:

                    UNIX

                    lmclear -u TEST -mp

                    On Unix it is easier to recover from this command.  No need to stop or shut down anything.

                    WINDOWS

                    This error is less forgiving on Windows systems.  To clear you must shut down all sites and stop the QDX 5.x service in the control panel.  Once the service is down you should perform a site cleanup.

                  • #79949
                    Lawrence Nelson
                    Participant

                      We log on with a single user ID.

                      When I run this command I get this error

                      [hci@nordx-clotest ~]$ lmclear -u XXXX -mp

                      Lock Manager Clear Utility

                      RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com

                      Copyright (c) 1992-2008 Birdstep Technology, Inc.  All Rights Reserved.

                      User XXXX does not exist in lock manager.

                      How do I determine the user name that the command is looking for in the lock manager?

                      Also, a fine point: we’re on Linux (not specifically Unix/AIX).

                      Lawrence Nelson
                      System Architect - MaineHealth IT

                    • #79950
                      Henry Bauer
                      Participant

                        the user is just TEST

                      • #79951
                        Lawrence Nelson
                        Participant

                          [hci@nordx-clotest databases]$ lmclear -u TEST -mp

                          Lock Manager Clear Utility

                          RDM Embedded 8.1 [15-Oct-2008] http://www.raima.com

                          Copyright (c) 1992-2008 Birdstep Technology, Inc.  All Rights Reserved.

                          User TEST does not exist in lock manager.

                          [hci@nordx-clotest databases]$

                          Lawrence Nelson
                          System Architect - MaineHealth IT

                        • #79952
                          Lawrence Nelson
                          Participant

                            In the end I’ve cheated –

                            I went to another – old empty site – and stole these files to replace the error db content –

                            elog.dbd

                            elogCtx.key

                            elogCtx.dat

                            elogM2k.dat

                            elogMid.key

                            elogMid.dat

                            I went this route to eliminate my frustration. Separately, some other step has left the GUI lock manager in a Red state, and I am unable to start it from the GUI or command line.
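The file swap described above can be made reversible by copying the existing files aside before overwriting them. The following is a minimal sketch under stated assumptions: both arguments are the sites' database directories (the `ps` output later in this thread suggests `<site>/exec/databases`), nothing has the error database open while it runs, and the function name and backup directory naming are hypothetical. The file list is taken verbatim from this post.

```shell
#!/bin/sh
# Sketch only: copy the six elog files from a clean site's database directory
# into the damaged one, keeping a backup of whatever gets replaced.
ELOG_FILES="elog.dbd elogCtx.key elogCtx.dat elogM2k.dat elogMid.key elogMid.dat"

replace_elog_files() {
    src="$1"    # database directory of the empty donor site (assumed path)
    dst="$2"    # database directory of the damaged site (assumed path)
    backup="$dst/elog_backup_$(date +%Y%m%d_%H%M%S)"

    # Verify every donor file exists before touching the target.
    for f in $ELOG_FILES; do
        [ -f "$src/$f" ] || { echo "missing donor file: $src/$f" >&2; return 1; }
    done

    mkdir -p "$backup" || return 1
    for f in $ELOG_FILES; do
        # Keep the old copy (if any), then put the donor copy in place.
        if [ -f "$dst/$f" ]; then
            cp -p "$dst/$f" "$backup/$f" || return 1
        fi
        cp -p "$src/$f" "$dst/$f" || return 1
    done

    # Report where the replaced files were saved.
    echo "$backup"
}
```

If the swap makes things worse, the files in the printed backup directory can be copied back over the target directory.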

                            Lawrence Nelson
                            System Architect - MaineHealth IT

                          • #79953
                            Terry Kellum
                            Participant

                              If you don’t need any of the messages in the site, you can use hcidbinit -AC to destroy the database and rebuild it from scratch.

                              It warns you that it’s dangerous, but in essence it clears out all messages that are in flight and in the error db.  You must make sure that this is really what you want to do.
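Since the command wipes whichever site the environment currently points at, a small guard that makes you name the site before it runs may help on a box with several sites. This is only a sketch: `guarded_dbinit` and the `INIT_CMD` override are hypothetical names, and it assumes `$HCISITE` holds the current site name, as the commands elsewhere in this thread imply.

```shell
#!/bin/sh
# Sketch only: require the operator to name the site before wiping its databases.
# INIT_CMD is a hypothetical override so the guard can be exercised with a stub.
INIT_CMD="${INIT_CMD:-hcidbinit}"

guarded_dbinit() {
    site="${HCISITE:-}"
    confirm="$1"

    if [ -z "$site" ]; then
        echo "HCISITE is not set; set the site environment first" >&2
        return 2
    fi

    # The argument must match the site the environment points at.
    if [ "$confirm" != "$site" ]; then
        echo "refusing: pass the site name '$site' to confirm" >&2
        return 1
    fi

    # Destroys and rebuilds the site's databases, as described above.
    "$INIT_CMD" -AC
}
```

For example, `guarded_dbinit site_test_nap` would proceed only when the environment is actually set to site_test_nap.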

                            • #79954
                              Lawrence Nelson
                              Participant

                                Thanks for all your help. The problems on the DB have been solved, but I’m unable to resolve this issue:

                                the GUI lock manager is in a Red state, and I am unable to start it from the GUI or command line.

                                The extra confusion on this for me is that I have 3 sites –

                                site_test_nap

                                site_test_orders

                                site_test_results

                                The lock managers on orders and results are fine –

                                the site_test_nap is the issue.

                                The problem I seem to be having is how to run the lmclear to process specifically on the one site – while ignoring the other 2.

                                Lawrence Nelson
                                System Architect - MaineHealth IT

                              • #79955
                                Terry Kellum
                                Participant

                                  If there are actually 3 “SITES”, then there are 3 Monitor Daemons and 3 Lock Daemons.  This means that you pick “Server Change” in the GUI menu to switch between the three sites.

                                  If there truly are 3 “Cloverleaf Sites”, then you can rebuild the database for one without affecting the other 2.

                                  If you are referring to “Sites” as thread pairs within the same Cloverleaf Engine (same Net Monitor screen…), then you will need to turn off your inbound and ensure that there are no messages “in flight” before doing an hcidbinit -AC.  You may want to discuss this with support if you have ANY question about this.  -AC replaces your database with a brand new copy with nuthin’ in it.  It’s a drastic solution for a drastic problem.

                                • #79956
                                  Lawrence Nelson
                                  Participant

                                    Thank you for your patience.

                                    The db issues are resolved.

                                    My problem is that with the turning on and off of the lock manager – something has gone awry and I’m not able to get the lock manager to turn back on – in a single site (of the 3 previously described).

                                    I probably should have started a new ticket, but this occurred while I was running the instructions provided – so I thought I’d ask.

                                    Lawrence Nelson
                                    System Architect - MaineHealth IT

                                  • #79957
                                    Russ Ross
                                    Participant

                                      The problem with start/stop lock manager you described would cause me to check if either of these 2 conditions exist

                                      1) the lock manager pid file exists but the lock manager is no longer running (in this case probably need to remove the pid file)

                                      Code:


                                      ls -l $HCISITEDIR/exec/hcilockmgr/pid
                                      ps -ef | grep `cat $HCISITEDIR/exec/hcilockmgr/pid`

                                      2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case probably have to use kill -9 on the rogue process and if really screwy might even have more than one rogue instance running)

                                      Code:


                                      ls -l $HCISITEDIR/exec/hcilockmgr/pid
                                      ps -ef | grep $HCISITE | grep " lm "

                                      Since the GUI is stuck or confused, you might run these from the command prompt to see if they give any information about the problem.

                                      Code:


                                      # kill lock manager
                                      hcisitectl -k l

                                      # start lock manager
                                      hcisitectl -s l
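The two conditions above can be told apart with a small check of the pid file. This is a minimal sketch: the function name `lockmgr_pid_state` is made up for illustration, and "ok" only means that some process with the recorded pid exists and can be signalled by the current user, not that it is necessarily the lock manager.

```shell
#!/bin/sh
# Sketch only: classify the lock manager pid file for one site directory.
# Prints one of: ok, stale, missing.
lockmgr_pid_state() {
    sitedir="${1:-$HCISITEDIR}"
    pidfile="$sitedir/exec/hcilockmgr/pid"

    if [ ! -f "$pidfile" ]; then
        # Possible condition 2: no pid file; look for a leftover lm process by hand.
        echo "missing"
        return 0
    fi

    # Strip whitespace so a trailing newline does not matter.
    pid=$(tr -d ' \t\r\n' < "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "stale"; return 0 ;;   # empty or non-numeric content
    esac

    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "ok"
    else
        echo "stale"    # condition 1: pid file exists, process is gone
    fi
}
```

Run it as `lockmgr_pid_state "$HCISITEDIR"` after setting the site environment; a result of "missing" still calls for the `ps` check above to find any rogue process.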

                                      Russ Ross
                                      RussRoss318@gmail.com

                                    • #79958
                                      Lawrence Nelson
                                      Participant

                                        Thanks for the response –

                                        I have 3 sites – each has a lock manager folder –

                                        For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808

                                        When I run this – from the same folder

                                        [hci@nordx-clotest hcilockmgr]$ ps -ef | grep $HCISITE | grep " lm "

                                        I get this returned  –

                                        hci      24829     1  0 Feb11 ?        00:00:02 lm -mp -u 500 -a lm_cis5.8_site_test_orders -z /opt/healthvision/cis5.8/integrator/site_test_orders/exec/databases/

                                        Which looks to be in an entirely separate site folder.

                                        The other commands returned this

                                        [hci@nordx-clotest hcilockmgr]$ hcisitectl -k l

                                        Warning: lock manager should not be killed prior to monitord

                                        Lockmgr    is running on pid 24829

                                        hcimonitord is running on pid 22568

                                        [hci@nordx-clotest hcilockmgr]$ hcisitectl -s l

                                        Lockmgr    is running on pid 24829

                                        hcimonitord is running on pid 22568

                                        Lawrence Nelson
                                        System Architect - MaineHealth IT

                                      • #79959
                                        Russ Ross
                                        Participant

                                          Okay, now that you see that the lock manager for the trouble site (site_test_orders) is running as process 6808, do you have the corresponding pid file?

                                          Code:

                                          ls -l /opt/healthvision/cis5.8/integrator/site_test_orders/exec/hcilockmgr/pid

                                          If not then this is situation 2) described in my earlier post.

                                          Code:

                                          2) the lock manager pid file does not exist but the lock manager has gone rogue and is still running (in this case probably have to use kill -9 on the rogue process and if really screwy might even have more than one rogue instance running)

                                          If the pid file exists, then the contents of the pid file need to match the process ID that is running, and your screen output shows a different number, 24829.

                                          Hold on, the grep also shows a pid of 24829, so it is obvious to me now that we aren’t effectively communicating, because you think the pid is 6808.

                                          Code:

                                          For the site with the issue – from within that site’s lock manager folder – the grep returns a PID of 6808

                                          Okay, let me add this: you have to set the site environment to the site of interest; changing to a directory with cd is not what gets $HCISITEDIR set.

                                          If I’m the one confused, please ignore and forgive me. If you want to call me offline, I will be available after 10:30 AM CST.

                                          If both the pid file and lock manager exist and match then no need to investigate this possibility.
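One way to avoid this kind of mix-up on a box with several sites is to list every running lock manager together with the database directory it was started for, and only then compare against a given site's pid file. The sketch below relies on the `lm ... -z <directory>` command-line shape shown in the `ps` output earlier in this thread and on the Linux `ps -ef` column layout (command starting in the eighth field); the function name `lm_pids_by_site` is hypothetical.

```shell
#!/bin/sh
# Sketch only: read `ps -ef` output on stdin and print "PID directory"
# for each lm process, where directory is the argument following -z.
lm_pids_by_site() {
    awk '
        # Field 8 is the command name in Linux "ps -ef" output.
        $8 == "lm" || $8 ~ /\/lm$/ {
            dir = ""
            for (i = 9; i < NF; i++) {
                if ($i == "-z") dir = $(i + 1)
            }
            if (dir != "") print $2, dir
        }'
}
```

Used as `ps -ef | lm_pids_by_site`, each output line shows which site directory a lock manager pid belongs to, which can then be compared with the contents of that site's `exec/hcilockmgr/pid` file.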

                                          Russ Ross
                                          RussRoss318@gmail.com

                                        • #79960
                                          Lawrence Nelson
                                          Participant

                                            Ok – My Lock Manager is fixed.

                                            I am not the creator of these sites. To the best of my knowledge, a previous co-worker split the orders site in two; I’m not sure what steps they took to do this.

                                            The results site had no issues with its LM.

                                            I took a copy of the PID file out of the Orders site and dropped it on the ‘NAP’ site, and it immediately resolved the issue.

                                            Again – I thank you for your time on this – much appreciated.

                                            Lawrence Nelson
                                            System Architect - MaineHealth IT

                                          • #79961
                                            Russ Ross
                                            Participant

                                              Well that’s a new one on me.

                                              Having two identical pid files in different sites sounds like something to avoid, but not knowing how the site was split leaves me baffled, too.

                                              I might even find myself wondering if I should blow it away and start from scratch to get to a known and trusted situation.

                                              Russ Ross
                                              RussRoss318@gmail.com

                                            • #79962
                                              Terry Kellum
                                              Participant

                                                My standard procedure when it gets this bad is to consider the site to be ‘hosed’.  I shut everything down, get rid of all pid and semaphore files, run hcidbinit -AC, and then bring everything back up.  That winds up being a lot quicker than figuring out whether I have the correct pid file and so on.

                                            • The forum ‘Cloverleaf’ is closed to new topics and replies.