all test and production sites crashed


  • #49077
    Kevin Scantlan
    Participant

      We had an unusual occurrence yesterday afternoon: all engine processes and the monitor daemon PANICked.  We have 2 sites on each of our test and production machines.  All the engine processes on the test machine went down at 14:31, and all engine processes on the production machine went down at 14:46.  The test machine is on AIX 5.2 and Cloverleaf 5.4.1, and the production machine is on AIX 5.1 and Cloverleaf 5.2.  Here’s an example of what showed in the process log:

      [pti :sign:WARN/0:test2_ps03_cmd:02/13/2007 14:31:58] Thread 0 received signal 11

      [pti :sign:WARN/0:test2_ps03_cmd:02/13/2007 14:31:58] PC = 0x103c5174

      PANIC: “0”

      PANIC: Calling “pti” for thread test2_ps03_cmd

      and at the end of the log:

      PANIC: Calling “dbi shutdown” for thread test2_ps03_cmd

      PANIC: Calling “dbi shutdown” for thread test2_ps03_xlate

      PANIC: Calling “dbi shutdown” for thread ceralg_oi_3

      PANIC: Calling “dbi shutdown” for thread ceralg_oo_3

      PANIC: Calling “dbi shutdown” for thread algcer_oi_3

      PANIC: Calling “dbi shutdown” for thread algcer_oo_3

      PANIC: Calling “dbi shutdown” for thread ceralg_oi_3

      [dbi :dbi :WARN/0:  ceralg_oi_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oo_3

      [dbi :dbi :WARN/0:  ceralg_oo_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oi_3

      [dbi :dbi :WARN/0:  ceralg_oi_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oo_3

      [dbi :dbi :WARN/0:  ceralg_oo_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oi_3

      [dbi :dbi :WARN/0:  ceralg_oi_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oo_3

      [dbi :dbi :WARN/0:  ceralg_oo_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oi_3

      [dbi :dbi :WARN/0:  ceralg_oi_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oo_3

      [dbi :dbi :WARN/0:  ceralg_oo_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oi_3

      [dbi :dbi :WARN/0:  ceralg_oi_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Calling “dbi shutdown” for thread ceralg_oo_3

      [dbi :dbi :WARN/0:  ceralg_oo_3:02/13/2007 14:31:58] NULL DTD when closing DBI

      PANIC: Process panic—engine going down

      PANIC: assertion ‘0’ failed at PthreadInterface.cpp/699

      We brought our production machine back up after 15 minutes and crossed our fingers.  It stayed up and has been up ever since.

      The only thing going on at the time was that our system admin was attempting to load a new version of the OS (5.2) onto a split-off mirror image of the OS disk.  But this should not have had any effect on production, and it definitely should have had no effect on our test machine.  I will attach the full log of the test process that I quoted from above.
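For anyone triaging a similar log, here is a minimal Python sketch that pulls the key facts out of a process log shaped like the excerpt above: which thread received a signal, which assertion failed, and how many "dbi shutdown" calls were logged per thread. The regular expressions are inferred only from the lines quoted in this post and are an assumption, not a documented Cloverleaf log format; the function name `summarize_panic` is hypothetical.

```python
import re
from collections import Counter

# Bracketed engine log line, inferred from the excerpt, e.g.
# "[pti :sign:WARN/0:test2_ps03_cmd:02/13/2007 14:31:58] Thread 0 received signal 11"
LOG_LINE = re.compile(
    r"^\[(?P<module>[^:]+):(?P<sub>[^:]+):(?P<level>\w+)\s*/\d+:\s*(?P<thread>[^:]+):"
    r"(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*(?P<text>.*)$"
)
SIGNAL = re.compile(r"Thread \d+ received signal (\d+)")
# "." stands in for the quote character, which may be straight or typographic.
ASSERTION = re.compile(r"^PANIC: assertion\s+(.*?)\s+failed at (\S+)$")
SHUTDOWN = re.compile(r"^PANIC: Calling .dbi shutdown. for thread (\S+)$")


def summarize_panic(lines):
    """Collect signals, failed assertions, and dbi shutdown counts from log lines."""
    summary = {"signals": [], "assertions": [], "shutdowns": Counter()}
    for raw in lines:
        line = raw.strip()
        entry = LOG_LINE.match(line)
        if entry:
            sig = SIGNAL.search(entry.group("text"))
            if sig:
                summary["signals"].append(
                    (entry.group("thread").strip(), entry.group("stamp"), int(sig.group(1)))
                )
            continue
        failed = ASSERTION.match(line)
        if failed:
            summary["assertions"].append((failed.group(1), failed.group(2)))
            continue
        shut = SHUTDOWN.match(line)
        if shut:
            summary["shutdowns"][shut.group(1)] += 1
    return summary
```

Run against the lines quoted above, this would report signal 11 (usually SIGSEGV on Unix systems) on thread test2_ps03_cmd and the failed assertion at PthreadInterface.cpp/699, which is a quick way to compare crashes across sites.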

      • #60665
        Jim Kosloskey
        Participant

          Kevin,

          Did you have any system resources (disk, memory, etc.) that got saturated during that time?

          Do you have the Cloverleaf alerts activated that sense exhaustion of critical system resources?

          Just a thought…

          Jim Kosloskey

          email: jim.kosloskey@jim-kosloskey.com 29+ years Cloverleaf, 59 years IT - old fart.

        • #60666
          Kevin Scantlan
          Participant

            Yes, we have alerts in place.  The lock manager and the host server did not go down.  The first thing I did was check with our system admins, but they did see an indication of problems in their logs.

          • #60667
            Kevin Scantlan
            Participant

              ….. did NOT see an indication in their logs….

            • #60668
              Gene Salay
              Participant

                Sometimes the grapevine is useful – our test box went down today because somebody unplugged something in the datacenter.     The posts about cleanup were very helpful…     Did you solve the mystery on your end?

              • #60669
                Kevin Scantlan
                Participant

                  No.  It’s still a mystery.  In our case, the machines did not go down.  Just the engine processes and the monitor daemon went down like a rock.  The lock manager and host server stayed up.  Our 2 machines are physically located in different buildings.  We checked to see if there had been an intruder, but saw no abnormal logins.

                  We cleaned up our sites like support recommended.  One site on each box had a large recovery database, but I doubt if that was the culprit.  The other sites had normal sized databases.

                • #60670
                  John Stafford
                  Participant

                    I am experiencing the same problem, today. Had the monitor daemon crash, and all of the threads went dead. In the logs, I saw a similar error:

                    [icl :tcpi:ERR /0:    cdi_xlate:04/16/2013 10:52:42] Read on socket 32 failed: Connection reset by peer

                    [cmd :cmd :INFO/0:      cdi_cmd:04/16/2013 10:52:47] Receiving a command

                    PANIC: “str != ((char *) 0)”

                    PANIC: Calling “pti” for thread cdi_cmd


                    Scheduler State


                    Thread Events     State      Priority Runnable  PT Msgs

                      0      0   SCHED_RUNNING      0       0       0,0,0

                      1      0   SCHED_IDLE         0       0       0,0,0

                      2      0   SCHED_IDLE         0       0       0,0,0


                    Thread 0


                    ti: 0x20a9f1e8

                       tid           :    0

                       HostPthreadId : 0x1

                       EventList     : 0x20a9f038

                       PolledEvents  : 0x20a9f328

                       PthreadEvent  : 0x20a9f488

                       ReadyEvents   : 0x20a9f348

                       CtrlMsgs      : 0x20a9f368

                       UserCtrlMsgs  : 0x20a9f388

                       UserDataMsgs  : 0x20a9f3a8

                       StartArgs     : 0x0

                       SchedState    : SCHED_RUNNING

                       SchedPriority : 0

                       Killed        : 0


                    Registered Events


                    el: 0x20a9f038

                       elCount : 3

                       elHead: 0x20a9f4c8

                       elTail: 0x2147b438

                    ele: 0x20a9f4c8

                       event: 0x20a9f488

                       prev : 0x0

                       next : 0x20c2f4b8

                    ev: 0x20a9f488

                        evType     : PTHREADS

                        evStrDesc    :

                        evSocket     : 0

                        evMsgQue     : 0

                        evTid        : 0

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x0

                        evCallBack   : 0x0

                        evCbShutdown : 0x0

                        evRecurFreq  :

                    ele: 0x20c2f4b8

                       event: 0x20bc3ce8

                       prev : 0x20a9f4c8

                       next : 0x2147b438

                    ev: 0x20bc3ce8

                        evType     : SOCKET

                        evStrDesc    : Command port listen

                        evSocket     : 7

                        evMsgQue     : 0

                        evTid        : 0

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x20bc3cc8

                        evCallBack   : 0x200b2524

                        evCbShutdown : 0x200b2530

                        evRecurFreq  :

                    ele: 0x2147b438

                       event: 0x20c8dc88

                       prev : 0x20c2f4b8

                       next : 0x0

                    ev: 0x20c8dc88

                        evType     : SOCKET

                        evStrDesc    : Command connection accept

                        evSocket     : 35

                        evMsgQue     : 0

                        evTid        : 0

                        evState      : 2

                        evPtMsg      : 0x0

                        evUserData   : 0x20c898b8

                        evCallBack   : 0x200b253c

                        evCbShutdown : 0x200b2530

                        evRecurFreq  :


                    Polled Events


                    el: 0x20a9f328

                       elCount : 0

                       elHead: 0x0

                       elTail: 0x0


                    Ready Events


                    el: 0x20a9f348

                       elCount : 0

                       elHead: 0x0

                       elTail: 0x0


                    Outstanding Pthread Ctrl Msgs


                    pmq: 0x20a9f368

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Outstanding Pthread User Ctrl Msgs


                    pmq: 0x20a9f388

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Outstanding Pthread User Data Msgs


                    pmq: 0x20a9f3a8

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Thread 1


                    ti: 0x20c8a668

                       tid           :    1

                       HostPthreadId : 0x102

                       EventList     : 0x20c2f608

                       PolledEvents  : 0x20c2f628

                       PthreadEvent  : 0x20dbd5a8

                       ReadyEvents   : 0x20c2f648

                       CtrlMsgs      : 0x20c2f668

                       UserCtrlMsgs  : 0x20c2f688

                       UserDataMsgs  : 0x20c2f6a8

                       StartArgs     : 0x20c2f5c8

                       SchedState    : SCHED_IDLE

                       SchedPriority : 0

                       Killed        : 0


                    Registered Events


                    el: 0x20c2f608

                       elCount : 3

                       elHead: 0x20dbd208

                       elTail: 0x20dbd678

                    ele: 0x20dbd208

                       event: 0x20dbd568

                       prev : 0x0

                       next : 0x20dbd508

                    ev: 0x20dbd568

                        evType     : SOCKET

                        evStrDesc    :

                        evSocket     : 13

                        evMsgQue     : 0

                        evTid        : 1

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x20dbd048

                        evCallBack   : 0x200b220c

                        evCbShutdown : 0x200b031c

                        evRecurFreq  :

                    ele: 0x20dbd508

                       event: 0x20dbd5a8

                       prev : 0x20dbd208

                       next : 0x20dbd678

                    ev: 0x20dbd5a8

                        evType     : PTHREADS

                        evStrDesc    :

                        evSocket     : 0

                        evMsgQue     : 0

                        evTid        : 1

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x20dbd138

                        evCallBack   : 0x200b0328

                        evCbShutdown : 0x200b031c

                        evRecurFreq  :

                    ele: 0x20dbd678

                       event: 0x20dbd618

                       prev : 0x20dbd508

                       next : 0x0

                    ev: 0x20dbd618

                        evType     : POLLED

                        evStrDesc    :

                        evSocket     : 0

                        evMsgQue     : 0

                        evTid        : 1

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x0

                        evCallBack   : 0x200b2224

                        evCbShutdown : 0x0

                        evRecurFreq  :


                    Polled Events


                    el: 0x20c2f628

                       elCount : 1

                       elHead: 0x20dbd658

                       elTail: 0x20dbd658

                    ele: 0x20dbd658 — POLLED event: 0x20dbd618


                    Ready Events


                    el: 0x20c2f648

                       elCount : 0

                       elHead: 0x0

                       elTail: 0x0


                    Outstanding Pthread Ctrl Msgs


                    pmq: 0x20c2f668

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Outstanding Pthread User Ctrl Msgs


                    pmq: 0x20c2f688

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Outstanding Pthread User Data Msgs


                    pmq: 0x20c2f6a8

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Thread 2


                    ti: 0x211f51a8

                       tid           :    2

                       HostPthreadId : 0x203

                       EventList     : 0x20c8b188

                       PolledEvents  : 0x20c8b348

                       PthreadEvent  : 0x21478bb8

                       ReadyEvents   : 0x20c8b368

                       CtrlMsgs      : 0x20cf6cd8

                       UserCtrlMsgs  : 0x20cf6d48

                       UserDataMsgs  : 0x20da04c8

                       StartArgs     : 0x20c8b168

                       SchedState    : SCHED_IDLE

                       SchedPriority : 0

                       Killed        : 0


                    Registered Events


                    el: 0x20c8b188

                       elCount : 5

                       elHead: 0x21478bf8

                       elTail: 0x21474218

                    ele: 0x21478bf8

                       event: 0x21478b58

                       prev : 0x0

                       next : 0x21478c38

                    ev: 0x21478b58

                        evType     : SOCKET

                        evStrDesc    :

                        evSocket     : 20

                        evMsgQue     : 0

                        evTid        : 2

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x21478b38

                        evCallBack   : 0x200b220c

                        evCbShutdown : 0x200b031c

                        evRecurFreq  :

                    ele: 0x21478c38

                       event: 0x21478bb8

                       prev : 0x21478bf8

                       next : 0x21474098

                    ev: 0x21478bb8

                        evType     : PTHREADS

                        evStrDesc    :

                        evSocket     : 0

                        evMsgQue     : 0

                        evTid        : 2

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x21478b98

                        evCallBack   : 0x200b0328

                        evCbShutdown : 0x200b031c

                        evRecurFreq  :

                    ele: 0x21474098

                       event: 0x21466818

                       prev : 0x21478c38

                       next : 0x21493f08

                    ev: 0x21466818

                        evType     : POLLED

                        evStrDesc    :

                        evSocket     : 0

                        evMsgQue     : 0

                        evTid        : 2

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x0

                        evCallBack   : 0x200b2368

                        evCbShutdown : 0x0

                        evRecurFreq  :

                    ele: 0x21493f08

                       event: 0x214741b8

                       prev : 0x21474098

                       next : 0x21474218

                    ev: 0x214741b8

                        evType     : ACTIVE_TIMER

                        evStrDesc    :

                        evSocket     : 0

                        evMsgQue     : 0

                        evTid        : 2

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x214378b8

                        evCallBack   : 0x200b226c

                        evCbShutdown : 0x0

                        evRecurFreq  : 5.0000

                    ele: 0x21474218

                       event: 0x20c89558

                       prev : 0x21493f08

                       next : 0x0

                    ev: 0x20c89558

                        evType     : SOCKET

                        evStrDesc    :

                        evSocket     : 32

                        evMsgQue     : 0

                        evTid        : 2

                        evState      : 0

                        evPtMsg      : 0x0

                        evUserData   : 0x20c89488

                        evCallBack   : 0x200b0328

                        evCbShutdown : 0x200b031c

                        evRecurFreq  :


                    Polled Events


                    el: 0x20c8b348

                       elCount : 1

                       elHead: 0x21466858

                       elTail: 0x21466858

                    ele: 0x21466858 — POLLED event: 0x21466818


                    Ready Events


                    el: 0x20c8b368

                       elCount : 0

                       elHead: 0x0

                       elTail: 0x0


                    Outstanding Pthread Ctrl Msgs


                    pmq: 0x20cf6cd8

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Outstanding Pthread User Ctrl Msgs


                    pmq: 0x20cf6d48

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0


                    Outstanding Pthread User Data Msgs


                    pmq: 0x20da04c8

                    Count   : 0

                    Head    : 0x0

                    Tail    : 0x0

                    PANIC: Calling “dbi shutdown” for thread cdi_cmd

                    PANIC: Calling “dbi shutdown” for thread cdi_xlate

                    PANIC: Calling “dbi shutdown” for thread t_cdi_a

                    PANIC: Thread panic—engine going down

                    PANIC: assertion ‘str != ((char *) 0)’ failed at str.cpp/36

                    Should I contact Infor support about this? I’m new in this position, and have never had to get in touch with them about this stuff.

                  • #60671
                    John Mercogliano
                    Participant

                      Check with your security folks.  We had this happen to us when they decided to do an unannounced security audit of our box when we were on 5.2.  It appears 5.2 is vulnerable to certain types of security port scans like the ones they used, and it brought us down hard.  5.7 handles it a lot better.

                      John Mercogliano
                      Sentara Healthcare
                      Hampton Roads, VA

                    • #60672
                      Russ Ross
                      Participant

                        We too have had all our Cloverleaf interfaces and processes crash as the result of a security scan.

                        I was going to make the same suggestion: check whether your security group might have scanned your boxes.

                        Russ Ross
                        RussRoss318@gmail.com

                      • #60673
                        John Stafford
                        Participant

                          As it turns out, we were having a security audit performed that included an IP sweep and port scans. The engine panicked twice and crashed while the security analyst was here.

                          Thanks so much for the information, guys. It helped to confirm some of our suspicions!
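To check a suspicion like this against the logs, one option is to compare the panic timestamps with the window in which the scan ran. The sketch below is only an illustration: the function name `panics_in_window` is hypothetical, the timestamp format is assumed from the log lines quoted earlier in the thread, and the scan window has to come from the security team.

```python
from datetime import datetime

# Timestamp layout assumed from the quoted log lines, e.g. "04/16/2013 10:52:47".
STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def panics_in_window(panic_stamps, window_start, window_end):
    """Return the panic timestamps that fall inside the scan window (inclusive)."""
    start = datetime.strptime(window_start, STAMP_FORMAT)
    end = datetime.strptime(window_end, STAMP_FORMAT)
    return [
        stamp
        for stamp in panic_stamps
        if start <= datetime.strptime(stamp, STAMP_FORMAT) <= end
    ]
```

If every panic lands inside the reported scan window and none occur outside it, that supports the port scan explanation, though it does not prove it.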

                        • #60674
                          Bob Richardson
                          Participant

                            Greetings,

                            I would recommend that you upgrade your Integrator, as there is a fix for this problem starting with 5.7 Revision 3.  AIX would need to be upgraded as well.

                            It is not advisable to get so far behind in this software as support may no longer be available from INFOR.

                            Have a great evening!

                          • #60675
                            John Stafford
                            Participant

                              This was the straw that broke the camel’s back, so to speak.

                              We are actually running 5.5 Rev 1, which is slightly less out of date. I have engaged Infor, and I am speaking with our technical team to coordinate an AIX upgrade, as well.

                              Thanks for all of your help!
