Protocol UPOC is Crashing Our Engine


  • Creator
    Topic
  • #52801
    Carter Harrison
    Participant

      We are in the process of upgrading from Cloverleaf 5.6 to version 5.8.4P. We are running on AIX 5.3.

      I have an outbound thread configured to use a custom protocol UPOC (which worked fine under version 5.6).  When the process starts, the thread comes online and all is fine.  If the thread is stopped, everything is fine.  If I then try to start the thread, the entire process will crash.

      I have the thread configured to use the same UPOC in both read and write mode.  The read mode is using advanced scheduling and it is in the read mode context that the engine appears to crash.

      I have pared down the TCL file so that it is completely bare, yet the crash still occurs.  Here is the UPOC currently:

      Code:


      proc fileUPOC { args } {
         global HciConnName

         keylget args MODE mode
         set ctx ""; keylget args CONTEXT ctx
         set uargs {}; keylget args ARGS uargs
         set module "fileUPOC/$HciConnName/$ctx"
         echo "$module: Running In Mode $mode"
      }

      When the process starts up, I see the following output from the UPOC in the process log.

      Quote:


      fileUPOC/FILE_OUT/pdupoc_write: Running In Mode start

      fileUPOC/FILE_OUT/pdupoc_read: Running In Mode start

      However after I stop the thread and attempt to restart it, this is what I see in the process log immediately before the crash.

      Quote:


      fileUPOC/FILE_OUT/pdupoc_write: Running In Mode start

      [icl :tcpi:ERR /0: upoctest_cmd:11/11/2011 10:21:11] write failed: Bad file number

      [cmd :cmd :INFO/0: upoctest_cmd:11/11/2011 10:21:11] Inrecoverable socket error.  Closing connection.

      [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Shutdown failed: Bad file number

      [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Close failed: Bad file number

      [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] imhVerify failed at free 0x30fef230

      [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] 0x30fef230 type is 1431326531

      [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Bad attempt to free 0x30fef230

      [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] 0x30fef230 type is 1431326531

      PANIC: "0"

      PANIC: Calling "pti" for thread upoctest_cmd

      The “write” context is clearly executing correctly, but the “read” context never gets to my echo statement before the crash.  If I reconfigure the thread so that it does not use a UPOC for reads, then the crash does not occur.  

      Has anybody else seen anything like this or is anybody else using a UPOC for both read and write in a Cloverleaf 5.8 thread?

    • Author
      Replies
      • #75518
        Levy Lazarre
        Participant

          Carter,

          I don’t think the “read” context is seeing your “echo” statement at all. I believe that you have pared down your tcl proc too much.

          You need at least an empty “time” section in your proc, because this is the code the advanced scheduler is going to use for the Read.

          Code:



          time {
             # The proc enters here if it is configured as a Read TPS
             # and an interval or advanced scheduling is configured

             # The disposition list
             set dispList {}

             echo "$module: Running In Mode $mode"

             return $dispList
          }

          Similarly, the Write context should be running in Mode “run”.
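The mode handling described above can be sketched as a single proc that dispatches on MODE. This is a minimal sketch under stated assumptions, not Cloverleaf's shipped template: it assumes the engine passes MODE, CONTEXT and ARGS keys (as the original proc reads them), that an empty disposition list is an acceptable return value when there is nothing to hand back, and that any mode other than `start`, `time` and `run` can be ignored. The two procs at the top are hypothetical stand-ins for the TclX commands `keylget` and `echo`, included only so the sketch also loads in a plain tclsh; inside the engine they are skipped.

```tcl
# Stand-ins for the TclX commands keylget/echo, defined only when missing
# (for example in a plain tclsh). They are assumptions for illustration.
if {[llength [info commands keylget]] == 0} {
    proc keylget {listVar key retVar} {
        upvar 1 $listVar kl $retVar out
        foreach pair $kl {
            if {[lindex $pair 0] eq $key} {
                set out [lindex $pair 1]
                return 1
            }
        }
        return 0
    }
}
if {[llength [info commands echo]] == 0} {
    proc echo {args} { puts [join $args " "] }
}

proc fileUPOC { args } {
    global HciConnName

    set mode ""; keylget args MODE mode
    set ctx ""; keylget args CONTEXT ctx
    set uargs {}; keylget args ARGS uargs
    set module "fileUPOC/$HciConnName/$ctx"

    # Disposition list handed back to the engine; empty means nothing to do.
    set dispList {}

    echo "$module: Running In Mode $mode"

    switch -exact -- $mode {
        start {
            # One-time initialisation when the thread starts.
        }
        time {
            # Reached when the proc is configured as a Read TPS and the
            # interval or advanced scheduler fires.
        }
        run {
            # Reached in the Write context when there is a message to process.
        }
        default {
            # Any other mode the engine sends (for example shutdown).
        }
    }
    return $dispList
}
```

Outside the engine, the proc can be exercised by setting `HciConnName` by hand and passing the key/value pairs as arguments, for example `set HciConnName FILE_OUT` followed by `fileUPOC {MODE time} {CONTEXT pdupoc_read} {ARGS {}}`. How the engine actually invokes the proc is assumed here from the way the original code reads `args`.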

        • #75519
          Rob Abbott
          Keymaster

            Carter, if you haven’t already, would you please open a support case on this?

            Rob Abbott
            Cloverleaf Emeritus

          • #75520
            Carter Harrison
            Participant

              Thanks Rob.  That was my next step.  The person who replied earlier was trying to be helpful, but in all honesty there is no way that a TCL proc should be allowed to crash the engine.

            • #75521
              Luke Anderson
              Participant

                Hi all, I’m seeing something very similar with a UPOC thread I am trying to set up using Advanced Scheduling.  I can start the thread fine the first time, but if I turn the thread off and then back on, it kills the process and throws the error below:

                [prod:prod:INFO/0:d_adt_checkfile:05/09/2013 15:00:22] Starting protocol thread d_adt_checkfile as tid 2.

                [prod:prod:INFO/0:d_adt_checkfile:05/09/2013 15:00:22] Applying EO config: ”

                [pti :sign:WARN/0:d_adt_checkfile:05/09/2013 15:00:23] Thread 2 ( d_adt_checkfile ) received signal EXCEPTION_ACCESS_VIOLATION:

                 The thread attempted to read from or write to a virtual address for which it does not have the appropriate access.

                [pti :sign:WARN/0:d_adt_checkfile:05/09/2013 15:00:23] PC = 0xffffffff

                PANIC: "0"

                PANIC: Calling "pti" for thread d_adt_cmd


                Scheduler State


                Thread Events     State      Priority Runnable  PT Msgs

                  0      0   SCHED_IDLE         0       0       0,0,0

                  1      0   SCHED_IDLE         0       0       0,0,0

                  2      0   SCHED_IDLE         0       0       1,0,0

                The only tcl proc I have in this new thread is applied in the tps entry for the Event Properties in the Advanced Scheduler.  Any help with what is causing this would be much appreciated.  Thanks.

              • #75522
                Luke Anderson
                Participant

                  I have found out a little more about this problem, where I’m using a UPOC thread and Advanced Scheduling to generate a file at certain times.  When I first start the process and then the new UPOC thread, it performs as expected.  However, when I turn off the thread and attempt to restart it, the process panics.  I turned up the EO config, and it looks like it fails on the restart when it goes to evaluate the number of Tcl procs used by the thread:

                  [pd  :pdtd:INFO/0:  d_adt_check:05/14/2013 14:25:26] Initializing Protocol Driver

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] *** Next Interval = 84584, Event(1) time = Wed May 15 13:55:10 2013

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] SCHEDULE EVENT DATA for EVENT 0

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Seconds(count = 1, type = TYPE_SINGLE) = 10

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Minutes(count = 1, type = TYPE_SINGLE) = 55

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Hours(count = 1, type = TYPE_SINGLE) = 13

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Monthday = * = NULL

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Month = * = NULL

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Weekday * = NULL

                  [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Number of procs = 896756067

                  [pti :sign:WARN/0:  d_adt_check:05/14/2013 14:25:26] Thread 2 ( d_adt_check ) received signal EXCEPTION_ACCESS_VIOLATION:

                   The thread attempted to read from or write to a virtual address for which it does not have the appropriate access.

                  [pti :sign:WARN/0:  d_adt_check:05/14/2013 14:25:26] PC = 0xffffffff

                  [pti :thre:INFO/0:  d_adt_check:05/14/2013 14:25:26] Thread 2 was told to shutdown

                  [msi :msi :INFO/0:  d_adt_check:05/14/2013 14:25:26] Updating shared memory to mark threads as dead.

                  PANIC: "0"

                  PANIC: Calling "pti" for thread d_adt_cmd

                  The first time I start this thread, the number of procs = 1, which is what I have set up on the thread.  But when I stop and start this thread, it somehow thinks that there are 896756067 Tcl procs on this thread and panics.  Has anyone seen anything like this before and been able to resolve it?  Thanks.
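One observation that may help anyone hitting the same panic: both of the implausible integers logged in this thread decode to printable ASCII when viewed as four raw bytes. The sketch below does only that arithmetic with Tcl's built-in `binary format`. `decodeWord` is a hypothetical helper name, and the byte orders are assumptions: big-endian for the AIX log, little-endian for the log that shows EXCEPTION_ACCESS_VIOLATION (a Windows exception name).

```tcl
# Hypothetical helper: render a 32-bit integer as the four bytes it would
# occupy in memory, in the given byte order ("big" or "little").
proc decodeWord {value order} {
    if {$order eq "big"} {
        return [binary format I $value]
    }
    return [binary format i $value]
}

puts [decodeWord 1431326531 big]     ;# "type is 1431326531" from the AIX log
puts [decodeWord 896756067 little]   ;# "Number of procs = 896756067"
```

The first call prints `UPOC` and the second prints `cis5`. If that reading is right, the engine is interpreting string data as a type tag or a counter when the thread restarts, which would be consistent with it reusing freed or overwritten memory rather than with anything the Tcl proc itself does. That interpretation is a guess from the logs, not a confirmed diagnosis.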
