Protocol UPOC is Crashing Our Engine


  • Creator
    Topic
  • #52801
    Carter Harrison
    Participant

    We are in the process of upgrading from Cloverleaf 5.6 to version 5.8.4P. We are running on AIX 5.3.

    I have an outbound thread configured to use a custom protocol UPOC (which worked fine under version 5.6).  When the process starts, the thread comes online and all is fine.  If the thread is stopped, everything is fine.  If I then try to start the thread, the entire process will crash.

    I have the thread configured to use the same UPOC in both read and write mode.  The read mode is using advanced scheduling and it is in the read mode context that the engine appears to crash.

    I have pared down the TCL file so that it is completely bare, yet the crash still occurs.  Here is the UPOC currently:

    Code:


    proc fileUPOC { args } {
       global HciConnName

       keylget args MODE mode
       set ctx ""; keylget args CONTEXT ctx
       set uargs {}; keylget args ARGS uargs
       set module "fileUPOC/$HciConnName/$ctx"
       echo "$module: Running In Mode $mode"
    }

    When the process starts up, I see the following output from the UPOC in the process log.

    Quote:


    fileUPOC/FILE_OUT/pdupoc_write: Running In Mode start

    fileUPOC/FILE_OUT/pdupoc_read: Running In Mode start

    However after I stop the thread and attempt to restart it, this is what I see in the process log immediately before the crash.

    Quote:


    fileUPOC/FILE_OUT/pdupoc_write: Running In Mode start

    [icl :tcpi:ERR /0: upoctest_cmd:11/11/2011 10:21:11] write failed: Bad file number

    [cmd :cmd :INFO/0: upoctest_cmd:11/11/2011 10:21:11] Inrecoverable socket error.  Closing connection.

    [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Shutdown failed: Bad file number

    [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Close failed: Bad file number

    [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] imhVerify failed at free 0x30fef230

    [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] 0x30fef230 type is 1431326531

    [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Bad attempt to free 0x30fef230

    [icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] 0x30fef230 type is 1431326531

    PANIC: "0"

    PANIC: Calling "pti" for thread upoctest_cmd

    The “write” context is clearly executing correctly, but the “read” context never gets to my echo statement before the crash.  If I reconfigure the thread so that it does not use a UPOC for reads, then the crash does not occur.  

    Has anybody else seen anything like this or is anybody else using a UPOC for both read and write in a Cloverleaf 5.8 thread?

Viewing 4 reply threads
  • Author
    Replies
    • #75518
      Levy Lazarre
      Participant

      Carter,

      I don’t think the “read” context is seeing your “echo” statement at all. I believe that you have pared down your tcl proc too much.

      You need at least an empty “time” section in your proc, because this is the code the advanced scheduler is going to use for the Read.

      Code:



      time {
         # The proc will enter here if it is configured as a Read TPS
         # and an interval or advanced scheduling configured

         # The disposition list
         set dispList {}

         echo "$module: Running In Mode $mode"

         return $dispList
      }

      Similarly, the Write context should be running in Mode “run”.
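
      For illustration, here is a minimal sketch of how the pared-down proc might look with a mode dispatch added, following the usual TPS layout. It assumes the engine's Tcl interpreter, where keylget, echo and HciConnName are available. The shutdown branch, the default branch and the CONTINUE disposition on the write side are assumptions and do not come from the original proc. Nothing here is confirmed to prevent the crash on restart.

```tcl
# Sketch only: a mode-dispatching version of fileUPOC.
# Assumes the Cloverleaf engine's Tcl interpreter (keylget, echo, HciConnName).
proc fileUPOC { args } {
    global HciConnName

    keylget args MODE mode
    set ctx ""   ; keylget args CONTEXT ctx
    set uargs {} ; keylget args ARGS uargs

    set module "fileUPOC/$HciConnName/$ctx"
    set dispList {}    ;# the disposition list returned to the engine

    switch -exact -- $mode {
        start {
            # Called once per context when the thread starts
            echo "$module: Running In Mode $mode"
        }
        run {
            # Write context: the engine hands over a message handle
            keylget args MSGID mh
            echo "$module: Running In Mode $mode"
            # Assumption: CONTINUE is the desired disposition here
            lappend dispList "CONTINUE $mh"
        }
        time {
            # Read context driven by an interval or advanced scheduling
            echo "$module: Running In Mode $mode"
        }
        shutdown {
            # Nothing to clean up in this bare version
        }
        default {
            error "Unknown mode '$mode' in $module"
        }
    }

    return $dispList
}
```

      With this layout the read side would print its line from the “time” branch and the write side from the “run” branch, while an unexpected mode raises a Tcl error instead of falling through silently.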

    • #75519
      Rob Abbott
      Keymaster

      Carter, if you haven’t already, would you please open a support case on this?

      Rob Abbott
      Cloverleaf Emeritus

    • #75520
      Carter Harrison
      Participant

      Thanks Rob.  That was my next step.  The person who replied earlier was trying to be helpful, but in all honesty there is no way that a TCL proc should be allowed to crash the engine.

    • #75521
      Luke Anderson
      Participant

      Hi all, I’m seeing something very similar with a UPOC thread I am trying to set up using the Advanced Scheduling routine.  I can start the thread fine the first time, but if I turn the thread off and then back on, it kills the process and throws the error below:

      [prod:prod:INFO/0:d_adt_checkfile:05/09/2013 15:00:22] Starting protocol thread d_adt_checkfile as tid 2.

      [prod:prod:INFO/0:d_adt_checkfile:05/09/2013 15:00:22] Applying EO config: ”

      [pti :sign:WARN/0:d_adt_checkfile:05/09/2013 15:00:23] Thread 2 ( d_adt_checkfile ) received signal EXCEPTION_ACCESS_VIOLATION:

       The thread attempted to read from or write to a virtual address for which it does not have the appropriate access.

      [pti :sign:WARN/0:d_adt_checkfile:05/09/2013 15:00:23] PC = 0xffffffff

      PANIC: "0"

      PANIC: Calling "pti" for thread d_adt_cmd


      Scheduler State

      Thread Events     State      Priority Runnable  PT Msgs
        0      0   SCHED_IDLE         0       0       0,0,0
        1      0   SCHED_IDLE         0       0       0,0,0
        2      0   SCHED_IDLE         0       0       1,0,0

      The only tcl proc I have in this new thread is applied in the tps entry for the Event Properties in the Advanced Scheduler.  Any help with what is causing this would be much appreciated.  Thanks.

    • #75522
      Luke Anderson
      Participant

      I have found out a little more about this problem, where I’m using a UPOC thread and Advanced Scheduling to generate a file at certain times.  When I first start the process and then the new UPOC thread, it performs as expected.  However, when I turn off the thread and attempt to restart it, the process panics.  I turned up the EO config, and it looks like it fails on the restart when it goes to evaluate the number of Tcl procs used by the thread:

      [pd  :pdtd:INFO/0:  d_adt_check:05/14/2013 14:25:26] Initializing Protocol Driver

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] *** Next Interval = 84584, Event(1) time = Wed May 15 13:55:10 2013

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] SCHEDULE EVENT DATA for EVENT 0

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Seconds(count = 1, type = TYPE_SINGLE) = 10

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Minutes(count = 1, type = TYPE_SINGLE) = 55

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Hours(count = 1, type = TYPE_SINGLE) = 13

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Monthday = * = NULL

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Month = * = NULL

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Weekday * = NULL

      [is  :ISEv:INFO/0:  d_adt_check:05/14/2013 14:25:26] Number of procs = 896756067

      [pti :sign:WARN/0:  d_adt_check:05/14/2013 14:25:26] Thread 2 ( d_adt_check ) received signal EXCEPTION_ACCESS_VIOLATION:

       The thread attempted to read from or write to a virtual address for which it does not have the appropriate access.

      [pti :sign:WARN/0:  d_adt_check:05/14/2013 14:25:26] PC = 0xffffffff

      [pti :thre:INFO/0:  d_adt_check:05/14/2013 14:25:26] Thread 2 was told to shutdown

      [msi :msi :INFO/0:  d_adt_check:05/14/2013 14:25:26] Updating shared memory to mark threads as dead.

      PANIC: "0"

      PANIC: Calling "pti" for thread d_adt_cmd

      The first time I start this thread, the number of procs = 1, which is what I have set up on the thread.  But when I stop and start this thread, it somehow thinks that there are 896756067 Tcl procs on this thread and panics.  Has anyone seen anything like this before and been able to resolve it?  Thanks.
