Protocol UPOC is Crashing Our Engine

This topic has 5 replies, 4 voices, and was last updated 12 years, 2 months ago by Luke Anderson.

November 11, 2011 at 4:50 pm #52801
Carter Harrison
Participant
We are in the process of upgrading from Cloverleaf 5.6 to version 5.8.4P. We are running on AIX 5.3.

I have an outbound thread configured to use a custom protocol UPOC (which worked fine under version 5.6). When the process starts, the thread comes online and all is fine. If the thread is stopped, everything is fine. If I then try to start the thread, the entire process will crash.

I have the thread configured to use the same UPOC in both read and write mode. The read mode is using advanced scheduling and it is in the read mode context that the engine appears to crash.

I have pared down the TCL file so that it is completely bare, yet the crash still occurs. Here is the UPOC currently:

Code:

    proc fileUPOC { args } {
        global HciConnName
        keylget args MODE mode
        set ctx ""; keylget args CONTEXT ctx
        set uargs {}; keylget args ARGS uargs
        set module "fileUPOC/$HciConnName/$ctx"
        echo "$module: Running In Mode $mode"
    }

When the process starts up, I see the following output from the UPOC in the process log.

Quote:

fileUPOC/FILE_OUT/pdupoc_write: Running In Mode start

fileUPOC/FILE_OUT/pdupoc_read: Running In Mode start

However after I stop the thread and attempt to restart it, this is what I see in the process log immediately before the crash.

Quote:

fileUPOC/FILE_OUT/pdupoc_write: Running In Mode start

[icl :tcpi:ERR /0: upoctest_cmd:11/11/2011 10:21:11] write failed: Bad file number

[cmd :cmd :INFO/0: upoctest_cmd:11/11/2011 10:21:11] Inrecoverable socket error. Closing connection.

[icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Shutdown failed: Bad file number

[icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Close failed: Bad file number

[icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] imhVerify failed at free 0x30fef230

[icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] 0x30fef230 type is 1431326531

[icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] Bad attempt to free 0x30fef230

[icl :icl :ERR /0: upoctest_cmd:11/11/2011 10:21:11] 0x30fef230 type is 1431326531

PANIC: “0”

PANIC: Calling “pti” for thread upoctest_cmd

The “write” context is clearly executing correctly, but the “read” context never gets to my echo statement before the crash. If I reconfigure the thread so that it does not use a UPOC for reads, then the crash does not occur.

Has anybody else seen anything like this or is anybody else using a UPOC for both read and write in a Cloverleaf 5.8 thread?
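
A UPOC used for both read and write is called in several modes: the log above shows `start`, and the replies in this thread mention `run` and `time`. The following is a minimal sketch of a proc that dispatches on MODE rather than echoing unconditionally. The `shutdown` branch and the `default` error branch are assumptions based on conventional Cloverleaf proc templates, the two fallback procs exist only so the sketch can run in a plain tclsh, and message handling in `run` is deliberately left out. It illustrates the proc structure only and is not offered as a fix for the crash.

```tcl
# Fallbacks so this sketch also runs in a plain tclsh. Inside Cloverleaf the
# TclX commands keylget and echo already exist and these are skipped.
if {[llength [info commands keylget]] == 0} {
    proc keylget {listVar key retVar} {
        upvar 1 $listVar kl $retVar ret
        foreach pair $kl {
            if {[lindex $pair 0] eq $key} {
                set ret [lindex $pair 1]
                return 1
            }
        }
        return 0
    }
}
if {[llength [info commands echo]] == 0} {
    proc echo {args} { puts [join $args " "] }
}

proc fileUPOC { args } {
    global HciConnName
    set mode ""; keylget args MODE mode
    set ctx "";  keylget args CONTEXT ctx
    set module "fileUPOC/$HciConnName/$ctx"
    set dispList {}

    switch -exact -- $mode {
        start {
            # Called when the thread starts.
            echo "$module: start"
        }
        run {
            # Called per message in the write context. Handling and
            # disposing of the message is omitted from this sketch.
            echo "$module: run"
        }
        time {
            # Called in the read context when the scheduler fires.
            echo "$module: time"
        }
        shutdown {
            # Assumed mode: called when the thread stops.
            echo "$module: shutdown"
        }
        default {
            error "Unknown mode '$mode' in $module"
        }
    }
    return $dispList
}

# Stand-alone demonstration. The way the arguments are passed here is an
# assumption made for the demo, not a statement of how the engine calls it.
set HciConnName FILE_OUT
fileUPOC {MODE time} {CONTEXT pdupoc_read}
```

Run outside the engine, the last line prints `fileUPOC/FILE_OUT/pdupoc_read: time` and returns an empty disposition list.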

Replies
- November 11, 2011 at 9:04 pm #75518
  Levy Lazarre
  Participant
  Carter,
  
  I don’t think the “read” context is seeing your “echo” statement at all. I believe that you have pared down your tcl proc too much.
  
  You need at least an empty “time” section in your proc because this is the code the advanced scheduler is going to use for the Read.
  
  Code:

      time {
          # The proc will enter here if it is configured as a Read TPS
          # and an interval or advanced scheduling configured

          # The disposition list
          set dispList {}
          echo "$module: Running In Mode $mode"
          return $dispList
      }
  
  Similarly, the Write context should be running in Mode “run”.
- November 14, 2011 at 8:54 pm #75519
  Rob Abbott
  Keymaster
  Carter, if you haven’t already, would you please open a support case on this?
  
  Rob Abbott
  Cloverleaf Emeritus
- November 14, 2011 at 9:51 pm #75520
  Carter Harrison
  Participant
  Thanks Rob. That was my next step. The person who replied earlier was trying to be helpful, but in all honesty there is no way that a TCL proc should be allowed to crash the engine.
- May 10, 2013 at 9:53 pm #75521
  Luke Anderson
  Participant
  Hi all, I’m seeing something very similar with a UPOC thread I am trying to set up using the Advanced Scheduling routine. I can start the thread fine the first time, but if I turn the thread off and then back on, it kills the process and throws the error below:
  
  [prod:prod:INFO/0:d_adt_checkfile:05/09/2013 15:00:22] Starting protocol thread d_adt_checkfile as tid 2.
  
  [prod:prod:INFO/0:d_adt_checkfile:05/09/2013 15:00:22] Applying EO config: ”
  
  [pti :sign:WARN/0:d_adt_checkfile:05/09/2013 15:00:23] Thread 2 ( d_adt_checkfile ) received signal EXCEPTION_ACCESS_VIOLATION:
  
  The thread attempted to read from or write to a virtual address for which it does not have the appropriate access.
  
  [pti :sign:WARN/0:d_adt_checkfile:05/09/2013 15:00:23] PC = 0xffffffff
  
  PANIC: “0”
  
  PANIC: Calling “pti” for thread d_adt_cmd
  
  Scheduler State

      Thread  Events  State       Priority  Runnable  PT Msgs
      0       0       SCHED_IDLE  0         0         0,0,0
      1       0       SCHED_IDLE  0         0         0,0,0
      2       0       SCHED_IDLE  0         0         1,0,0
  
  The only tcl proc I have in this new thread is applied in the tps entry for the Event Properties in the Advanced Scheduler. Any help with what is causing this would be much appreciated. Thanks.
- May 14, 2013 at 9:43 pm #75522
  Luke Anderson
  Participant
  I have found out a little more about this problem, where I’m using a UPOC thread and Advanced Scheduling to generate a file at certain times. When I first start the process and then the new UPOC thread, it performs as expected. However, when I turn off the thread and attempt to restart it, the process panics. I turned up the EO config, and it looks like it is failing on the restart when it goes to evaluate the number of tcl procs used by the thread:
  
  [pd :pdtd:INFO/0: d_adt_check:05/14/2013 14:25:26] Initializing Protocol Driver
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] *** Next Interval = 84584, Event(1) time = Wed May 15 13:55:10 2013
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] SCHEDULE EVENT DATA for EVENT 0
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Seconds(count = 1, type = TYPE_SINGLE) = 10
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Minutes(count = 1, type = TYPE_SINGLE) = 55
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Hours(count = 1, type = TYPE_SINGLE) = 13
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Monthday = * = NULL
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Month = * = NULL
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Weekday * = NULL
  
  [is :ISEv:INFO/0: d_adt_check:05/14/2013 14:25:26] Number of procs = 896756067
  
  [pti :sign:WARN/0: d_adt_check:05/14/2013 14:25:26] Thread 2 ( d_adt_check ) received signal EXCEPTION_ACCESS_VIOLATION:
  
  The thread attempted to read from or write to a virtual address for which it does not have the appropriate access.
  
  [pti :sign:WARN/0: d_adt_check:05/14/2013 14:25:26] PC = 0xffffffff
  
  [pti :thre:INFO/0: d_adt_check:05/14/2013 14:25:26] Thread 2 was told to shutdown
  
  [msi :msi :INFO/0: d_adt_check:05/14/2013 14:25:26] Updating shared memory to mark threads as dead.
  
  PANIC: “0”
  
  PANIC: Calling “pti” for thread d_adt_cmd
  
  The first time I start this thread, the number of procs = 1, which is what I have set up on the thread. But when I stop and start this thread, it somehow thinks that there are 896756067 tcl procs on this thread and panics. Has anyone seen anything like this before and been able to resolve it? Thanks.
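
One detail in the two crash logs may be worth a look: both implausible integers decode to printable ASCII when read as four raw bytes. The sketch below shows the arithmetic using Tcl's built-in `binary format`. The helper name `intAsAscii` is hypothetical, and the byte orders are assumptions: big-endian for the AIX log in the original post, little-endian for the log containing `EXCEPTION_ACCESS_VIOLATION`, which is a Windows exception name. The result is consistent with string data sitting where the engine expected a type tag or a counter, but that reading is an inference and the logs do not confirm it.

```tcl
# Render a 32-bit integer as the four bytes it would occupy in memory.
# "I" packs the value big-endian, "i" packs it little-endian.
proc intAsAscii {value order} {
    switch -exact -- $order {
        big     { return [binary format I $value] }
        little  { return [binary format i $value] }
        default { error "order must be 'big' or 'little'" }
    }
}

# "type is 1431326531" from the AIX log (0x55504F43)
puts [intAsAscii 1431326531 big]      ;# prints UPOC

# "Number of procs = 896756067" from the later log (0x35736963)
puts [intAsAscii 896756067 little]    ;# prints cis5
```

If a support case is opened for this behavior, including the decoded text alongside the raw numbers may help whoever investigates identify which buffer was being read.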

The forum ‘Cloverleaf’ is closed to new topics and replies.