Thread would NOT start after process panic

This topic has 7 replies, 4 voices, and was last updated 19 years, 7 months ago by Anonymous.

Creator

Topic
December 1, 2005 at 8:05 pm #48178
Rentian Huang
Participant
Greetings!!!

– v5.2 on AIX

One of my processes panicked, so I ran a cleanup. But when I tried to bring the process back up, the only thread that belongs to that process just NEVER started!!! (It is set to autostart.) I couldn’t even bring the thread up manually.

Then I tried to stop the process, but every time I tried hcienginestop, it gave me the same error and, after 5 mins, the process stopped.

Code:
hcienginestop -p adt_raw
Trying hcicmd…
No response within timeout — Assuming process is hung! Exiting.
hcicmd failed! Now trying SIGINT…
Now trying SIGKILL…
Process ‘adt_raw’ is not running

Running hciprocstatus gives me this:

Code: adt_raw dead Terminated by signal 0 at Thu Dec 1 14:44:34 2005

I just re-ran hcienginerun -p adt_raw and it panicked again:

Code: adt_raw dead Abnormal exit – Cloverleaf software panic at Thu D

I have no problem with other processes in the same site. This happened after I tried to run a whole day’s worth of data for testing.

Can anyone give me some advice? Thanks!

Sam

Viewing 6 reply threads

Author

Replies
- December 2, 2005 at 12:19 pm #57914
  James Cobane
  Participant
  Sam,
  
  Take a look at the process log for the adt_raw process; it may give you a better clue as to what is happening. One quick thing that you can try initially is to run the ‘hcilmclear’ command on that process:
  
  hcilmclear -p adt_raw
  
  This does some clean-up work for the process after a panic; then try to re-start the process.
  
  Hope this helps.
  
  Jim Cobane
  
  Henry Ford Health
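Jim's suggestion amounts to a two-step sequence: clean up after the panic, then try to restart. A minimal Python sketch of that sequence follows. It uses only the two commands and the `-p` option quoted in this thread; the helper names (`recovery_commands`, `run_recovery`) and the dry-run default are illustrative assumptions, not part of Cloverleaf.

```python
import subprocess

def recovery_commands(process):
    """Return the commands from the thread, in the order suggested."""
    return [
        ["hcilmclear", "-p", process],    # clean up after the panic
        ["hcienginerun", "-p", process],  # then try to restart the process
    ]

def run_recovery(process, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in recovery_commands(process):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a command exits non-zero
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_recovery("adt_raw")  # dry run: only prints the two commands
```

Keeping the dry run as the default means nothing touches the site until the printed commands have been checked by hand.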
- December 2, 2005 at 1:56 pm #57915
  Anonymous
  Participant
  Also check the recovery database to see if there is any message causing the problem.
- December 2, 2005 at 2:18 pm #57916
  James Cobane
  Participant
  Carlos raises a very good point. Sometimes you may see a message with a state of 0 which will cause the associated process to crash until it is removed from the recovery database. Not sure what will cause a message to go to state 0, but it is likely related to something going awry in a tcl proc.
  
  Jim Cobane
  
  Henry Ford Health
- December 2, 2005 at 3:17 pm #57917
  garry r fisher
  Participant
  Hi Sam,
  
  I had a similar problem and asked Quovadx to dial in and look at it for me. They found a message in the recovery database in the wrong state and simply deleted it, and we were able to start everything back up again.
  
  Regards
  
  Garry
- December 2, 2005 at 3:29 pm #57918
  Rentian Huang
  Participant
  Thanks for all your responses!
  
  Garry, I have heard Greg Day tell us about the same scenario you described.
  
  I did open the log and found 1000+ repetitions of the following:
  
  Code:
  [msg :Msg :INFO/0:adt_raw_xlate] [0.0.8159727] Updating the recovery database
  [xlt :thre:INFO/1:adt_raw_xlate] [0.0.8159729] Requeuing: 1133462238.6509/xlate post
  
  Since I am working on the test site, there are too many msgs in the rdb. I took a rough look at them and it seems they are all in state 7.
  
  I will do a cleanup again, plus blow away all msgs in the rdb, and see what happens…
  
  Sam 8)
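A repeating "Requeuing" line like the one Sam quotes can be tallied to see whether one item is looping. The sketch below is a rough Python illustration that assumes log lines shaped exactly like the excerpt above; the regular expression, the function name `count_requeues`, and the third sample line are assumptions made for the example, not a documented Cloverleaf log format.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted in the thread, e.g.
# [xlt :thre:INFO/1:adt_raw_xlate] [0.0.8159729] Requeuing: 1133462238.6509/xlate post
REQUEUE = re.compile(r"\[[\d.]+\]\s+Requeuing:\s+(\S+)")

def count_requeues(lines):
    """Count 'Requeuing' lines, keyed by the token right after 'Requeuing:'."""
    counts = Counter()
    for line in lines:
        match = REQUEUE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "[msg :Msg :INFO/0:adt_raw_xlate] [0.0.8159727] Updating the recovery database",
    "[xlt :thre:INFO/1:adt_raw_xlate] [0.0.8159729] Requeuing: 1133462238.6509/xlate post",
    # The next line is made up for the example (not from the original log).
    "[xlt :thre:INFO/1:adt_raw_xlate] [0.0.8159731] Requeuing: 1133462238.6509/xlate post",
]
print(count_requeues(sample).most_common(1))
```

For a real log, the same function can be fed an open file object line by line; a single key with a very high count would point at the item that keeps being requeued.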
- December 2, 2005 at 3:55 pm #57919
  Rentian Huang
  Participant
  Hi all,
  
  I did a hcidbinit -A and everything is back to normal!!
  
  I guess the tricky thing here is: if we were in production, how would we identify the msgs that cause the problem, and how would we fix it? We can’t just blow away all msgs in the rdb… Maybe based on their states, as James said???
  
  Garry, can you tell us more on the wrong state you mentioned?
  
  Thanks again!
  
  Sam
- December 2, 2005 at 6:49 pm #57920
  Anonymous
  Participant
  Being very careful with production, I would sort the dbdump message list and find the top message, dump it into a file, and run it through the appropriate test tool. The problem message is probably one of the messages currently being processed. Then delete the bad one from the db; you still have it in a file to mess with later.
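The triage described in the replies (sort the dumped message list, look at the top message, and watch for state 0) could be sketched roughly as below. This is only an illustration under stated assumptions: the `Entry` record, the `triage` function, and sorting by message ID are hypothetical, the sample values are invented, and parsing a real recovery-database dump into such records is left out because the thread does not show the dump format.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    msg_id: str  # dotted ID, format assumed from the log excerpt (e.g. "0.0.8159727")
    state: int   # recovery-database state number

def id_key(entry):
    """Sort key: compare dotted IDs numerically, not as strings."""
    return tuple(int(part) for part in entry.msg_id.split("."))

def triage(entries):
    """Return (top, state_zero): first entry in ID order, plus any state-0 entries."""
    ordered = sorted(entries, key=id_key)
    top = ordered[0] if ordered else None
    state_zero = [e for e in ordered if e.state == 0]
    return top, state_zero

# Invented sample records, for illustration only.
entries = [Entry("0.0.10", 7), Entry("0.0.9", 7), Entry("0.0.12", 0)]
top, state_zero = triage(entries)
print(top.msg_id, [e.msg_id for e in state_zero])
```

Whatever the sketch flags is only a candidate; as the reply says, save the message to a file and test it before deleting anything from a production database.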

The forum ‘Cloverleaf’ is closed to new topics and replies.