We are running Cloverleaf v. 5.3 on HP Unix and have four working sites on the same box. For about the past two weeks we have been having problems with one site: it runs fine, then it starts dumping certain messages to the error database, and then it basically stops working and has to be killed and cleaned up before it will work again. We are under a support contract with a vendor who has not been able to resolve the problem. I have a theory as to what is going on, but I need some guidance from someone who understands Cloverleaf under the covers. Here is the sequence of events:
Start site – all works fine
About a day passes, then Orders and Cancel Orders start going to the error db. All processes on the site start logging database errors.
Nothing will shut down nicely at this point. Kill everything, clean up, restart and all is fine.
The messages that go to the error database show a tcl callout error. The thread that does the routing shows this in the error log:
10/09/2005 20:24:31 [msi :msi :ERR /0: sendafile] msiSectionLock: Can’t lock semaphore for thread sendafile: Too many open files
10/09/2005 20:24:31 [msi :msi :ERR /0: sendafile] msiExportStats: Can’t lock data section for thread sendafile
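For anyone who wants to see how often each thread logs this, here is a minimal sketch of a log tally. It is not a Cloverleaf utility. The line pattern is an assumption inferred only from the two entries quoted above, and `tally_errors` and `SAMPLE` are made-up names for illustration. It may also help to know that "Too many open files" is the message the C library normally reports for errno EMFILE, the per-process limit on open descriptors, so the sketch counts those entries separately.

```python
import re
from collections import Counter

# Pattern inferred from the two log entries quoted above (an assumption);
# other engine log lines may be laid out differently and will be skipped.
LOG_RE = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<module>\w+)\s*:(?P<sub>\w+)\s*:(?P<level>\w+)\s*"
    r"/\s*(?P<num>\d+):\s*(?P<thread>\w+)\] "
    r"(?P<msg>.*)"
)

# Text normally associated with errno EMFILE (per-process descriptor limit).
EMFILE_TEXT = "Too many open files"


def tally_errors(lines):
    """Count ERR entries per thread, and how many of them mention EMFILE_TEXT."""
    by_thread = Counter()
    fd_exhaustion = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match is None or match.group("level") != "ERR":
            continue  # not an error entry in the assumed format
        thread = match.group("thread")
        by_thread[thread] += 1
        if EMFILE_TEXT in match.group("msg"):
            fd_exhaustion[thread] += 1
    return by_thread, fd_exhaustion


# The two entries from the post, one per line.
SAMPLE = [
    "10/09/2005 20:24:31 [msi :msi :ERR /0: sendafile] msiSectionLock: "
    "Can’t lock semaphore for thread sendafile: Too many open files",
    "10/09/2005 20:24:31 [msi :msi :ERR /0: sendafile] msiExportStats: "
    "Can’t lock data section for thread sendafile",
]

if __name__ == "__main__":
    errors, fd_errors = tally_errors(SAMPLE)
    for thread, count in errors.items():
        print(thread, "errors:", count, "descriptor-related:", fd_errors[thread])
```

Running something like this over a full day's log could show whether the descriptor errors start on one thread and spread, or hit every process at once.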
Our Unix admin has looked at the box and noticed that there are a lot of kill processes popping in and out all the time.
On this site, we use a tcl proc that reads in messages and either continues them or kills them depending on contents. We also use hcitpsmsgkill and kill_ob_save on the outbound threads on that site.
So, here is the question (finally 🙂): Does executing all these kills somehow “use up” all available semaphores on a site level? This is what appears to be happening. About 33,000 messages come in to the site daily. There are three processes running and 8 threads.
Any insight would be greatly appreciated.