Running out of semaphores

This topic has 8 replies, 6 voices, and was last updated 16 years, 1 month ago by james tey.

Creator

Topic
December 20, 2006 at 8:31 pm #48961
John Mercogliano
Participant
Has anyone experienced problems with running out of semaphores?

This is the situation. On our test systems, we are in a very active development phase and have added about 12 new processes and 50 connections to the test system. This is the error we get:

[msi :msi :ERR /0:cerner_result_ord_cmd] msiUpdateRegion: Can’t init thread to_ppid semaphore: No space left on device

We don’t have this problem on our production system, but there are only two differences: we have not yet moved all the new sites to production, and we don’t start and stop the sites/connections multiple times per day there.

We clear this up by killing all engine processes, then deleting any remaining semaphores using the HP-UX ipcs and ipcrm commands.
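For reference, a minimal sketch of that cleanup step. It assumes the HP-UX/AIX-style ipcs layout (columns T, ID, KEY, MODE, OWNER, GROUP) and that every engine process has already been stopped. The function name print_sem_cleanup is hypothetical, and it only prints ipcrm commands for review rather than running them:

```shell
# Print (do not run) one `ipcrm -s <id>` command per semaphore set owned by
# the given user. Reads `ipcs -s` output on stdin.
# Assumption: rows look like "s <ID> <KEY> <MODE> <OWNER> <GROUP>", with the
# type letter and the ID separated by whitespace.
print_sem_cleanup() {
    owner="$1"
    awk -v o="$owner" '$1 == "s" && $5 == o { print "ipcrm -s " $2 }'
}
```

Something like `ipcs -s | print_sem_cleanup hci` would list the candidate commands. Printing instead of executing leaves a chance to confirm that no running process still uses those semaphore sets.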

I’m wondering if maybe the hcienginestop/run or hcicmd command is not properly cleaning up the semaphores.

If anyone has any insights, they would be welcome. This is now happening to us about every three weeks.

Thanks,

John Mercogliano
Sentara Healthcare
Hampton Roads, VA

Viewing 7 reply threads

Author

Replies
- December 21, 2006 at 1:32 pm #60247
  LeeAnne Kardas
  Participant
  This happened in my test environment. I brought down all processes and sites, rebooted our HP/UX Unix server, then brought all sites and processes back up. Try rebooting your HP/UX Unix server.
- December 21, 2006 at 9:08 pm #60248
  Russ Ross
  Participant
  Since I have a huge number of sites, threads and message flow, I was concerned about understanding semaphores a bit better.
  
  Fortunately, I’ve never had a single semaphore issue during my 8+ years here.
  
  I searched old postings related to your semaphore problem as a start.
  
  Here is one that might be of interest:
  
  https://usspvlclovertch2.infor.com/viewtopic.php?t=115&highlight=semaphore
  
  One of the old posts states:
  
  Quote:

  The problem surfaces when someone starts the process with a user name other than hci. This is because the semaphore and the shared memory segment cannot be accessed. When you see this, you can do an ipcs -a and you will see the semaphore and shared memory segment information.
  
  Since we do everything as hci this may have kept us out of semaphore trouble.
  
  Also, a couple of old posts I located indicated this problem is more likely to occur on a Windows platform.
  
  I’m on AIX and have not located where the limit of semaphores is set but did see a reference to increasing the number of files in the /etc/security/limits file – mine is set to 2048 at this time.
  
  Here is a quote window showing my /etc/security/limits settings for the hci users (2 of them are for QDX support):
  
  Quote:

      hci:
          fsize = 2097151
          core = 2048
          cpu = -1
          data = 786432
          rss = 196608
          stack = 196608
          nofiles = 2048

      hcitest:
          fsize = 2097151
          core = 2048
          cpu = -1
          data = 786432
          rss = 196608
          stack = 196608
          nofiles = 2048

      hcispt1:
          fsize = 2097151
          core = 2048
          cpu = -1
          data = 786432
          rss = 196608
          stack = 196608
          nofiles = 2048
  
  A couple of semaphore-related commands whose output I’ve yet to figure out are:
  
  ipcs -a
  
  sar -m 5 3
  
  I also found posts suggesting to keep process and thread names short:
  
  Quote:

  We’ve had this a lot, and I think it is because of long thread names. Threads shouldn’t be more than 15 characters, and processes 9. But after you change them, you need to delete the /stats directory, which is under /exec – probably why that fix works, but it just works temporarily.
  
  Do you have long thread or process names?
  
  We have exceeded these limits a lot but do not have semaphore problems.
  
  Other posts also talked about the use of
  
  hcimsiutil -R
  
  hcimsiutil -rs
  
  Put it all together and I’m not really comfortable with my current understanding of semaphores.

  With our huge number of threads, sites, and message flow, I should have a high probability of semaphore errors, so my gut tells me that either running on AIX or using only the hci user ID has contributed to our stability.

  Since security is pushing us to separate logons, I will pay attention to see if any semaphore problems arise at that juncture.
  
  Russ Ross
  RussRoss318@gmail.com
- December 22, 2006 at 1:04 pm #60249
  Bill Bertera
  Participant
  Does HP/UX use the /etc/system settings? On Solaris, you have to define the amount of available semaphores in that file. Look for stuff like this:
  
      * Cloverleaf Interface Engine Requirements
      set semsys:seminfo_semmap=2050
      set semsys:seminfo_semaem=16384
      set semsys:seminfo_semmnu=2048
      set pt_cnt = 1024
      set semsys:seminfo_semmsl=8192
- December 22, 2006 at 7:35 pm #60250
  John Mercogliano
  Participant
  On HP-UX we use kmtune to query our kernel settings, and they are all at or above the values QDX recommends for high use. Also, we can clear this up without rebooting, just by doing the cleanup steps I mentioned. I gathered the semaphore count right after a reboot, before any hci semaphores are created, and after the cleanup steps the counts drop back to about that level. Based on the values glanceplus reports when we run into this problem, we have not hit our semaphore maximum; usage usually shows between 70 and 75 percent. I’m thinking we did not encounter this too often in the past because we reboot the system every 6 weeks, but the increased number of sites and connections might be causing us to hit it sooner.

  Russ, I understand your confusion. I’ve also been trying to understand them and I feel I’m only half there.

  I have read all those posts, and except for the long names, none of them apply. We always log on as hci.
  
  As for the ipcs command:
  
  Running ipcs -s | wc -l will give you the current usage for the Semaphore Table (semmni). You have to subtract three for the header lines.
  
  ipcs -m | wc -l minus 3 gives you Shared Mem Table (shmmni) usage.
  
  Adding a grep for hci will give you a count just for the hci user.
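
  A sketch of the same counts without the subtract-three step, assuming the T, ID, KEY, MODE, OWNER, GROUP column layout that HP-UX and AIX print. The helper name count_ipc is hypothetical. It counts only the data rows, so header lines never enter the total:

  ```shell
  # Count IPC entries of one type, optionally restricted to one owner.
  # Reads `ipcs` output on stdin.
  # Assumption: rows look like "<T> <ID> <KEY> <MODE> <OWNER> <GROUP>", where
  # T is "s" for semaphore sets and "m" for shared memory segments, and the
  # type letter is separated from the ID by whitespace.
  count_ipc() {
      kind="$1"     # "s" or "m"
      owner="$2"    # optional; empty means all owners
      awk -v t="$kind" -v o="$owner" '
          $1 == t && (o == "" || $5 == o) { n++ }
          END { print n + 0 }
      '
  }
  ```

  For example, `ipcs -s | count_ipc s` would approximate semmni usage, `ipcs -m | count_ipc m` shmmni usage, and `ipcs -s | count_ipc s hci` the semaphore sets owned by hci. Matching on the owner column avoids the false hits a plain grep for hci could pick up from the group column or from user names that merely contain hci.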
  
  I use glance to look at the current usage percentage of my system table values.
  
  I’m trying to duplicate the problem but I’m not getting anywhere. I have been starting and stopping the process and monitord in different orders, multiple times, to see if that changes anything, but my counts stay the same.

  One interesting thing I have found so far: if monitord is down, starting and stopping the process creates and destroys the sem files in the exec directory, but if monitord is running, the sem files hang around.

  I have also noticed that sometimes not all of the sem files are deleted when all processes and monitord are shut down. Does anyone know why this might happen and how to reproduce it? I’m wondering if this might be causing our problem.
  
  Thanks and keep the thoughts coming…
  
  Merry Christmas
  
  John Mercogliano
  Sentara Healthcare
  Hampton Roads, VA
- December 27, 2006 at 6:52 pm #60251
  Russ Ross
  Participant
  John:
  
  Here is what I see using
  
  ipcs -s
  
  IPC status from /dev/mem as of Wed Dec 27 13:30:42 CST 2006
  T
  
  Russ Ross
  RussRoss318@gmail.com
- May 22, 2009 at 3:31 am #60252
  james tey
  Participant
  Bill Bertera wrote:

  Does HP/UX use the /etc/system settings? On Solaris, you have to define the amount of available semaphores in that file. Look for stuff like this:

      * Cloverleaf Interface Engine Requirements
      set semsys:seminfo_semmap=2050
      set semsys:seminfo_semaem=16384
      set semsys:seminfo_semmnu=2048
      set pt_cnt = 1024
      set semsys:seminfo_semmsl=8192
  
  Can anyone help verify whether this applies to Solaris 10 systems?
- May 22, 2009 at 3:14 pm #60253
  David Harrison
  Participant
  For Solaris 10 and Cloverleaf 5.6 you just need the following (according to the installation instructions):
  
  set semsys:seminfo_semmni = 512
  
  I’m guessing the rest is self tuning.
- May 24, 2009 at 7:22 am #60254
  james tey
  Participant
  Thanks David.
  
  I’ve hit the exact same error when starting the 2nd test site monitor daemon (the 1st site monitor daemon started without any errors).

  I switched sites and tried again, yet the same error occurs.

  The only change I made after the first test showed the error was adding the setting David listed. I rebooted Solaris and got the same thing.

  What else have I missed? 🙁

The forum ‘Cloverleaf’ is closed to new topics and replies.