Cloverleaf and I/O to disk


  • Creator
    Topic
  • #53612
    Bob Schmid
    Participant

Running Cloverleaf 5.8 on AIX 6.1

16 GB real memory

Has anybody out there had issues called to their attention regarding the amount of disk I/O?

We are running about 1,000 interfaces pushing about 6 million messages a day.

    Recovery on every thread

    SMAT is enabled on about 80% of the threads.

    All of Cloverleaf is on SAN disk.

    There may be an issue with how these arrays have been laid out; our storage team is looking at it. But has anyone else gone through this, and/or are there internal considerations (DAF, SMAT, the way memory and disk are used by Cloverleaf) that I should keep in mind when configuring my interfaces?

    We have essentially 24 hospitals running on one piece of metal... that's good, right?

  • Author
    Replies
    • #78287
      glen goldsmith
      Participant

      We have Cloverleaf 5.8.5, all on VMs.

      All the storage is on SAN.

      Our I/O is pretty intensive.

      We do about 1/3 of the volume y'all do per day.

    • #78288
      Aurelien Garros
      Participant

      Hi,

      I don't know if it will be useful for you, but a few weeks ago we encountered an issue on a 5.8.5 system on RHEL with the maximum number of "open files" on the system.

      A search across lots of files in Global Monitor (executed by the hciss process) opened more than 1,000 files at the same time (between 2 and 4 files for each real SMAT file), and the system limit was 1024... it was not a good thing for Cloverleaf!

      You can see the system limits with the command: ulimit -a

      And the list of open files with: lsof
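
      If you want to see how close the engine processes actually get to the per-process limit, something like the following can help (a sketch only; the "hci" account name is an assumption, so substitute whatever user your engines run as):

          # Per-process file descriptor limit for the current user/shell
          ulimit -n

          # Rough count of open files per process owned by the Cloverleaf user
          # (NR>1 skips the lsof header line)
          lsof -u hci | awk 'NR>1 {print $2}' | sort | uniq -c | sort -rn | head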

      Hope it helps

      Aurelien

    • #78289
      glen goldsmith
      Participant

      Open files are not a limitation for us, but the latency on our SAN is.

      When you do 'top', CPU utilization will come across as a line like this:

      Cpu(s): 36.4%us, 42.3%sy,  0.0%ni,  1.7%id, 16.1%wa,  0.0%hi,  3.6%si,  0.0%st

      The 16.1%wa is I/O wait: how much time the processor is waiting on disk. That, at this point, is our bottleneck: the SAN.
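
      If you want to watch I/O wait and per-device latency over time rather than a single top snapshot, iostat and vmstat are handy (a sketch for Linux; iostat comes from the sysstat package, and the flags differ somewhat on AIX):

          # Extended device stats every 5 seconds: %iowait in the CPU summary,
          # await and %util show per-device latency and saturation
          iostat -x 5

          # Lightweight alternative: the "wa" column is I/O wait
          vmstat 5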

    • #78290

      glen goldsmith wrote:

      Open files are not a limitation for us, but the latency on our SAN is.

      When you do 'top', CPU utilization will come across as a line like this:

      Cpu(s): 36.4%us, 42.3%sy,  0.0%ni,  1.7%id, 16.1%wa,  0.0%hi,  3.6%si,  0.0%st

      -- Max Drown (Infor)

    • #78291

      As for the open files, here are the related instructions from the install doc.

      System Parameter Settings

      You must complete the following steps before installing on your machine:

      1 Log in as the root user.

      2 Check your current kernel parameters using the following:

      >cat /proc/sys/kernel/shmmax

      >cat /proc/sys/kernel/sem

      >cat /proc/sys/fs/file-max

      3 Back up /etc/sysctl.conf and apply the following settings:

      Note: There is no need to apply these settings if your current settings are already greater than these.

      >echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf

      >echo "kernel.sem=250 32000 100 1024" >> /etc/sysctl.conf

      >echo "fs.file-max=65536" >> /etc/sysctl.conf

      4 Reboot your machine or execute the following command to make the parameters effective:

      >sysctl -p

      Note: You might need to schedule downtime with your user community before rebooting.

      You will need to increase the numbers as needed by your environment.
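
      For what it's worth, a quick way to confirm the values actually took effect after sysctl -p (a sketch for Linux, using the same keys as the steps above):

          # Compare the live kernel values against what was written to /etc/sysctl.conf
          sysctl kernel.shmmax kernel.sem fs.file-max
          grep -E 'shmmax|kernel.sem|file-max' /etc/sysctl.conf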

      -- Max Drown (Infor)

    • #78292
      glen goldsmith
      Participant

      As a follow-up, the things we've done have dramatically dropped our I/O wait percentage.

      What we have is a /sites and /data hierarchy. Before, we had the entire $HCISITEDIR on /sites and just stored archived SMAT files on /data.

      What we’ve done now:

      /sites still has $HCISITEDIR, except for the $HCISITEDIR/exec/processes tree.

      That tree is on /data now.

      So your primary I/O (log files, SMAT, etc.) is on one disk, and the Raima recovery/error/ICL databases are on /sites.

      So now our I/O is split between two disks.
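
      The post doesn't spell out how the processes tree was moved; one common approach is to relocate the directory and leave a symlink in its place, roughly like this (a sketch only: stop the site first, and the /data target path is an assumption to adjust for your own layout):

          # Sketch: relocate the process tree to /data and symlink it back
          SITE=$(basename "$HCISITEDIR")
          mkdir -p "/data/$SITE/exec"
          mv "$HCISITEDIR/exec/processes" "/data/$SITE/exec/processes"
          ln -s "/data/$SITE/exec/processes" "$HCISITEDIR/exec/processes"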

      Each disk now has its own controller. In VMware, there is a virtual controller for /sites, another for /data, and a third for everything else.

      This seems to have cut our I/O wait by two-thirds. We were in the 20-40% range on I/O wait; now, most of the time, we're under 10%.

      VMware has the same "bottlenecks" that a real controller has, so this would work on actual hardware as well.

    • #78293
      Russ Ross
      Participant

      Robert:

      We have a similar load to the one you described, with 6 million messages a day and SMAT turned on for just about everything.

      I had concerns about SAN performance because, back when we were on dedicated disks, we had to get creative with dividing the disks and managing what went where physically to keep up.

      Actual SAN hardware can vary in design and may even use SSD drives, which can be much faster.

      I'm not sure if that is the case here, but our SAN performance has been quite delightful for our Cloverleaf servers, much to my surprise.

      The painful part of our SANs has been that they aren't reliable and go down hard and abruptly.

      I was talking to our sysadmins about your post and wondering why we would show a meager I/O wait of about 3% on average with I/O demands similar to yours.

      Then the admins informed me that some of our other servers using the same SAN have very bad I/O performance, which led to a discussion of why the difference exists.

      One of the observed circumstances with the servers that had bad SAN performance was that they were being mirrored to another SAN over a low-bandwidth network link, such as to the disaster recovery site.

      Investigating this might uncover a factor not yet being considered in your situation.

      I also talked to our admins about dedicated LUNs, and we are not on a dedicated LUN at this time.

      Russ Ross
      RussRoss318@gmail.com

    • #78294
      glen goldsmith
      Participant

      Since we're on VMware, we've had to set SYNCFILES=1 in the rdm.ini, which dramatically slowed Cloverleaf (and increased I/O) compared to when it wasn't on. However, even with this handicap, Cloverleaf is orders of magnitude faster than the hardware we came from.

      We do have dedicated LUNs, and our SAN has SSDs; however, the I/O demands of Cloverleaf aren't nearly as great as some of our other apps, so Cloverleaf hardly ever gets to the flash tier.
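
      If you want to check where that setting is in effect, a simple search works (a sketch; rdm.ini locations vary by site and process directory, and it assumes $HCISITEDIR is set in your environment):

          # Find every rdm.ini under the site directory and show its SYNCFILES setting
          find "$HCISITEDIR" -name rdm.ini -exec grep -H SYNCFILES {} \;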

    • #78295
      Bob Richardson
      Participant

      Greetings,

      As you asked for any input, here is our situation, which is serving us fine so far regarding SMAT files, etc. We use SAN disk pools.

      Running:  5.8.5.0 on AIX 6.1 TL 7 Virtualized

              ulimit -a
              time(seconds)        unlimited
              file(blocks)         unlimited
              data(kbytes)         unlimited
              stack(kbytes)        2097152
              memory(kbytes)       2097152
              coredump(blocks)     2097151
              nofiles(descriptors) 10000
              threads(per process) unlimited
              processes(per user)  unlimited

      Sites: 9, all residing on dedicated filesystem allocations (logical volumes)

      Message volume monthly (approx): 12 million

      We also run Global Monitor 6.0.1, which opens files as well for SMAT indexing operations. This adds to the open-file count; no hard values at this time, we are watching it now.

      Our history has been that keeping the sites on their own dedicated filesystems (logical volumes) keeps the disk I/O manageable.

      When we packed all of our sites onto ONE volume, we ran into problems.

      We do hourly SMAT archiving using our own scripts rather than the vendor-supplied versions, and keep SMAT files for only 14 days.
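
      For reference, the retention half of such a script can be as simple as compressing and pruning the archived copies (a sketch only; the archive directory and 14-day window below are assumptions based on the description above, not the poster's actual script):

          ARCHIVE_DIR=/data/smat_archive    # placeholder: wherever your cycled SMAT files land
          find "$ARCHIVE_DIR" -type f ! -name '*.gz' -exec gzip {} \;            # compress new archives
          find "$ARCHIVE_DIR" -type f -name '*.gz' -mtime +14 -exec rm -f {} \;  # enforce 14-day retention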

      We use Tivoli software to archive SMAT files for longer retention.

      Then, of course, we work at keeping debug statements out of our Tcl and Xlate inline code to cut disk I/O in the process logs.

      My apologies for not being hardware-specific or using the operating system lingo; I'm more of an application-type person than a hardware geek.

      Hope this contributes to the dialog and learning here.
