Cloverleaf and I/O to disk


  • Creator
    Topic
  • #53612
    Bob Schmid
    Participant

Running Cloverleaf 5.8 on AIX 6.1

16 GB real memory

Has anybody out there had issues called to their attention regarding the amount of disk I/O?

We are running about 1,000 interfaces pushing about 6 million messages a day.

    Recovery on every thread

    SMAT is enabled on about 80% of the threads.

    All of Cloverleaf is on SAN disk.

    There may be an issue with how these arrays have been laid out; our storage team is looking at it. But has anyone else gone through this, and/or are there internal considerations (DAF, SMAT, the way memory and disk are used by Cloverleaf) that I should keep in mind when configuring my interfaces?

    We have essentially 24 hospitals running on one piece of metal... that's good, right?

  • Author
    Replies
    • #78287
      glen goldsmith
      Participant

      We have Cloverleaf 5.8.5, all on VMs.

      All the storage is on SAN.

      Our I/O is pretty intensive.

      We do about 1/3 of the volume y'all do per day.

    • #78288
      Aurelien Garros
      Participant

      Hi,

      I don't know if it will be useful for you, but a few weeks ago we encountered an issue on a 5.8.5 system on RHEL with the maximum number of "open files" on the system.

      A search across lots of files in Global Monitor (executed by the hciss process) opened more than 1,000 files at the same time (between 2 and 4 files for each real SMAT file), and the system limit was 1024... it was not a good thing for Cloverleaf!

      You can see the system limits with the command: ulimit -a

      And the list of open files with: lsof
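
      If you want to see how close the engine processes actually get to the per-process limit, something like the following can help (a sketch only; the "hci" account name is an assumption, so substitute whatever user your engines run as):

          # Per-process file descriptor limit for the current user/shell
          ulimit -n

          # Rough count of open files per process owned by the Cloverleaf user
          # (NR>1 skips the lsof header line)
          lsof -u hci | awk 'NR>1 {print $2}' | sort | uniq -c | sort -rn | head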

      Hope it helps

      Aurelien

    • #78289
      glen goldsmith
      Participant

      Open files are not a limitation for us, but the latency on our SAN is.

      When you do 'top', CPU utilization will come across as a line like this:

      Cpu(s): 36.4%us, 42.3%sy,  0.0%ni,  1.7%id, 16.1%wa,  0.0%hi,  3.6%si,  0.0%st

      The 16.1%wa is I/O wait: how much time the processor is waiting on disk. That, at this point, is our bottleneck: the SAN.
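
      If you want to watch I/O wait and per-device latency over time rather than a single top snapshot, iostat and vmstat are handy (a sketch for Linux; iostat comes from the sysstat package, and the flags differ somewhat on AIX):

          # Extended device stats every 5 seconds: %iowait in the CPU summary,
          # await and %util show per-device latency and saturation
          iostat -x 5

          # Lightweight alternative: the "wa" column is I/O wait
          vmstat 5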

    • #78290

      glen goldsmith wrote:

      Open files are not a limitation for us, but the latency on our SAN is.

      When you do 'top', CPU utilization will come across as a line like this:

      Cpu(s): 36.4%us, 42.3%sy,  0.0%ni,  1.7%id, 16.1%wa,  0.0%hi,  3.6%si,  0.0%st

      -- Max Drown (Infor)

    • #78291

      As for the open files, here are the related instructions from the install doc.

      System Parameter Settings

      You must complete the following steps before installing on your machine:

      1 Log in as the root user.

      2 Check your current kernel parameters using the following:

      >cat /proc/sys/kernel/shmmax

      >cat /proc/sys/kernel/sem

      >cat /proc/sys/fs/file-max

      3 Back up /etc/sysctl.conf and apply the following settings:

      Note: There is no need to apply these settings if your current settings are already greater than these.

      >echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf

      >echo "kernel.sem=250 32000 100 1024" >> /etc/sysctl.conf

      >echo "fs.file-max=65536" >> /etc/sysctl.conf

      4 Reboot your machine or execute the following command to make the parameters effective:

      >sysctl -p

      Note: You might need to schedule downtime with your user community before rebooting.

      You will need to increase the numbers as needed by your environment.
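
      For what it's worth, a quick way to confirm the values actually took effect after sysctl -p (a sketch for Linux, using the same keys as the steps above):

          # Compare the live kernel values against what was written to /etc/sysctl.conf
          sysctl kernel.shmmax kernel.sem fs.file-max
          grep -E 'shmmax|kernel.sem|file-max' /etc/sysctl.conf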

      -- Max Drown (Infor)

    • #78292
      glen goldsmith
      Participant

      As a follow-up, the things we've done have dramatically dropped our I/O wait percentage.

      What we have is a /sites and /data hierarchy. Before, we had the entire $HCISITEDIR on /sites and just stored archived SMAT files on /data.

      What we’ve done now:

      /sites still has $HCISITEDIR, except for the $HCISITEDIR/exec/processes tree.

      That tree is on /data now.

      So your primary I/O (log files, SMAT, etc.) is on one disk, and the Raima recovery/error/ICL databases are on /sites.

      So now our I/O is split between two disks.
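
      The post doesn't spell out how the processes tree was moved; one common approach is to relocate the directory and leave a symlink in its place, roughly like this (a sketch only: stop the site first, and the /data target path is an assumption to adjust for your own layout):

          # Sketch: relocate the process tree to /data and symlink it back
          SITE=$(basename "$HCISITEDIR")
          mkdir -p "/data/$SITE/exec"
          mv "$HCISITEDIR/exec/processes" "/data/$SITE/exec/processes"
          ln -s "/data/$SITE/exec/processes" "$HCISITEDIR/exec/processes"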

      Each disk now has its own controller. In VMware, there is a virtual controller for /sites, another for /data, and a third for everything else.

      This seems to have cut our I/O wait by two-thirds. We were in the 20-40% range on I/O wait; now, most of the time, we're under 10%.

      VMware has the same "bottlenecks" that a real controller has, so this would work on actual hardware as well.

    • #78293
      Russ Ross
      Participant

      Robert:

      We have a similar load to the one you described, with 6 million messages a day and SMAT turned on for just about everything.

      I had concerns about SAN performance because, back when we were on dedicated disks, we had to get creative with dividing the disks and managing what went where physically to keep up.

      Actual SAN hardware can vary in design and may even use SSD drives, which can be much faster.

      I'm not sure if that is the case here, but our SAN performance has been quite delightful for our Cloverleaf servers, much to my surprise.

      The painful part of our SANs has been that they aren't reliable and go down hard and abruptly.

      I was talking to our sysadmins about your post and wondering why we would show a meager I/O wait of about 3% on average with I/O demands similar to yours.

      Then the admins informed me that some of our other servers using the same SAN have very bad I/O performance, which led to a discussion of why the difference exists.

      One of the observed circumstances with the servers that had bad SAN performance was that they were being mirrored to another SAN over a low-bandwidth network link, such as to the disaster recovery site.

      Investigating this might uncover a factor not yet being considered in your situation.

      I also talked to our admins about dedicated LUNs, and we are not on a dedicated LUN at this time.

      Russ Ross
      RussRoss318@gmail.com

    • #78294
      glen goldsmith
      Participant

      Since we're on VMware, we've had to set SYNCFILES=1 in the rdm.ini, which dramatically slowed Cloverleaf (and increased I/O) compared to when it wasn't on. However, even with this handicap, Cloverleaf is orders of magnitude faster than the hardware we came from.

      We do have dedicated LUNs, and our SAN has SSDs; however, the I/O demands of Cloverleaf aren't nearly as great as some of our other apps, so Cloverleaf hardly ever gets to the flash tier.
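
      If you want to check where that setting is in effect, a simple search works (a sketch; rdm.ini locations vary by site and process directory, and it assumes $HCISITEDIR is set in your environment):

          # Find every rdm.ini under the site directory and show its SYNCFILES setting
          find "$HCISITEDIR" -name rdm.ini -exec grep -H SYNCFILES {} \;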

    • #78295
      Bob Richardson
      Participant

      Greetings,

      As you asked for any input, here is our situation, which is serving us fine so far regarding SMAT files, etc. We use SAN disk pools.

      Running:  5.8.5.0 on AIX 6.1 TL 7 Virtualized

              ulimit -a
              time(seconds)        unlimited
              file(blocks)         unlimited
              data(kbytes)         unlimited
              stack(kbytes)        2097152
              memory(kbytes)       2097152
              coredump(blocks)     2097151
              nofiles(descriptors) 10000
              threads(per process) unlimited
              processes(per user)  unlimited

      Sites: 9, all residing on dedicated filesystem allocations (logical volumes)

      Message volume monthly (approx): 12 million

      We also run Global Monitor 6.0.1, which opens files as well for SMAT indexing operations. This adds to the open-file count; no hard values at this time, we are watching it now.

      Our history has been that keeping the sites on their own dedicated filesystems (logical volumes) keeps the disk I/O manageable.

      When we packed all of our sites onto ONE volume, we ran into problems.

      We do hourly SMAT archiving using our own scripts rather than the vendor-supplied versions, and keep SMAT files for only 14 days.
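
      For reference, the retention half of such a script can be as simple as compressing and pruning the archived copies (a sketch only; the archive directory and 14-day window below are assumptions based on the description above, not the poster's actual script):

          ARCHIVE_DIR=/data/smat_archive    # placeholder: wherever your cycled SMAT files land
          find "$ARCHIVE_DIR" -type f ! -name '*.gz' -exec gzip {} \;            # compress new archives
          find "$ARCHIVE_DIR" -type f -name '*.gz' -mtime +14 -exec rm -f {} \;  # enforce 14-day retention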

      We use Tivoli software to archive SMAT files for longer retention.

      Then, of course, we work at keeping debug statements out of our Tcl and Xlate inline code to cut disk I/O in the process logs.

      My apologies for not being hardware-specific or using the operating system lingo; I'm more of an application-type person than a hardware geek.

      Hope this contributes to the dialog and learning here.
