Cloverleaf and I/O to disk


  • Creator
    Topic
  • #53612
    Bob Schmid
    Participant

      Running Cloverleaf 5.8 on AIX 6.1

      16 GB real memory

      Anybody out there having issues called to their attention regarding the amount of I/O?

      We are running about 1,000 interfaces pushing about 6 million messages a day.

      Recovery enabled on every thread

      SMATting about 80% of the threads.

      All of it on SAN disk.

      There may be an issue with how these arrays have been laid out; our storage team is looking at it. But has anyone else gone through this, and/or is there an internal consideration (DAF, SMATting, the way memory and disk are utilized by Cloverleaf) that I should take into account when configuring my interfaces?

      We have essentially 24 hospitals running on one piece of metal... that's good, right?

    Viewing 8 reply threads
    • Author
      Replies
      • #78287
        glen goldsmith
        Participant

          We have Cloverleaf 5.8.5 on all VMs.

          All the storage is on SAN.

          Our IO is pretty intensive.

          We do about 1/3 of the volume y'all do per day.

        • #78288
          Aurelien Garros
          Participant

            Hi,

            I don’t know if it will be useful for you, but a few weeks ago we encountered an issue on 5.8.5 on RHEL with the system’s max “open files” limit.

            A search across lots of files in Global Monitor (executed by hciss) opened more than 1,000 files at the same time (between 2 and 4 files for each real SMAT file), and the system limit was 1024... not a good thing for Cloverleaf!

            You can see the system limit with the command: ulimit -a

            And the list of open files with: lsof
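            For a rough idea of how close you are to that limit, something like the commands below could be used. This is only a sketch: the “hci” account name is an assumption, so substitute whichever user actually runs your engine and Global Monitor processes.

            >ulimit -n

            >lsof -u hci | wc -l

            The first shows the open-file limit for the current shell; the second gives an approximate count of files currently open by that user.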

            Hope it helps

            Aurelien

          • #78289
            glen goldsmith
            Participant

              Open files are not a limitation for us, but the latency on our SAN is.

              When you run ‘top’, CPU utilization comes across as a line like this:

              Cpu(s): 36.4%us, 42.3%sy,  0.0%ni,  1.7%id, 16.1%wa,  0.0%hi,  3.6%si,  0.0%st

              The 16.1%wa is I/O wait: how much time the processor is waiting on disk. That, at this point, is our bottleneck: the SAN.
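              A single top snapshot can be misleading, so if you want to watch the wait percentage over time on a Linux box, something like iostat (from the sysstat package) or plain vmstat works; the 5-second interval below is just an arbitrary choice:

              >iostat -x 5

              >vmstat 5

              iostat repeats the avg-cpu line (including %iowait) along with per-device utilization each interval, and vmstat shows the same idea in its “wa” column.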

            • #78290

              glen goldsmith wrote:

              Open files are not a limitation for us, but the latency on our SAN is.

              When you run ‘top’, CPU utilization comes across as a line like this:

              Cpu(s): 36.4%us, 42.3%sy,  0.0%ni,  1.7%id, 16.1%wa,  0.0%hi,  3.6%si,  0.0%st

              -- Max Drown (Infor)

            • #78291

              As for the open files, here are the related instructions from the install doc.

              System Parameter Settings

              You must complete the following steps before installing on your machine:

              1 Log in as the root user.

              2 Check your current kernel parameters using the following:

              >cat /proc/sys/kernel/shmmax

              >cat /proc/sys/kernel/sem

              >cat /proc/sys/fs/file-max

              3 Back up /etc/sysctl.conf and apply the following settings:

              Note: There is no need to apply these settings if your current settings are greater than these.

              >echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf

              >echo "kernel.sem=250 32000 100 1024" >> /etc/sysctl.conf

              >echo "fs.file-max=65536" >> /etc/sysctl.conf

              4 Reboot your machine or execute the following command to make the parameters effective:

              >sysctl -p

              Note: You might need to schedule downtime with your user community before rebooting.

              You will need to increase the numbers as needed by your environment.
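              To double-check that the new values took effect after sysctl -p, the same keys can be queried directly (this is just a convenience, not part of the install doc):

              >sysctl kernel.shmmax kernel.sem fs.file-max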

              -- Max Drown (Infor)

            • #78292
              glen goldsmith
              Participant

                As a follow-up, the things we’ve done have dramatically dropped our I/O wait %.

                What we have is a /sites and /data hierarchy. Before, we had the entire $HCISITEDIR on /sites and just stored archived SMAT files on /data.

                What we’ve done now:

                /sites still has $HCISITEDIR, except for the $HCISITEDIR/exec/processes tree.

                That’s on /data now.

                So your primary I/O (log files, SMAT, etc.) is on one disk, and the Raima recovery/error/ICL databases are on /sites.

                So now our I/O is split between two disks (a rough sketch of that kind of split is below).

                Each disk now has its own controller. In VMware, there is a virtual controller for /sites, another for /data, and a third for everything else.

                This seems to have cut our I/O wait by about two-thirds. We were in the 20-40% range on I/O wait %; now, most of the time, we’re less than 10%.

                VMware has the same “bottlenecks” that a real controller has, so this would work on actual hardware as well.
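                For anyone wanting to try a similar layout, one way to push the process tree onto the second filesystem is to move it and leave a symlink behind. This is only a sketch under assumptions (the /data/prod_site path is hypothetical, and glen may well have done it differently, e.g. with a dedicated mount point); do it with the site shut down:

                >cd $HCISITEDIR/exec

                >mv processes /data/prod_site/processes

                >ln -s /data/prod_site/processes processes

                Note: verify the engine starts cleanly afterwards before relying on the new layout.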

              • #78293
                Russ Ross
                Participant

                  Robert:

                  We have a load similar to what you described, with 6 million messages a day and SMAT turned on for just about everything.

                  I had concerns about SAN performance, because when we were on dedicated disks we had to get creative with dividing the disks and managing what went where physically to keep up.

                  The actual SAN hardware can vary in design and may even utilize SSD drives, which can be much faster.

                  I’m not sure if that is the case here, but our SAN performance has been quite delightful for our Cloverleaf servers, much to my surprise.

                  The painful part of our SANs has been that they aren’t reliable and go down hard and abruptly.

                  I was talking to our sys admins about your post and wondering why we would show a meager I/O wait of about 3% on average for I/O demands similar to yours.

                  Then I was informed by the admins that some of our other servers using the same SAN have very bad I/O performance, which led to a discussion of why the difference.

                  One of the observed circumstances with the servers that had bad SAN performance was that they were being mirrored to another SAN over a low-bandwidth network link, such as to the disaster recovery site.

                  Investigating this might uncover a possibility not yet being considered in your situation.

                  I also talked to our admins about dedicated LUNs, and we are not on a dedicated LUN at this time.

                  Russ Ross
                  RussRoss318@gmail.com

                • #78294
                  glen goldsmith
                  Participant

                    Since we’re on VMware, we’ve had to set SYNCFILES=1 in rdm.ini, which dramatically slowed Cloverleaf (and increased I/O) compared to when it wasn’t on. However, even with this handicap, Cloverleaf is orders of magnitude faster than the hardware we came from.

                    We do have dedicated LUNs, and our SAN has SSDs; however, the I/O demands of Cloverleaf aren’t nearly as great as some of the other apps we have, so Cloverleaf hardly gets to flash.
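                    If you want to confirm whether that flag is set on your own install, something like the following can locate and check it; this assumes your environment defines $HCIROOT, and the exact location of rdm.ini varies, which is why find is used rather than a hard-coded path:

                    >find $HCIROOT -name rdm.ini -exec grep -i syncfiles {} \;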

                  • #78295
                    Bob Richardson
                    Participant

                      Greetings,

                      As you asked for any input, here is our situation, which is serving us fine so far regarding SMAT files etc. We use SAN disk pools.

                      Running:  5.8.5.0 on AIX 6.1 TL 7 Virtualized

                                    ulimit -a

                                    time(seconds)        unlimited

                                    file(blocks)         unlimited

                                    data(kbytes)         unlimited

                                    stack(kbytes)        2097152

                                    memory(kbytes)       2097152

                                    coredump(blocks)     2097151

                                    nofiles(descriptors) 10000

                                    threads(per process) unlimited

                                    processes(per user)  unlimited

                      Sites: 9, all residing on dedicated filesystem allocations (logical volumes)

                      Message volume monthly (approx): 12 million

                      We also run Global Monitor 6.0.1, which opens files as well for SMAT indexing operations. This adds to the open-file numbers; no hard values at this time, we are watching now.

                      Our history has been that keeping the sites on their own dedicated filesystems (logical volumes) keeps the disk I/O manageable.

                      When we packed all of our sites onto ONE volume, we ran into problems.

                      We do hourly SMAT archiving using our own scripts rather than the vendor-supplied versions, and keep SMAT files only for 14 days (a rough sketch of the cleanup side is below).

                      We use Tivoli software to archive SMAT files for longer retention.
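                      For what it’s worth, the 14-day cleanup half of a scheme like that can be a simple cron’d find. The archive directory below is purely an assumption; point it at wherever your scripts actually drop the cycled SMAT files:

                      >find /data/smat_archive -type f -mtime +14 -exec rm {} \;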

                      Then of course we work at keeping debug statements out of our Tcl/Xlate inline code to cut disk I/O in the process logs.

                      My apologies for not being hardware-specific or using the operating system lingo; I'm more of an application person than a hardware geek.

                      Hope this contributes to the dialog and learning here.

                  • The forum ‘Cloverleaf’ is closed to new topics and replies.