Cloverleaf 5.6 on Solaris 10


  • Creator
    Topic
  • #50696
    Bill Bertera
    Participant

      We recently upgraded 2 new servers to:

      CLV 5.6 (from 5.2)

      Solaris 10 (from 8)

      and are experiencing major performance issues in our testing environment. We expected major improvements, but performance has actually declined. We are unable to pinpoint whether the issue is with the new servers, Solaris 10, or CLV 5.6.

      One issue is that the installation guide for CLV recommends /etc/system settings that are now obsolete in Solaris 10, so we’re not sure if the new settings are set properly.
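      For context, this is roughly how we are checking what is actually in effect on the new box. As far as we can tell, the old /etc/system IPC tunables (semsys:seminfo_*, shmsys:shminfo_*) are replaced by per-project resource controls in Solaris 10, set in /etc/project. The project name and values below are just placeholders from our setup, not recommendations:

      id -p hci                                             # which project the hci user runs under
      projects -l                                           # list projects and their attributes
      prctl -n project.max-sem-ids -i project user.hci      # semaphore ID limit in effect for that project
      prctl -n project.max-shm-memory -i project user.hci   # shared memory limit in effect for that project
      prctl -n process.max-sem-nsems $$                     # per-process limit, checked from the current shell

      # Example /etc/project line (placeholder values only):
      # user.hci:100::hci::project.max-sem-ids=(priv,1024,deny);project.max-shm-memory=(priv,4294967296,deny)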

      So, here are my questions:

      1. Is anyone here running 5.6 on Solaris 10 in a high volume environment?

      2. Does anyone have any ideas as to how to run benchmark tests to determine which resource is slowing down our system (server, CLV, OS)?

      3. Does anyone know the Solaris 10 recommendations for /etc/system and /etc/project?

      4. Does anyone know of any system monitoring we should be looking at to determine which resource is causing the poor performance? CPU usage is quite low (95%+ idle). The commands we have been running so far are sketched below.
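      For what it’s worth, these are the stock Solaris observability commands we have been running while trying to narrow things down; posting them in case someone spots a column we should be paying more attention to:

      vmstat 5          # run queue, paging, overall CPU
      mpstat 5          # per-CPU stats: smtx (mutex spins), syscl, and how idle is spread across CPUs
      iostat -xnz 5     # per-device I/O: %b and asvc_t on the disks holding the sites/logs
      prstat -am 5      # per-process microstates: LCK/LAT/SLP columns for the hciengine processes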

      thanks,

      -Bill

    • Author
      Replies
      • #67148

        Quote:

        experiencing major performance issues in our testing environment

        Can you provide more detailed information on what you mean by “performance issues”?

        -- Max Drown (Infor)

      • #67149
        Bill Bertera
        Participant

          Max Drown wrote:

          Quote:

          experiencing major performance issues in our testing environment

          Can you provide more detailed information on what you mean by “performance issues”?

          We run a set number of messages through a test site, equal to our peak-hour volume, and then twice our peak-hour volume. On our old/current servers we can keep up with the 2X test. On the new servers our queues are backing up and we are seeing longer Xlate times on the 1X test. We’ve since run other tests, as simple as two threads talking TCP/IP and one thread routing to another. All of the tests are considerably slower on the new servers, which should be much more powerful.

        • #67150

          Have you checked the logs for errors? Disk I/O such as a lot of logging can slow things down considerably.

          How are you sending the data through? From one server to another? Site to site?

          -- Max Drown (Infor)

        • #67151
          Bill Bertera
          Participant

            Max Drown wrote:

            Have you checked the logs for errors? Disk I/O such as a lot of logging can slow things down considerably.

            How are you sending the data through? From one server to another? Site to site?

            We’ve tried sending the data via TCP, file, and hcicmd resend; just about every possible way. The important thing is that everything we test on the new server/OS/CLV we parallel test on our old server to compare. So we’re pretty sure it’s not the way we’ve built the interfaces, because the same interfaces on the old server are outperforming the new one by a large margin.

          • #67152

            I guess what I’m suggesting is that it’s possible your 5.6 interfaces may be having issues and are using a lot of disk I/O to log the problems.

            Here’s an example of what I mean. When we upgraded from 5.2.1 to 5.6 and moved from AIX to RedHat, we discovered that a thread sending across a VPN connection and through a firewall would be disconnected by the firewall after a certain amount of inactivity; Cloverleaf would then fill the log to 2GB, and the process would crash. That’s a lot of disk I/O.

            Now, that being said, your problem may have nothing at all to do with this sort of thing and may indeed be related to how your OS is configured. But I would recommend you look at your process logs and hcimonitord logs just to make sure they look OK.

            When you do figure out what the issue is, I’d love to know what it was. So, please post your solution here.

            -- Max Drown (Infor)

          • #67153
            Bill Bertera
            Participant

              Thanks for the tips. We have indeed checked the logs, and that’s also why we tested with several different interfaces, to rule out interface-specific issues. Even very basic TCP -> localhost TCP interfaces are taking twice as long. There’s even a visible delay when doing a “resend” of a large file to the engine. Pretty much everything Cloverleaf does on the new servers is taking longer.
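              One thing we are going to try next is a quick syscall profile of an engine while one of these slow tests runs, then Ctrl-C after a minute or so to get the summary (the pgrep is just a quick way to grab an engine pid):

              truss -c -p `pgrep -o hciengine`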

            • #67154
              Michael Hertel
              Participant

                I’m upgrading to 5.7 on AIX using the same hardware and feel the same way, although I can’t prove it. I smell something fishy.

              • #67155

                Michael Hertel wrote:

                I’m upgrading to 5.7 on AIX using the same hardware and feel the same way, although I can’t prove it. I smell something fishy.

                We did an AIX OS upgrade and Cloverleaf upgrade to 5.6. We didn’t experience any issues at all. However, we are now off of AIX and on RedHat, but not because of any issue with performance on AIX.

                -- Max Drown (Infor)

              • #67156
                Bill Bertera
                Participant

                  We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running.

                  This is what we “upgraded” to:

                  System = SunOS

                  Release = 5.10

                  Machine = sun4v sparc SUNW,SPARC-Enterprise-T5220

                  NumCPU = 64

                  Memory: 32G

                  Does anyone know of any reason why Cloverleaf would perform poorly on this type of server? Or can anyone recommend a different server model?

                  Healthvision sent us their benchmark tests, but this model is not included.
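                  In case it helps with the comparison, this is how we are dumping the CPU layout and clock speeds on the old and new boxes side by side (the T5220 presents 64 hardware threads on 8 cores, so per-thread speed is what we are trying to line up):

                  psrinfo -pv                                      # physical CPUs, cores/threads, clock speed
                  /usr/platform/`uname -i`/sbin/prtdiag -v | head  # platform, memory, and CPU summary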

                • #67157
                  Ric Cross
                  Participant

                    Bill Bertera wrote:

                    We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running…


                    You may want to compare your Solaris 10 semaphore settings to the Solaris 8 semaphore settings.

                    Another, though less likely, place to look is the TCP/IP window size; it may be worth comparing and possibly tuning it on the two systems.
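                    A rough way to make that comparison, assuming the engine runs as the hci user: Solaris 8 takes the semaphore tunables from /etc/system, while Solaris 10 applies resource controls to the running processes, so the numbers have to be read from different places.

                    ipcs -s                                             # semaphore sets actually allocated right now (both boxes)
                    sysdef | grep -i sem                                # Solaris 8: effective IPC tunables
                    prctl -n project.max-sem-ids -i project user.hci    # Solaris 10: limits in effect for the hci project
                    prctl -n process.max-sem-nsems -i process `pgrep -o hciengine`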

                  • #67158
                    Bill Bertera
                    Participant

                      Ric Cross wrote:

                      Bill Bertera wrote:

                      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running…


                      You may want to compare your Solaris 10 semaphore settings to the Solaris 8 semaphore settings.

                      Another, though less likely, place to look is the TCP/IP window size; it may be worth comparing and possibly tuning it on the two systems.

                      The semaphore settings we have already taken care of with the new resource-control settings, and I’ll take a look at the TCP/IP window size. Thanks for the suggestions.

                      EDIT:

                      our old & new servers are both set the same (assuming this is the setting you’re talking about):

                      /home/hci>ndd /dev/tcp tcp_cwnd_max

                      1048576
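                      I’m also going to compare the buffer tunables on both boxes, since as far as I can tell tcp_cwnd_max is only the ceiling on the congestion window, and the default socket send/receive windows come from separate parameters (worth double-checking against the tunables guide for each release):

                      ndd -get /dev/tcp tcp_xmit_hiwat    # default send buffer size
                      ndd -get /dev/tcp tcp_recv_hiwat    # default receive window size
                      ndd -get /dev/tcp tcp_max_buf       # upper bound either one can be raised to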

                    • #67159
                      mike kram
                      Participant

                        I’m seeing significant waits on userlocks on the active hciengine process on 5.6: significantly more than 50% of time spent in userlocks (observed in the LCK column when running prstat -am). On 5.2, I consistently see less than 50% of time spent waiting on userlocks.

                        Any ideas? This is on Solaris 10 running on a 1.2 GHz T5220 with 64 threads (8 cores) and 32 GB of RAM. No other system resource seems stressed at all.
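                        In case anyone wants to reproduce this, a rough way to see it per LWP and find out which user locks the engine is actually spinning on (the pgrep is just a quick way to grab an engine pid):

                        prstat -mLp `pgrep -o hciengine` 5            # per-LWP microstates, watch the LCK column
                        plockstat -A -e 30 -p `pgrep -o hciengine`    # DTrace-based user lock contention/hold times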

                        Anyone running 5.6 on Solaris 10 on T2000/T5220, or similar CMT boxes?

                        Mike

                      • #67160
                        David Harrison
                        Participant

                          I evaluated a T1000 Sun CoolThreads server some time ago. I ran a benchmark which, if I remember correctly, was file based. The performance was awful.

                      • The forum ‘Cloverleaf’ is closed to new topics and replies.