Cloverleaf 5.6 on Solaris 10


  • Creator
    Topic
  • #50696
    Bill Bertera
    Participant

    We recently upgraded 2 new servers to:

    CLV 5.6 (from 5.2)

    Solaris 10 (from 8)

    and are experiencing major performance issues in our testing environment. We expected major improvements, but performance has actually declined. We have been unable to pinpoint whether the issue is the new servers, Solaris 10, or CLV 5.6.

    One issue is that the CLV installation guide recommends /etc/system settings that are now obsolete in Solaris 10, so we’re not sure whether the new settings are configured properly.

    So, here are my questions:

    1. Is anyone here running 5.6 on Solaris 10 in a high-volume environment?

    2. Does anyone have ideas on how to run benchmark tests to determine which resource is slowing down our system (server, CLV, OS)?

    3. Does anyone know the Solaris 10 recommendations for /etc/system and /etc/project? (See the sketch after this list.)

    4. Does anyone know of any system monitoring we should be looking at to determine what resource is causing the poor performance? CPU usage is quite low (95%+ idle).
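
    For context, Solaris 10 moved the System V IPC tunables (semaphores, shared memory) out of /etc/system and into per-project resource controls, usually set in /etc/project. This is how we’ve been checking what’s actually in effect; the project name (“default”) and the values are illustrative, not vendor recommendations:

    # Inspect the IPC resource controls in effect (Solaris 10):
    prctl -n project.max-shm-memory -i project default
    prctl -n project.max-sem-ids -i project default
    prctl -n process.max-sem-nsems -i process $$

    # Raise them persistently with projmod (illustrative values):
    projmod -s -K "project.max-shm-memory=(privileged,4G,deny)" default
    projmod -s -K "project.max-sem-ids=(privileged,256,deny)" default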

    thanks,

    -Bill

  • Author
    Replies
    • #67148

      Quote:

      experiencing major performance issues in our testing environment

      Can you provide more detailed information on what you mean by “performance issues”?

      -- Max Drown (Infor)

    • #67149
      Bill Bertera
      Participant

      Max Drown wrote:

      Quote:

      experiencing major performance issues in our testing environment

      Can you provide more detailed information on what you mean by “performance issues”?

      We run a set number of messages through a test site, equal to our peak-hour volume, and then twice our peak-hour volume. On our old/current servers we can keep up with the 2X test. On the new servers our queues are backing up and Xlate times are longer on the 1X test. We’ve since run other tests, as simple as two threads talking TCP/IP, and one thread routing to another. All of the tests are considerably slower on the new servers, which should be much more powerful.
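
      One way to see which resource saturates during the replay is to run the stock Solaris observability tools side by side on the old and new servers while the test is going; a sketch (the 5-second interval is arbitrary):

      # Run in parallel on both servers during the 1X/2X replay:
      vmstat 5        # run queue, paging, system-wide CPU
      mpstat 5        # per-CPU utilization; a single pegged thread stands out here
      iostat -xn 5    # per-device service times and %busy for disk I/O
      prstat -am 5    # per-process microstates, including LCK (lock wait) and LAT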

    • #67150

      Have you checked the logs for errors? Heavy disk I/O, such as a lot of logging, can slow things down considerably.

      How are you sending the data through? From one server to another? Site to site?

      -- Max Drown (Infor)

    • #67151
      Bill Bertera
      Participant

      Max Drown wrote:

      Have you checked the logs for errors? Heavy disk I/O, such as a lot of logging, can slow things down considerably.

      How are you sending the data through? From one server to another? Site to site?

      We’ve tried sending the data via TCP, via file, and via hcicmd resend; just about every possible way. The important thing is that everything we’ve tested on the new server/OS/CLV we parallel-test on our old server for comparison. So we’re pretty sure it’s not the way we’ve built the interfaces, because the same interfaces on the old server outperform the new one by a large margin.

    • #67152

      I guess what I’m suggesting is that your 5.6 interfaces may be having issues and using a lot of disk I/O to log the problems.

      Here’s an example of what I mean. When we upgraded from 5.2.1 to 5.6 and moved from AIX to RedHat, we discovered that a thread sending across a VPN connection and through a firewall would be disconnected by the firewall after a period of inactivity; Cloverleaf would then fill the log to 2 GB, and the process would crash. That’s a lot of disk I/O.

      Now, that being said, your problem may have nothing at all to do with this sort of thing and may indeed be related to how your OS is configured. But I would recommend you look at your process logs and hcimonitord logs just to make sure they look OK; a quick check is sketched below.
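
      A quick way to spot that failure mode is to look for oversized logs under the Cloverleaf root. This is only a sketch and assumes the usual $HCIROOT layout; adjust the pattern to your install:

      # Flag any log over ~100 MB (find's -size unit is 512-byte blocks):
      find $HCIROOT -name "*.log" -size +204800 -exec ls -lh {} \;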

      When you do figure out what the issue is, I’d love to know what it was. So, please post your solution here.

      -- Max Drown (Infor)

    • #67153
      Bill Bertera
      Participant

      Thanks for the tips. We have indeed checked the logs, and that’s also why we tested with several different interfaces: to rule out interface-specific issues. Even very basic straight TCP -> localhost TCP interfaces are taking twice as long. There’s also a visible delay when doing a “resend” of a large file to the engine. Pretty much everything Cloverleaf does on the new servers takes longer.
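
      One more baseline worth timing is a raw loopback transfer with no engine involved, to rule the OS network stack in or out. A sketch; nc ships with Solaris 10 8/07 and later, and the payload size is arbitrary:

      # Terminal 1: plain TCP sink on the loopback
      nc -l 9000 > /dev/null

      # Terminal 2: create a 100 MB file of zeros and time the push
      mkfile 100m /tmp/payload
      time nc localhost 9000 < /tmp/payload
      # (some nc builds linger after EOF; if so, time the sink side instead)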

    • #67154
      Michael Hertel
      Participant

      I’m upgrading to 5.7 on AIX using the same hardware and feel the same way, although I can’t prove it. I smell something fishy.

    • #67155

      Michael Hertel wrote:

      I’m upgrading to 5.7 on AIX using the same hardware and feel the same way, although I can’t prove it. I smell something fishy.

      We did an AIX OS upgrade and a Cloverleaf upgrade to 5.6 and didn’t experience any issues at all. However, we are now off AIX and on RedHat, though not because of any performance issue on AIX.

      -- Max Drown (Infor)

    • #67156
      Bill Bertera
      Participant

      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running.

      This is what we “upgraded” to:

      System = SunOS

      Release = 5.10

      Machine = sun4v sparc SUNW,SPARC-Enterprise-T5220

      NumCPU = 64

      Memory: 32G

      Does anyone know of a reason why Cloverleaf would perform poorly on this type of server? Or can anyone recommend another server model?

      Healthvision sent us their benchmark tests, but this model is not included.
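
      One angle worth checking: the T5220 is a CMT (CoolThreads) box, with many hardware threads that are individually slow at 1.2 GHz, so if the engine’s critical path is effectively single-threaded, per-thread speed matters far more than the thread count. A crude single-thread comparison to run on both the old and new servers; gzip here is just a stand-in for any single-threaded, CPU-bound job:

      psrinfo -pv     # physical processors vs. hardware threads
      # Time the same single-threaded job on both boxes and compare:
      mkfile 64m /tmp/bench
      time gzip -c /tmp/bench > /dev/null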

    • #67157
      Ric Cross
      Participant

      Bill Bertera wrote:

      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running…


      You may want to compare your Solaris 10 semaphore settings to the Solaris 8 semaphore settings.

      Another, though less likely, place to look is the TCP/IP window size; it may be worth comparing and tuning it on both systems.
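
      For that comparison, these are the usual Solaris ndd parameters; the set commands are illustrative test values, need root, and do not persist across reboot:

      # Inspect send/receive buffer defaults and the ceiling on both boxes:
      ndd /dev/tcp tcp_xmit_hiwat
      ndd /dev/tcp tcp_recv_hiwat
      ndd /dev/tcp tcp_max_buf

      # Illustrative test values, not tuned recommendations:
      ndd -set /dev/tcp tcp_xmit_hiwat 1048576
      ndd -set /dev/tcp tcp_recv_hiwat 1048576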

    • #67158
      Bill Bertera
      Participant

      Ric Cross wrote:

      Bill Bertera wrote:

      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running…


      You may want to compare your Solaris 10 semaphore settings to the Solaris 8 semaphore settings.

      Another, though less likely, place to look is the TCP/IP window size; it may be worth comparing and tuning it on both systems.

      We’ve taken care of the semaphore settings with the new-style resource controls, and I’ll take a look at the TCP/IP window size. Thanks for the suggestions.

      EDIT:

      our old & new servers are both set the same (assuming this is the setting you’re talking about):

      /home/hci>ndd /dev/tcp tcp_cwnd_max

      1048576

    • #67159
      mike kram
      Participant

      I’m seeing significant waits on user locks in the active hciengine process on 5.6: significantly greater than 50% of time spent in user locks (observed in the LCK column when running prstat -am). On 5.2, I consistently see less than 50% of time spent waiting on user locks.

      Any ideas? This is on Solaris 10 running on a 1.2 GHz T5220 with 64 threads (8 cores) and 32 GB of RAM. No other system resource seems stressed at all.

      Anyone running 5.6 on Solaris 10 on T2000/T5220, or similar CMT boxes?
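
      Solaris 10’s tooling can attribute that LCK time to specific locks; a sketch against the running engine (the PID placeholder is illustrative):

      # Per-thread microstates for one process, 5-second samples:
      prstat -mL -p <pid_of_hciengine> 5

      # Summarize user-level lock contention over a 30-second window:
      plockstat -A -e 30 -p <pid_of_hciengine>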

      Mike

    • #67160
      David Harrison
      Participant

      I evaluated a Sun T1000 CoolThreads server some time ago. I ran a benchmark which, if I remember correctly, was file-based. The performance was awful.
