Cloverleaf 5.6 on Solaris 10


  • Creator
    Topic
  • #50696
    Bill Bertera
    Participant

    We recently upgraded 2 new servers to:

    CLV 5.6 (from 5.2)

    Solaris 10 (from 8)

    and are experiencing major performance issues in our testing environment. We expected major improvements, but performance has actually declined. We have been unable to pinpoint whether the issue is the new servers, Solaris 10, or CLV 5.6.

    One issue is that the CLV installation guide recommends /etc/system settings that are now obsolete in Solaris 10, so we’re not sure whether the new settings are configured properly.

    So, here are my questions:

    1. Is anyone here running 5.6 on Solaris 10 in a high-volume environment?

    2. Does anyone have ideas on how to run benchmark tests to determine which resource is slowing down our system (server, CLV, OS)?

    3. Does anyone know the Solaris 10 recommendations for /etc/system and /etc/project? (See the sketch after this list.)

    4. Does anyone know of any system monitoring we should be looking at to determine what resource is causing the poor performance? CPU usage is quite low (95%+ idle).
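
    For context, Solaris 10 moved the System V IPC tunables (semaphores, shared memory) out of /etc/system and into per-project resource controls, usually set in /etc/project. This is how we’ve been checking what’s actually in effect; the project name (“default”) and the values are illustrative, not vendor recommendations:

    # Inspect the IPC resource controls in effect (Solaris 10):
    prctl -n project.max-shm-memory -i project default
    prctl -n project.max-sem-ids -i project default
    prctl -n process.max-sem-nsems -i process $$

    # Raise them persistently with projmod (illustrative values):
    projmod -s -K "project.max-shm-memory=(privileged,4G,deny)" default
    projmod -s -K "project.max-sem-ids=(privileged,256,deny)" default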

    thanks,

    -Bill

  • Author
    Replies
    • #67148

      Quote:

      experiencing major performance issues in our testing environment

      Can you provide more detailed information on what you mean by “performance issues”?

      -- Max Drown (Infor)

    • #67149
      Bill Bertera
      Participant

      Max Drown wrote:

      Quote:

      experiencing major performance issues in our testing environment

      Can you provide more detailed information on what you mean by “performance issues”?

      We run a set number of messages through a test site, equal to our peak-hour volume, and then twice our peak-hour volume. On our old/current servers we can keep up with the 2X test. On the new servers our queues are backing up and Xlate times are longer on the 1X test. We’ve since run other tests, as simple as two threads talking TCP/IP, and one thread routing to another. All of the tests are considerably slower on the new servers, which should be much more powerful.
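
      One way to see which resource saturates during the replay is to run the stock Solaris observability tools side by side on the old and new servers while the test is going; a sketch (the 5-second interval is arbitrary):

      # Run in parallel on both servers during the 1X/2X replay:
      vmstat 5        # run queue, paging, system-wide CPU
      mpstat 5        # per-CPU utilization; a single pegged thread stands out here
      iostat -xn 5    # per-device service times and %busy for disk I/O
      prstat -am 5    # per-process microstates, including LCK (lock wait) and LAT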

    • #67150

      Have you checked the logs for errors? Heavy disk I/O, such as a lot of logging, can slow things down considerably.

      How are you sending the data through? From one server to another? Site to site?

      -- Max Drown (Infor)

    • #67151
      Bill Bertera
      Participant

      Max Drown wrote:

      Have you checked the logs for errors? Heavy disk I/O, such as a lot of logging, can slow things down considerably.

      How are you sending the data through? From one server to another? Site to site?

      We’ve tried sending the data via TCP, via file, and via hcicmd resend; just about every possible way. The important thing is that everything we’ve tested on the new server/OS/CLV we parallel-test on our old server for comparison. So we’re pretty sure it’s not the way we’ve built the interfaces, because the same interfaces on the old server outperform the new one by a large margin.

    • #67152

      I guess what I’m suggesting is that your 5.6 interfaces may be having issues and using a lot of disk I/O to log the problems.

      Here’s an example of what I mean. When we upgraded from 5.2.1 to 5.6 and moved from AIX to RedHat, we discovered that a thread sending across a VPN connection and through a firewall would be disconnected by the firewall after a period of inactivity; Cloverleaf would then fill the log to 2 GB, and the process would crash. That’s a lot of disk I/O.

      Now, that being said, your problem may have nothing at all to do with this sort of thing and may indeed be related to how your OS is configured. But I would recommend you look at your process logs and hcimonitord logs just to make sure they look OK; a quick check is sketched below.
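
      A quick way to spot that failure mode is to look for oversized logs under the Cloverleaf root. This is only a sketch and assumes the usual $HCIROOT layout; adjust the pattern to your install:

      # Flag any log over ~100 MB (find's -size unit is 512-byte blocks):
      find $HCIROOT -name "*.log" -size +204800 -exec ls -lh {} \;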

      When you do figure out what the issue is, I’d love to know what it was. So, please post your solution here.

      -- Max Drown (Infor)

    • #67153
      Bill Bertera
      Participant

      Thanks for the tips. We have indeed checked the logs, and that’s also why we tested with several different interfaces: to rule out interface-specific issues. Even very basic straight TCP -> localhost TCP interfaces are taking twice as long. There’s also a visible delay when doing a “resend” of a large file to the engine. Pretty much everything Cloverleaf does on the new servers takes longer.
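
      One more baseline worth timing is a raw loopback transfer with no engine involved, to rule the OS network stack in or out. A sketch; nc ships with Solaris 10 8/07 and later, and the payload size is arbitrary:

      # Terminal 1: plain TCP sink on the loopback
      nc -l 9000 > /dev/null

      # Terminal 2: create a 100 MB file of zeros and time the push
      mkfile 100m /tmp/payload
      time nc localhost 9000 < /tmp/payload
      # (some nc builds linger after EOF; if so, time the sink side instead)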

    • #67154
      Michael Hertel
      Participant

      I’m upgrading to 5.7 on AIX using the same hardware and feel the same way, although I can’t prove it. I smell something fishy.

    • #67155

      Michael Hertel wrote:

      I’m upgrading to 5.7 on AIX using the same hardware and feel the same way, although I can’t prove it. I smell something fishy.

      We did an AIX OS upgrade and a Cloverleaf upgrade to 5.6 and didn’t experience any issues at all. However, we are now off AIX and on RedHat, though not because of any performance issue on AIX.

      -- Max Drown (Infor)

    • #67156
      Bill Bertera
      Participant

      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running.

      This is what we “upgraded” to:

      System = SunOS

      Release = 5.10

      Machine = sun4v sparc SUNW,SPARC-Enterprise-T5220

      NumCPU = 64

      Memory: 32G

      Does anyone know of a reason why Cloverleaf would perform poorly on this type of server? Or can anyone recommend another server model?

      Healthvision sent us their benchmark tests, but this model is not included.
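
      One angle worth checking: the T5220 is a CMT (CoolThreads) box, with many hardware threads that are individually slow at 1.2 GHz, so if the engine’s critical path is effectively single-threaded, per-thread speed matters far more than the thread count. A crude single-thread comparison to run on both the old and new servers; gzip here is just a stand-in for any single-threaded, CPU-bound job:

      psrinfo -pv     # physical processors vs. hardware threads
      # Time the same single-threaded job on both boxes and compare:
      mkfile 64m /tmp/bench
      time gzip -c /tmp/bench > /dev/null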

    • #67157
      Ric Cross
      Participant

      Bill Bertera wrote:

      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running…


      You may want to compare your Solaris 10 semaphore settings to the Solaris 8 semaphore settings.

      Another, though less likely, place to look is the TCP/IP window size; it may be worth comparing and tuning it on both systems.
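
      For that comparison, these are the usual Solaris ndd parameters; the set commands are illustrative test values, need root, and do not persist across reboot:

      # Inspect send/receive buffer defaults and the ceiling on both boxes:
      ndd /dev/tcp tcp_xmit_hiwat
      ndd /dev/tcp tcp_recv_hiwat
      ndd /dev/tcp tcp_max_buf

      # Illustrative test values, not tuned recommendations:
      ndd -set /dev/tcp tcp_xmit_hiwat 1048576
      ndd -set /dev/tcp tcp_recv_hiwat 1048576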

    • #67158
      Bill Bertera
      Participant

      Ric Cross wrote:

      Bill Bertera wrote:

      We’re starting to narrow down our problem, and the likely suspect is the model of Sun server we’re running…


      You may want to compare your Solaris 10 semaphore settings to the Solaris 8 semaphore settings.

      Another, though less likely, place to look is the TCP/IP window size; it may be worth comparing and tuning it on both systems.

      We’ve taken care of the semaphore settings with the new-style resource controls, and I’ll take a look at the TCP/IP window size. Thanks for the suggestions.

      EDIT:

      our old & new servers are both set the same (assuming this is the setting you’re talking about):

      /home/hci>ndd /dev/tcp tcp_cwnd_max

      1048576

    • #67159
      mike kram
      Participant

      I’m seeing significant waits on user locks in the active hciengine process on 5.6: significantly greater than 50% of time spent in user locks (observed in the LCK column when running prstat -am). On 5.2, I consistently see less than 50% of time spent waiting on user locks.

      Any ideas? This is on Solaris 10 running on a 1.2 GHz T5220 with 64 threads (8 cores) and 32 GB of RAM. No other system resource seems stressed at all.

      Anyone running 5.6 on Solaris 10 on T2000/T5220, or similar CMT boxes?
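
      Solaris 10’s tooling can attribute that LCK time to specific locks; a sketch against the running engine (the PID placeholder is illustrative):

      # Per-thread microstates for one process, 5-second samples:
      prstat -mL -p <pid_of_hciengine> 5

      # Summarize user-level lock contention over a 30-second window:
      plockstat -A -e 30 -p <pid_of_hciengine>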

      Mike

    • #67160
      David Harrison
      Participant

      I evaluated a Sun T1000 CoolThreads server some time ago. I ran a benchmark which, if I remember correctly, was file-based. The performance was awful.
