Problems with ZFS file system

This topic has 4 replies, 3 voices, and was last updated 15 years, 11 months ago by Joe Halbrook.

Creator

Topic
August 28, 2009 at 12:18 pm #51152
Tim Wanner
Participant
I’m looking for other users who are running Cloverleaf 5.5 on the ZFS file system.

We are running a SUN M5000 with 8 x 2.15 GHz processors and 32 GB of system memory.

When we turn on a certain number of interfaces, the system begins using so much CPU that all of the Cloverleaf I/O begins to slow.

Messages process through the engine but cannot get out, inbound messages cannot get in, and messages queue in the engine.

If we stop a few processes, it frees enough resources and the messages cross.

A server this size should be able to handle the load. We have ~550 interfaces and 425 processes running.

Any feedback would be greatly appreciated.

Tim

Viewing 3 reply threads

Author

Replies
- August 28, 2009 at 4:10 pm #68962
  Charlie Bursell
  Participant
  Is the file system mounted locally or remotely? If remote, take a look at how the SAN is configured. Remember, Cloverleaf is an I/O hog and must be granted enough cycles to keep it going.
  
  One other thing I find rather strange is the allocation of interfaces to processes. You state you have about 550 interfaces but 425 processes?
  
  That is fewer than 2 threads per process (about 1.3 on average). Is there a valid reason for this? Even with 8 CPUs, 425 processes is a lot!
  
  I certainly am not a salesman and never wish to be construed as such, but if you continue to have problems, you may want to talk to your CSR about a Site Survey.
- August 28, 2009 at 8:37 pm #68963
  Joe Halbrook
  Participant
  Hi Charlie.
  
  Your response regarding granting enough cycles triggered another question. We’ve been running CL 5.6 Rev 2 on a virtual server (VMware + ESX hosts + EMC Clarion SAN) under Red Hat 5.0.
  
  Recently, we had an incident where the SAN array on which the Cloverleaf virtual server is stored experienced higher than normal write cache flushes. Because the ESX host timed out waiting for a write confirmation from the SAN array, the ESX host hosting the virtual server sent SCSI abort commands to the SAN array. In turn, Red Hat was unable to write to its local drive in a timely manner. As a protective measure, Red Hat went into read-only mode, which of course brought down all the Cloverleaf processes.
  
  I was curious if you or any others have heard of / experienced this behavior in a virtual server / SAN environment. Oddly, none of the other virtual servers (hosting a variety of applications) writing to the same SAN reacted in a similar manner during this incident.
  
  Thanks in advance.
- August 29, 2009 at 3:38 pm #68964
  Charlie Bursell
  Participant
  Joe:
  
  I think we had problems similar to this with our Red Hat servers. I am not sure what the resolution was but Goutham will probably know. I’ll find out on Monday.
  
  Send me an e-mail and remind me.
- August 29, 2009 at 8:49 pm #68965
  Joe Halbrook
  Participant
  Thank you, Charlie.
  
  Any assistance would be appreciated. We encountered this incident again Friday night (8/29) for the second time in less than two weeks.
  
  Joe
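
The read-only remount described in the August 28 reply above can be spotted from the mount table before the Cloverleaf processes start failing. The following is only a minimal sketch, assuming a Linux host that exposes `/proc/mounts` in the usual whitespace-separated format (device, mount point, file system type, options). The function name `read_only_mounts` and the sample devices and mount points are hypothetical and do not come from the thread.

```python
import os


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical sample data in /proc/mounts format, for illustration only.
SAMPLE = (
    "/dev/sda1 / ext3 ro,data=ordered 0 0\n"
    "/dev/sda2 /var ext3 rw,data=ordered 0 0\n"
    "proc /proc proc rw 0 0\n"
)

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        # On a Linux host, inspect the live mount table.
        with open("/proc/mounts") as handle:
            print(read_only_mounts(handle.read()))
    else:
        # Elsewhere, fall back to the sample data.
        print(read_only_mounts(SAMPLE))
```

Run periodically, a check like this could raise an alert when a file system that should be writable shows up in the list. Some mounts are read-only by design, so the result would need filtering against the mount points the engine actually writes to.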

The forum ‘Cloverleaf’ is closed to new topics and replies.