
Re: sas fileserver



On Tue, Jan 07, 2003 at 06:21:17PM -0800, Michael West (web@mitzit.net) wrote:
> I have been asked to help with getting a server for SAS.  One of the
> large expenses of this is the 200 GB+ RAID-5 disk on the EMC frame.
> When presented with $$$$$ the question came, can't I just get
> something I can put under my desk and save $$$$$? 
> 
> The SAS server will be on WIN2K.  I am thinking of using Debian with
> software RAID and SAMBA.  I have had good experience with this.  Maybe
> even use the 8 MB cache Western Digital IDE drives.  We only expect a
> dozen or so simultaneous users, but working with large datasets.  
> 
> I have never seen anything about the best configuration of a file
> server with few connections and gobs of data being used per
> connection.  
> 
> Does anyone have experience with something similar?  How will SAMBA
> perform when hammered by SAS?   
> 
> For the purposes of this thread, let us assume that the maintenance,
> service, backup and recovery and such is satisfactorily worked out.
> They are the major problems, but I am looking for advice on just the
> fileserver question.

Michael, a few suggestions.

I've done a lot of SAS work, most of it in my past.  I've also worked
with GNU/Linux and some RAIDed filestorage, as well as Samba, more
recently.  GNU/Linux and Samba should be more than robust enough for
this purpose.

First, if what you're replacing is an EMC server, I'd suggest going
whole-hog on the hardware for the GNU/Linux box:  hardware SCSI RAID
beats software RAID on performance, and beats IDE RAID on reliability.
The cost is significantly higher (more than double), but if this is your
primary data store, that shouldn't be a hard sell.  200 GiB isn't all
that big these days (you can buy single IDE drives with that capacity).
Focus on reliability and backups.  I've had very mixed results with
3Ware's Escalade IDE RAID controllers (5xxx, 6xxx, and 7xxx series) over
a couple of years.
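For sizing the RAID-5 array mentioned in the question, the arithmetic is
simple: with n drives, one drive's worth of space goes to parity, so usable
space is (n - 1) times the drive size.  A minimal Python sketch, assuming
equal-sized drives and no hot spare; the drive sizes in the example are
hypothetical, not recommendations:

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable RAID-5 capacity: one drive's worth of space holds parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


def drives_needed(target_gb: float, drive_size_gb: float) -> int:
    """Smallest RAID-5 drive count whose usable space reaches target_gb."""
    if drive_size_gb <= 0:
        raise ValueError("drive size must be positive")
    n = 3  # RAID-5 minimum
    while raid5_usable_gb(n, drive_size_gb) < target_gb:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical drive sizes (GB); substitute whatever you are pricing.
    for size in (80, 120, 200):
        n = drives_needed(200, size)
        print(f"{size} GB drives: {n} needed, {raid5_usable_gb(n, size):.0f} GB usable")
```

Note that adding a hot spare costs one more drive on top of the count this
returns.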

SAS analysis usage is usually a large single data pull, followed by
summarization and/or subsetting.  Networked access kills performance, so
analysts tend to pull data down and work locally, and you're likely not
going to see all that much traffic on the data server.  If you can put
multiple NICs in the box, either each dedicated to a single analyst's PC
or on a load-balanced network, you'll improve throughput markedly.
Contention on the fileserver itself is likely to be low, but SCSI will
help with whatever contention there is.
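To see why extra NICs matter, here is a back-of-envelope Python sketch of
how long each of a dozen analysts would wait while pulling the same-sized
dataset at once.  The 100 Mbit/s link speed, the 70% efficiency factor, the
2 GB dataset, and the assumption that disks are never the bottleneck are all
illustrative guesses, not measurements:

```python
def pull_seconds(dataset_gb: float, link_mbit: float = 100.0, server_nics: int = 1,
                 concurrent_users: int = 1, efficiency: float = 0.7) -> float:
    """Rough seconds for each of `concurrent_users` to pull `dataset_gb` at once.

    Assumptions (illustrative only): server NICs share the load evenly, every
    client has one link of the same speed, and disk/CPU are never the bottleneck.
    """
    usable_mbit = link_mbit * efficiency  # protocol/SMB overhead folded into one factor
    # Each user gets a share of the server's aggregate bandwidth,
    # but can never exceed what their own single link delivers.
    per_user_mbit = min(usable_mbit, usable_mbit * server_nics / concurrent_users)
    dataset_mbit = dataset_gb * 8 * 1000  # decimal GB -> megabits
    return dataset_mbit / per_user_mbit


if __name__ == "__main__":
    for nics in (1, 2, 4):
        minutes = pull_seconds(2.0, server_nics=nics, concurrent_users=12) / 60
        print(f"{nics} NIC(s), 12 users, 2 GB each: ~{minutes:.1f} min per pull")
```

Under these assumptions, per-user throughput scales with the number of
server NICs until each client saturates its own link; past that point, more
server NICs buy nothing.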

The pessimal configuration is when your SAS programmers try to do *all*
their work on the fileserver, and there's always some yahoo who does.
Saving working sets back is reasonable, but using the server for
SASWORK, SASSSORT, or other temporary or scratch space really loads up
the network.  Discourage this if possible.

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
   Keep software free.         Oppose the CBDTPA.         Kill S.2048 dead.
     http://www.eff.org/alerts/20020322_eff_cbdtpa_alert.html
