
Re: Offtopic : Large hostings and colocations ?where?





On 5/11/07, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
> I have no idea what kinds of disks you use but I haven't seen drives
> fail very often.  Well not since I stopped dealing with IBM/Seagate SCSI
> drives.

> How about raid6 then?


I use Seagate, Hitachi and Maxtor.  They all have various levels of suckage depending on the models and production runs.

Don't take my word for it.  There were three papers on drive failure rates that came out this year, although maybe it was two at FAST and another one from somewhere else.  One was from Carnegie Mellon, another from Google.

They are a good read, highly recommended.

http://www.usenix.org/events/fast07/tech/schroeder.html

http://www.usenix.org/events/fast07/tech/pinheiro.html


> How many thousands of machines do you deal with?


1,500 machines online, 4 drives/machine, a mix of PATA and SATA.  4,000 hard drives in cold storage.  500 hard drives in boxes awaiting possible recovery.
 

> > 2 boxes with 4x500GB disks should cost close to $3K.  Mirror the data, the
> > services, etc... and sleep easy at night.

> And how do you keep machines mirrored constantly?  Having raid5 or 6 at
> least means a single disk failure won't take down the machine and force
> you to start up somewhere else.


Well, you can try to be fancy using drbd, or you can try to be fancier and do it at the application layer with rsync or your own smarts, e.g. Google's GFS or the open-source take on it in Hadoop (HDFS).

Basically an exercise for the reader.
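For the rsync flavour, here is a minimal sketch of what I mean, with a made-up standby host and path, wrapped in Python rather than a shell one-liner; treat it as a starting point, not a substitute for drbd or a real distributed filesystem:

#!/usr/bin/env python3
# Minimal push-style mirroring sketch: wrap rsync over ssh and run it from cron.
# The host and paths below are made up for illustration; adjust to your setup.
import subprocess
import sys

SOURCE = "/srv/data/"                       # trailing slash: sync the directory's contents
DEST = "standby.example.com:/srv/data/"     # hypothetical standby box

def mirror() -> int:
    cmd = [
        "rsync",
        "-a",            # archive mode: keep perms, times, symlinks, etc.
        "--delete",      # make the mirror an exact copy (deletions propagate too)
        "--partial",     # keep partially transferred files so a retry can resume
        "-e", "ssh",     # transport over ssh
        SOURCE,
        DEST,
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    sys.exit(mirror())

Run that from cron every few minutes and you have a crude, eventually-consistent mirror; anything smarter (consistency while writes are in flight, automatic failover) is exactly where drbd or an application-level scheme earns its keep.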

HOWEVER, your point is well taken.  There is a difference between archival storage and production storage.  I wouldn't have a problem using RAID5 or RAID6 on a production machine that had a derivative copy of the golden data.  It can give you huge performance wins under certain loads.  Backups, or original copies of the data, are not something I would put on RAID, probably ever.

It's not so much that a few servers here or there with RAID controllers and drives from various manufacturers won't run OK for some time X, it's that, statistically speaking, that "OK"ness isn't good enough.
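To put a number on "statistically speaking": a quick back-of-the-envelope calculation, using AFR values I am picking purely for illustration (the FAST papers report single-digit percentages that vary with drive age and model), shows how quickly a yearly failure becomes a near certainty as the fleet grows:

# Back-of-the-envelope fleet math; the AFR values are illustrative only,
# not numbers from anyone's actual fleet.

def p_at_least_one_failure(afr: float, drives: int) -> float:
    """Chance of at least one drive failure in a year, assuming independent failures."""
    return 1.0 - (1.0 - afr) ** drives

for afr in (0.02, 0.05):                  # 2% and 5% annualized failure rates
    for drives in (4, 100, 6000):         # one box, one rack, ~1,500 machines * 4 drives
        p = p_at_least_one_failure(afr, drives)
        print(f"AFR {afr:.0%}, {drives:5d} drives -> P(>=1 failure/yr) = {p:.4f}, "
              f"expected failures = {afr * drives:.1f}")

At a few thousand spindles, even a 2% AFR means a couple of drive swaps a week on average, which is why "it runs OK on my box" stops being a useful argument at scale.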

Tape is lame and dead, so that's right out.  That leaves disk.  If there were an earthquake and your servers tumbled over and their drives spilled all over the place, I like the idea of walking into the pile and having some hope that any single drive has some amount of readable, useful data on it.  If I were using RAID5/6, it would be a jumble of parity bits and useless controller/RAID crap.

Again, the case is stronger for archival storage or for your only copy.  E.g. at home, my mp3 collection could be sitting on RAID5, but it's not, it's on RAID1, so each disk is useful on its own.

Sincerely Off Topic with apologies for that,
Joerg

