Re: PostgreSQL+ZFS

Boyd Stephen Smith Jr. put forth on 1/1/2011 2:16 PM:

> Is your problem with RAID5 or the SSDs?


> Sudden disk failure can occur with SSDs, just like with magnetic media.  If

This is not true.  The failure modes and rates for SSDs are the same as
for other solid-state components, such as system boards, HBAs, and PCI
RAID cards, even CPUs (although SSDs are far more reliable than CPUs due
to the lack of heat generation).  SSDs have only two basic things in
common with mechanical disk drives:  permanent data storage and a block
device interface.  SSDs, as the first two letters of the acronym tell
us, have more in common with the other integrated circuit components in
a system.  Can an SSD fail?  Sure.  So can a system board.  But how
often do your system boards fail?  *That* is the comparison you should
be making WRT SSD failure rates and modes, *not* comparing SSDs with
HDDs.

> you are going to use them in a production environment they should be RAIDed 
> like any disk.

I totally disagree.  See above.  However, if one is that concerned about
SSD failure, instead of spending the money required to RAID (verb) one's
db storage SSDs simply for fault recovery, I would recommend freezing
and snapshotting the filesystem to a sufficiently large SATA drive, then
running differential backups of the snapshot to the tape silo.
Remember, you don't _need_ RAID with SSDs to get performance.  Mirroring
one's boot/system device is about the only RAID scenario I'd ever
recommend for SSDs, and even there I don't feel it's necessary.

> RAID 5 on SSDs is sort of odd though.  RAID 5 is really a poor man's RAID; 
> yet, SSDs cost quite a bit more than magnetic media for the same amount of 
> storage.

Any serious IT professional needs to throw out his old storage cost
equation.  Size doesn't matter and hasn't for quite some time.  Everyone
has more storage than they can possibly ever use.  Look how many free*
providers (Gmail) are offering _unlimited_ storage.

The storage cost equation should no longer be based on capacity (it
never should have been, IMO), but on capability.  The disk drive
manufacturers have falsely convinced buyers over the last decade that
size is _the_ criterion on which to base purchasing decisions.  This
couldn't be further from the truth.  Mechanical drives have become so
cavernous that most users never come close to using the available
capacity, not even 25% of it.  SSDs actually cost *less* than HDDs with
the equation people should be using, which is based on _capability_.  It
goes something like this; the result is not dollars but an absolute
score--higher is better:

storage_value=((IOPS+throughput)/unit_cost) + (MTBF/1M) - power_per_year

Power_per_year depends on local utility rates, which can vary wildly by
locale.  For this comparison I'll use kWh pricing of $0.12, the utility
average in the Los Angeles area.

For a Seagate 146GB 15k rpm SAS drive ($170):
storage_value = ((274 + 142) / 170) + (1.6) - 110
storage_value = -106

For an OCZ Vertex II 160GB SSD SATA II device ($330):
storage_value = ((50000 + 250) / 330) + (2.0) - 18
storage_value = 136

Notice the mechanical drive ended up with a substantial negative score,
and that the SSD is 242 points ahead due to massively superior IOPS.
This is because in today's high energy cost world, performance is much
more costly when using mechanical drives.  The Seagate drive above
represents the highest performance mechanical drive available.  It costs
$170 (bare drive) to acquire but costs $110 per year to operate in a
24x7 enterprise environment.  Two years' energy consumption will exceed
the acquisition cost.  By contrast, running the SSD costs a much more
reasonable $18 per year, and it will take 18 years of energy consumption
to surpass the acquisition cost.  As the published MTBF ratings on the
devices are so similar, 1.6 vs 2 million hours, they have essentially
zero impact on the final ratings.

Ironically, the SSD is actually slightly _larger_ in capacity than the
mechanical drive in this case, as the SSDs fall between 120GB and 160GB,
and I chose the larger pricier option to give the mechanical drive more
of a chance.  It doesn't matter.  The SSD could cost $2000 and it will
still win by a margin of 115, for two reasons:  182 times the IOPS
performance and 1/6th the power consumption.
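The energy-payback arithmetic above (two years for the SAS drive, eighteen for the SSD) is just acquisition cost divided by yearly energy cost, again using this post's figures:

```python
# Years of operation until cumulative electricity cost
# exceeds the purchase price, per the figures in this post.

def payback_years(acquisition_cost, energy_cost_per_year):
    return acquisition_cost / energy_cost_per_year

print(payback_years(170, 110))  # ~1.5 years: 15k SAS drive
print(payback_years(330, 18))   # ~18 years: SSD
```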

For the vast majority of enterprise/business workloads, IOPS and power
consumption are far more relevant than total storage space, especially
for transactional database systems.  The above equation bears this out.

> SSDs intended as HD replacements support more read/write cycles per block than 
> you will use for many decades, even if you were using all the disk I/O the 
> entire time.

Yep.  Most SSDs will, regardless of price.

> SSDs intended as HD replacements are generally faster than magnetic media, 
> though it varies based on manufacturer and workload.

All of the currently shipping decent quality SSDs outrun a 15k SAS drive
in every performance category.  You'd have to buy a really low end
consumer model, such as the cheap A-Datas and Kingstons, to get less
streaming throughput than a SAS drive.  And, obviously, every SSD, even
the el cheapos, runs IOPS circles around the fastest mechanicals.

But if we're talking strictly a business environment, one is going to be
buying higher end models of SSDs.  And you don't have to go all that far
up the price scale either.  Now that there are so many great controller
chips available, the major price factor in SSDs is no longer performance
but size: the more flash chips in the device, the higher the cost.  The
high performance controller chips (SandForce et al.) no longer have that
much bearing on price.

> I see little to no problem using SSDs in a production environment.

Me neither. :)

> Some people just hate on RAID 5.  It is fine for it's intended purpose, which 
> is LOTS for storage with some redundancy on identical (or near-identical) 
> drives.  I've run (and recovered) it on 3-6 drives.

It's fine in two categories:

1.  You never suffer power failure or a system crash
2.  Your performance needs are meager

Most SOHO setups do fine with RAID 5.  For any application that stores
large volumes of rarely or never-changing data, it's fine.  For any
application that performs constant random IO, such as a busy mail server
or db server, you should use RAID 10.

> However, RAID 1/0 is vastly superior in terms of reliability and speed.  It 
> costs a bit more for the same amount of usable space, but it is worth it.

Absolutely agree on both counts, except in one particular case: with the
same drive count, RAID 5 can usually outperform RAID 10 in streaming
read performance, though not by much.  RAID 5 reads require no parity
calculations, so you get almost the entire spindle stripe's worth of
performance.  Where RAID 10 really shines is in mixed workloads.  Throw
a few random writes into the streaming RAID 5 workload mentioned above
and it will slow things down quite dramatically.  RAID 10 doesn't suffer
from this.  Its performance is pretty consistent even with simultaneous
streaming and random workloads.

> I suggest you use RAID 1/0 on your SSDs, quite a few RAID 1/0 implementations 
> will work with 3 drives.  RAID 1/0 should be a little more performant and a 
> little less CPU intensive than RAID 5 for transaction logs.  As far as file 
> system, I think ext3 would be fine for this workload, although it would 
> probably be worth it to benchmark against ext4 to see if it gives any 
> improvement.

Again, RAID isn't necessary for SSDs.

Also, I really, really, wish people would stop repeating this crap about
mdraid's various extra "RAID 10" *layouts* being RAID 10!  They are NOT
RAID 10!

There is only one RAID 10, and the name and description have been with
us for over 15 years, LONG before Linux had a software RAID layer.
Also, it's not called "RAID 1+0" or "RAID 1/0".  It is simply called
"RAID 10", again, for 15+ years now.  It requires four or more disks, in
an even number.  RAID 10 is a stripe across multiple mirrored pairs.
Period.  There is no other definition of RAID 10.  All of Neil's
"layouts" that do not meet the above description _are not RAID 10_, no
matter what he, or anyone else, decided to call them!!

Travel back in your time machine to 1995-2000 and go into the BIOS
firmware menu of a Mylex, AMI, Adaptec, or DPT PCI RAID controller.
They all say RAID 10, and they all used the same "layout": hardware
sector mirroring of two disks, with filesystem blocks striped across
those mirrored pairs.
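That classic layout is easy to sketch.  The following is purely illustrative (not any real controller's firmware, and the disk numbering is hypothetical): each stripe chunk is assigned round-robin to a mirrored pair, and both members of that pair hold a copy.

```python
# Illustrative sketch of classic RAID 10 chunk placement:
# a stripe across mirrored pairs.  Disk numbering is hypothetical.

def raid10_placement(chunk, num_pairs):
    """Return the two disks (one mirrored pair) holding a stripe chunk."""
    pair = chunk % num_pairs          # round-robin stripe across the pairs
    return (2 * pair, 2 * pair + 1)   # both mirror members store a copy

# A 4-disk RAID 10 is two mirrored pairs:
for chunk in range(4):
    print(chunk, raid10_placement(chunk, num_pairs=2))
# 0 -> disks (0, 1), 1 -> disks (2, 3), 2 -> (0, 1), 3 -> (2, 3)
```

Any "layout" whose placement doesn't reduce to this shape is, per the rant above, something other than RAID 10.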

/end RAID 10 nomenclature rant
