Re: file systems
On 5/4/2011 6:44 PM, Boyd Stephen Smith Jr. wrote:
In<4DC1E009.email@example.com>, Stan Hoeppner wrote:
On 5/2/2011 4:02 PM, Boyd Stephen Smith Jr. wrote:
They are also essential for any journaled filesystem to have correct
behavior in the face of sudden power loss.
This is true only if you don't have BBWC.
No. It is true even with BBWC.
No, it's not. Sorry, I didn't find any Debian documentation to prove my
point, so I'll use the Red Hat docs:
"For devices with non-volatile, battery-backed write caches and those
with write-caching disabled, you can safely disable write barriers at
mount time using the -o nobarrier option for mount. However, some
devices do not support write barriers; such devices will log an error
message to /var/log/messages (refer to Table 17.1, “Write barrier error
messages per file system”)."
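For reference, the mount option the Red Hat text describes looks like this on a Linux system of that era (a sketch only; the device name, mount point, and filesystem are placeholders, and `nobarrier` should only ever be used when the cache really is non-volatile):

```shell
# One-off mount without barriers (placeholders: /dev/sdb1, /srv/data):
mount -o nobarrier /dev/sdb1 /srv/data

# Or persistently, via /etc/fstab:
# <device>    <mountpoint>  <fs>   <options>             <dump> <pass>
/dev/sdb1     /srv/data     xfs    defaults,nobarrier    0      2
```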
You will see such errors with very high end SAN arrays, as I previously
mentioned. They simply don't support write barriers. Why? Because
constantly flushing an entire 16-64 *gigabyte* battery- or flash-backed
write cache, sitting in front of 2048 SAS drives, because 64 servers on
the SAN keep issuing barriers at a rate of 10,000/second, is a
mind-numbingly dumb thing to do.
"Write barriers are also unnecessary whenever the system uses hardware
RAID controllers with battery-backed write cache. If the system is
equipped with such controllers and if its component drives have write
caches disabled, the controller will advertise itself as a write-through
cache; this will inform the kernel that the write cache data will
survive a power loss."
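That "advertise itself as a write-through cache" behavior is visible from userspace on reasonably modern kernels. A small sketch (the `queue/write_cache` sysfs attribute is an assumption about your kernel version; older kernels don't expose it):

```shell
# Print the kernel's view of each block device's cache mode.
# "write through" -> the kernel will skip flushes/barriers for it;
# "write back"    -> flushes are still needed to protect the journal.
# (Assumes the queue/write_cache sysfs attribute; absent on old kernels.)
for f in /sys/block/*/queue/write_cache; do
    [ -e "$f" ] || continue   # glob didn't match: nothing to report
    dev=${f%/queue/write_cache}
    printf '%s: %s\n' "${dev##*/}" "$(cat "$f")"
done
```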
Even with a battery-backed RAID cache, like I have in my desktop,
executing without barriers can result in extra data loss that executing
with barriers prevents.
Then I'd say you have a problem with your BBWC RAID controller in your
desktop. Which BBWC RAID card do you have?
Can you kindly point me to your past posts where you discussed this
'extra data loss' problem you experienced? After AC power loss, with
your Areca-1160 w/ ARC-6120BA-T112 battery unit? I'd like to better
understand the circumstances surrounding the data loss.
Of course, even without barriers a properly journaled or log-structured
filesystem should be able to immediately and silently recover.
This contradicts what you stated above.
No, it doesn't. The filesystem can recover by dropping or replaying journal /
log entries that were not yet flushed to disk. That doesn't mean you haven't
lost any data, if parts of the journal that existed in cache before the power
loss never reached the disk.
The argument you made was that barriers are required to maintain correct
journal write ordering. If that order isn't maintained because barriers
are turned off, then, using your argument, the replaying of the 'out of
order' journal will likely corrupt the filesystem. You seem to be
arguing from both sides of the fence.
With barriers, you are guaranteed to be able to recover to the last barrier.
Without them, the hardware may have fully, partially, or not at all completed
virtually any I/O.
This is generally true, but depends on the 'hardware' you're referring
to, as I've pointed out a few times now in this thread.
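The guarantee being argued over can be illustrated with a toy write-ahead log in shell (purely illustrative; real filesystems do this in the block layer, the file names and `SET` record format are made up for this sketch, and the `sync` call stands in for a barrier/cache flush):

```shell
#!/bin/sh
# Toy write-ahead log: record the intent, flush it (the "barrier"),
# then apply it. Recovery replays the last flushed journal entry.
JOURNAL=journal.log
DATA=data.txt
: > "$JOURNAL"

log_and_apply() {
    printf 'SET %s\n' "$1" >> "$JOURNAL"
    sync                      # stand-in for a write barrier / cache flush
    printf '%s\n' "$1" > "$DATA"
}

recover() {
    # After a crash, the flushed journal tells us the last intended state.
    tail -n 1 "$JOURNAL" | sed 's/^SET //' > "$DATA"
}

log_and_apply hello
log_and_apply world
recover
cat "$DATA"                   # prints "world"
```

Without the flush, the journal record could still be sitting in a volatile cache when power is lost, and replay would recover an older state, which is the "extra data loss" under discussion.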
This is why (good) BBWC-enabled RAID cards automatically disable the
caches on all the drives,
Mine provides the option. I can't remember what setting I'm using right now.
IIRC, I continue to use the drives write cache because I have a UPS that
provides enough time for a clean shutdown, even when under load.
Given that you have both the ARC-6120BA-T112 RAID card battery and a
UPS, I'm now really curious to know more about your data loss due to not
using barriers.
and thus why it is recommended to disable
barriers for filesystems on BBWC RAID cards.
By whom? Reference please.
Links and excerpts provided above.
The nobarrier results are far more relevant than the barrier results,
especially the 16- and 128-thread results, for those SAs with
high-performance persistent storage.
I disagree entirely. You should be looking at the threaded results,
probably 128 threads (depending on what the server does), but you should
also be using barriers.
You just said you "disagree entirely" and then recommend 128 threads,
the same thing I said. But then you recommend barriers, which is the
actual disagreement.
You said 128 threads unconditionally, I admitted that there are certain
workloads where 16 threads is a more correct model.
The multi-thread tests are simply used to show how each filesystem
scales with parallel workloads. Some servers will never see 16 parallel
IO streams, such as most SOHO servers. Some servers will see thousands
of simultaneous IO streams, such as the Linux kernel archives servers.
There is no "correct model".