
Re: recommendations for supported, affordable hardware raid controller.



On 2021-01-02 03:24, Andrei POPESCU wrote:

> http://www.unixsheikh.com/articles/battle-testing-data-integrity-verification-with-zfs-btrfs-and-mdadm-dm-integrity.html

That looks interesting.  Thanks for the link.  :-)


On 2021-01-02 08:08, Richard Hector wrote:
> On 3/01/21 12:24 am, Andrei POPESCU wrote:
>> In case of data corruption (system crash, power outage, user error,
>> or even just a HDD "hiccup") plain md without the dm-integrity
>> layer won't even be able to tell which is the good data and will
>> overwrite your good data with bad data. Silently.
>
> I've had crashes and power outages and never noticed any problems,
> not that that means they won't happen (or even that they haven't
> happened). Does a journalling filesystem on top not cover that?

AIUI a journaling filesystem makes a write that spans multiple sectors atomic. For example, a process wants to put some data into a block here (say, a file), a block there (say, a directory), etc., and the consistency of the on-disk data structures must be preserved. The journal does this in two steps: first everything is written to the journal, then everything is written to its final location on disk. If either step is interrupted, the filesystem driver detects this at the next mount and either discards the incomplete journal entry or replays the complete one. When done, either all of the blocks have been updated on disk or none of the blocks on disk have been changed.
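
The two-step idea can be sketched with a toy in-memory model. This is only an illustration of the principle, not how ext4 or any other real filesystem lays out its journal; the class name, the `crash_after` parameter, and the crash-point names are all made up for the example.

```python
class ToyJournaledDisk:
    """Toy model: 'blocks' stands for the on-disk data, 'journal' for the log."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes
        self.journal = None   # at most one pending transaction

    def write_transaction(self, updates, crash_after=None):
        # Step 1: record every update in the journal, then mark it committed.
        self.journal = {"updates": dict(updates), "committed": False}
        if crash_after == "journal_write":
            return            # simulated power loss before the commit mark
        self.journal["committed"] = True
        if crash_after == "partial_apply":
            # Simulated power loss after only the first block reached its
            # final location.
            first = next(iter(self.journal["updates"]))
            self.blocks[first] = self.journal["updates"][first]
            return
        # Step 2: copy the updates to their final locations.
        self._apply()

    def _apply(self):
        self.blocks.update(self.journal["updates"])
        self.journal = None

    def recover(self):
        # What a driver would do at mount time after a crash.
        if self.journal is None:
            return "clean"
        if self.journal["committed"]:
            self._apply()         # replay: all blocks end up updated
            return "replayed"
        self.journal = None       # discard: no block was touched
        return "discarded"


disk = ToyJournaledDisk()
disk.write_transaction({1: b"file", 2: b"dir"}, crash_after="partial_apply")
print(disk.recover(), disk.blocks)
```

Note that nothing in this model would notice a block that was written correctly and went bad later; that is the gap integrity checking is meant to cover.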


Integrity checking addresses different failure modes by applying checksums to data blocks and metadata blocks. If the contents of a block become corrupt, whether in memory, in transit, or on disk, the driver will detect the checksum mismatch and respond. If redundant data is available, such as via RAID, the good copy is used to correct the data and operations continue. If no redundant data is available, the driver will return an error. The device mapper in the Linux kernel allows you to add the dm-integrity layer into a storage stack as desired:

https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-integrity.html
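
As a rough sketch of the detect-and-repair behaviour described above, here is a toy two-way mirror whose legs each keep a checksum per block. It is not the dm-integrity or md implementation; the class names are invented for the example, and CRC32 from Python's `zlib` module is used only as a convenient stand-in checksum.

```python
import zlib


class IntegrityError(Exception):
    """Raised when a block does not match its stored checksum."""


class ChecksummedDevice:
    """Toy block device that stores a checksum next to each block."""

    def __init__(self):
        self.data = {}   # block number -> bytes
        self.sums = {}   # block number -> CRC32 of the bytes as written

    def write(self, n, payload):
        self.data[n] = payload
        self.sums[n] = zlib.crc32(payload)

    def read(self, n):
        payload = self.data[n]
        if zlib.crc32(payload) != self.sums[n]:
            raise IntegrityError(f"checksum mismatch in block {n}")
        return payload


class Mirror:
    """Toy RAID1 on top of checksummed legs."""

    def __init__(self, *legs):
        self.legs = list(legs)

    def write(self, n, payload):
        for leg in self.legs:
            leg.write(n, payload)

    def read(self, n):
        failed = []
        for leg in self.legs:
            try:
                good = leg.read(n)
            except IntegrityError:
                failed.append(leg)   # this leg is known to be bad
                continue
            for bad in failed:
                bad.write(n, good)   # repair bad legs from the good copy
            return good
        raise IntegrityError(f"no intact copy of block {n}")


a, b = ChecksummedDevice(), ChecksummedDevice()
mirror = Mirror(a, b)
mirror.write(0, b"hello")
a.data[0] = b"jello"                 # simulate silent corruption on one leg
print(mirror.read(0), a.read(0))     # the read succeeds and leg a is repaired
```

Without the per-leg checksum, the mirror would see two different copies of block 0 and have no way to tell which one is good, which is the situation Andrei described for plain md.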


On a related note, it is wise to have ECC memory to protect against data corruption in memory:

http://www.openoid.net/will-zfs-and-non-ecc-ram-kill-your-data/


More failure modes exist (potentially, an infinite number). It's a question of what failure modes and effects concern you, and how much time and money you want to spend to mitigate risks.


David

