
Re: Backup maintenance strategy. Media degradation.



On 2017-06-23, Oleksandr Gavenko wrote:

> There was talk here about CDs/DVDs, flash drives, and hard disks.

https://en.wikipedia.org/wiki/M-DISC

  Millenniata testing found that M-Disc DVDs are more durable than
  conventional DVDs. "The discs were subject to the following test conditions
  in the environmental chamber: 85°C, 85% relative humidity (conditions
  specified in ECMA-379) and full spectrum light".[9][10] But according to a
  test of the French National Laboratory of Metrology and Testing at 90 °C and
  85% humidity, for 1,000 hours, the DVD+R with inorganic recording layer such
  as M-DISC showed similar deterioration in quality as other conventional
  discs with an organic recording layer, with a maximum lifetime of below 250
  hours.[11]

Even though 1000 years are advertised, things are not so rosy even for optical
disc enthusiasts. And unlike tapes, the media cannot be reused.

That is, you need to write the data together with an error correction code,
verify it periodically, and, say, once every 10 years rewrite it onto new media.
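
As a sketch of that workflow with the par2 tool (par2 is my example here; the
archive name and the 10% redundancy level are arbitrary):

  # create recovery data with ~10% redundancy next to the archive
  par2 create -r10 backup.tar.par2 backup.tar
  # periodic check: verify the archive against the recovery data
  par2 verify backup.tar.par2
  # if blocks have rotted, rebuild them from the redundancy
  par2 repair backup.tar.par2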

> Media degradation is a normal process.

I read some more; there are indeed established terms for this:

* https://en.wikipedia.org/wiki/Data_degradation

* https://en.wikipedia.org/wiki/Data_scrubbing

  error correction technique that uses a background task to periodically
  inspect main memory or storage for errors, then correct detected errors
  using redundant data in the form of different checksums or copies of data.
  Data scrubbing reduces the likelihood that single correctable errors will
  accumulate, leading to reduced risks of uncorrectable errors.

According to Wikipedia, md, Btrfs and ZFS provide on-demand checking for
rotten bits. For example:

https://raid.wiki.kernel.org/index.php/Scrubbing
  A RAID array can suffer from sleeping bad blocks. i.e. blocks that you
  cannot read, but normally you never do (because they haven't been allocated
  to a file yet). When a drive fails, and you are recovering the data onto a
  spare, hitting that sleeper can kill your array. For this reason it is good
  to regularly (daily, or weekly, maybe monthly) read through the entire array
  making sure everything is OK.

  echo check > /sys/block/mdX/md/sync_action
  echo repair > /sys/block/mdX/md/sync_action
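
While the check runs, the standard md interfaces report progress and the
number of mismatched sectors (mdX is a placeholder, as above):

  cat /proc/mdstat                    # check/repair progress
  cat /sys/block/mdX/md/mismatch_cnt  # sectors that failed comparison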

To fight bit rot on a disk array you need not only redundancy but also
checksums:

  https://unix.stackexchange.com/questions/105337/bit-rot-detection-and-correction-with-mdadm

After all, with a bit flip in RAID5, or a disk failure plus a bit flip in
RAID6, there is no way to tell which disk went bad, and to vote for the
correct set of disks you need a checksum.

In the case of ZFS and Btrfs, scrubbing runs over the allocated blocks only,
which is more efficient than md's scan of the whole array. And md has no
checksums to tell which disk is lying. (The scrub commands for both are
sketched right below.)
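
For reference, starting and watching a scrub on each (pool name "tank" and
mount point /mnt are placeholders):

  zpool scrub tank          # start scrubbing the ZFS pool
  zpool status tank         # progress, plus counts of repaired blocks
  btrfs scrub start /mnt    # start scrubbing the Btrfs filesystem
  btrfs scrub status /mnt   # progress and error counters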

https://unix.stackexchange.com/questions/137384/raid6-scrubbing-mismatch-repair
  It is possible in theory: the data+parity gives you three opinions on what
  the data should be; if two of them are consistent, you can assume the third
  is the incorrect one and re-write it based on the first two.

  Linux RAID6 does not do this. Instead, any time there is a mismatch, the two
  parity values are assumed to be incorrect and recalculated from the data
  values. There have been proposals to change to a "majority vote" system, but
  it hasn't been implemented.

  The mdadm package includes the raid6check utility that attempts to figure
  out which disk is bad in the event of a parity mismatch, but it has some
  rough edges, is not installed by default, and doesn't fix the errors it
  finds.

https://serverfault.com/questions/391922/linux-mdadm-software-raid-6-does-it-support-bit-corruption-recovery
  Linux software RAID is not going to protect you from bit corruption and
  silent data corruption is a well known issue with it. In fact, if the kernel
  is able to read the data from one disk it would never know that it is bad.
  The RAID only kicks in if there is an I/O error when reading the data.

  If you are worried about data integrity you should consider using a file
  system like Btrfs or ZFS that ensure data integrity by storing and verifying
  checksums. These file systems also take care of the RAID functionality, so
  you don't need the kernel software raid if you go that way.

So what md does in RAID6 is downright nasty.

In sum, the only reasonable solutions available are Btrfs and ZFS.

That is, RAID functionality needs to be built into the FS to fight bit rot
effectively. Hash-based checksums are better than comparing data against
parity: they catch double flips too, and scrubbing then "gives" a
block-integrity guarantee up to the 1/2^256 chance of an undetected error
(for a 256-bit hash such as SHA-256).
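
In ZFS that is a one-line setting (the dataset name is illustrative; the
default checksum is fletcher4, SHA-256 has to be enabled explicitly):

  zfs set checksum=sha256 tank/backup   # use SHA-256 for new writes
  zfs get checksum tank/backup          # confirm the property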

Also:

https://superuser.com/questions/1131701/btrfs-over-mdadm-raid6
  In 2016, Btrfs RAID-6 should not be used.

  You can see on the Btrfs status page that RAID56 is considered unstable. The
  write hole still exists, and the parity is not checksummed. Scrubbing will
  verify data but not repair any data degradation.

  Btrfs can't repair an inconsistency happening at md's level.

  Snapshots would still work, though, but also without repairs during scrubbing.

https://btrfs.wiki.kernel.org/index.php/Status
  RAID56          Unstable
  Scrub + RAID56  mostly OK
  RAID1           mostly OK

  RAID56 Some fixes went to 4.12, namely scrub and auto-repair fixes. Feature
  marked as mostly OK for now.

So the choice is not broad: ZFS, period.
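
A minimal starting point, assuming two spare disks (device and pool names are
illustrative):

  zpool create tank mirror /dev/sda /dev/sdb   # checksummed, redundant pool
  zpool scrub tank                             # periodic integrity pass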

-- 
http://defun.work/

