
Re: Homebuilt NAS Advice



On 8/7/2020 6:23 PM, David Christensen wrote:
Filesystem       Size  Used Avail Use% Mounted on

/dev/md0          28T   22T  6.0T   79% /RAID
Backup:/Backup    44T   44T  512K  100% /Backup

The NAS array is 8 @ 5 TB live drives and 1 @ 5 TB hot spare?

	It was.  I was in the process of upgrading.  Now it is 8 x 8 TB plus an 8 TB hot spare.

The backup system array is 8 @ 8 TB data drives and 1 @ 8 TB hot spare?

Yep. I always upgrade the backup before I upgrade the main array. Well, wait a second. To be clear, that is 6 x 8T of data, plus 2 x 8T of parity, plus 1 x 8T of spare.

No LVM?

No. I don't feel a need for LVM on the data arrays. I use the entire, unpartitioned drive for /RAID.
AIUI you are running desktop motherboards without ECC memory and XFS does not protect against bit rot.  Are you concerned?

Yes. I have routines that compare the data on the main array and the backup array via checksum. When needed, the backups supply a third vote. The odds of two bits flipping at the very same spot are astronomically low. There has been some bit rot, but so far it has been manageable.
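For reference, a minimal sketch of that sort of comparison (an assumption about the approach, not the actual routine; the paths and output files are hypothetical):

    # Checksum every file on both arrays and diff the results.
    cd /RAID   && find . -type f -print0 | sort -z | xargs -0 sha256sum > /tmp/raid.sums
    cd /Backup && find . -type f -print0 | sort -z | xargs -0 sha256sum > /tmp/backup.sums
    diff /tmp/raid.sums /tmp/backup.sums    # differing lines are bit-rot (or change) candidates

Any file that differs can then be checked against the offline backup copy for the third vote.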

I agree that the 79% usage on the NAS array means action is required.

	Uh-huh.

As I understand md RAID6, the only way to add capacity is to backup, rebuild the array with additional and/or larger drives, and restore (?).

	No, not at all.  To add a drive:

`mdadm /dev/md0 --add /dev/sdX`
`mdadm -v /dev/md0 --grow --raid-devices=Y`

Note if an internal bitmap is set, it must be removed prior to growing the array. It can be added back once the grow operation is complete.
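For example, assuming an internal bitmap:

`mdadm --grow /dev/md0 --bitmap=none`      (remove the bitmap)
`mdadm --grow /dev/md0 --raid-devices=Y`   (reshape)
`mdadm --grow /dev/md0 --bitmap=internal`  (add it back)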

To increase the drive size, replace any smaller drives with larger drives one at a time:

`mdadm /dev/md0 --add /dev/sdX`
`mdadm /dev/md0 --fail /dev/sdY`
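If it helps, the rebuild onto the replacement can be watched, and the failed drive dropped once it finishes (device names are placeholders):

`cat /proc/mdstat`                   (watch the spare rebuild onto the new drive)
`mdadm /dev/md0 --remove /dev/sdY`   (remove the failed drive once the rebuild completes)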

Once all the drives are larger than the current device size used by the array:

`mdadm /dev/md0 --grow --size=max`

This will set the device size based upon the smallest device in the array. The device size can be set to a smaller value using the -z parameter. Once the array is grown, the filesystem needs to be expanded with the appropriate tool for the filesystem in question.
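For XFS, which these arrays appear to use, that would be something along the lines of:

`xfs_growfs /RAID`      (XFS grows while mounted)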


Are you concerned about 100% usage on the backup server array?

Some, yes. I am going to fix it by removing some very large but unnecessary files. It has only been at 100% for a few days.

 > plus several T of additional files I don't need on the main server.

44 TB total - 22 TB backup = 22 TB additional.  That explains the 100% usage.

Actually, no. There are not two backup copies on the file system. Believe it or not, there are 22T of files from other sources.


Have you considered putting the additional files on another server that is not backed up, only archived?

They should no longer be needed. Once I confirm that (in a few minutes from now, actually), they will be deleted. If any of the files in question turn out to be necessary, I will do that very thing.

On 2020-08-06 18:58, Leslie Rhorer wrote:
 > The servers have 10G optical links between them.  A full backup to the
 > RAID 6 array takes several days.


One 10 Gbps network connection per server?

Yes. I don't have slots for additional NIC boards, and my boards only have one port.

22E+12 bytes in 2.8 days is ~90 MB/s.  That is a fraction of 4 Gbps and an even smaller fraction of 10 Gbps.  Have you identified the bottleneck?

	That was a calculated number.  Did I make a mistake?

	...

Oops. That should have been about 15 hours or so. The transfer rate for a large file is close to 4 Gbps, which is about the best I would expect from this hardware. It's good enough.
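For reference: 22E+12 bytes is 176E+12 bits, and at ~4 Gbps that is about 44,000 seconds, or a bit over 12 hours of raw transfer time, so ~15 hours with overhead is in the right ballpark.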
	

 > A full backup to single drives takes
 > 2 weeks, because single drives are limited to about 800 Mbps, while the
 > array can gulp down nearly 4 Gbps.  Nightly backups (via rsync) take a
 > few minutes, at most.

800 Mbps network throughput should be ~88 MB/s HDD throughput.  2 to 4 TB drives should be faster.  Have you identified the bottleneck?

It's probably the internal SATA controller on this old motherboard. I'm not using a high-dollar controller for external drives. Again, since I don't do this sort of thing daily, I am not worried about it. I start the backup and walk away. When I come back, it's done. Differential backups are small, so I only very rarely need a second drive.
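For anyone who does want to chase it down, a quick-and-dirty way to see whether the drive or the controller is the limit (device name is a placeholder):

`hdparm -t /dev/sdX`                                          (buffered sequential read timing)
`dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct`   (raw sequential read, bypassing cache)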

44E+12 bytes in 15 days is ~34 MB/s.  Is this due to a DAR manual workflow that limits you to one or two archive drives per day?

	No, that's about what I get on average transfers to external drives.

Are you using hot-swap for the archive drives?  What make and model rack?  What HBA/RAID card?  Same for hot spares and HBA?  Same for the 16 bay rack, HBA, port replicators?

Yes on the hot swap. I just use a little eSATA docking station attached to an eSATA port on the motherboard. 'Definitely a poor man's solution.

If you have two HDD hot-swap bays, can DAR leap-frog destination media?

I believe it can, yes. A script to handle that should be pretty simple. I have never done so.

E.g. You insert two archive drives and have DAR begin writing to the first.  When the first is full, DAR begins writing to the second and notifies you.  You pull the first drive, insert the third drive, and notify DAR.  When the second drive is full, DAR begins writing to the third, and notifies you.  Etc.?

Right. I just use the device ID (rather than the name) to write the files and pause when the drive is full. It should be possible to do it with multiple device ID targets. In fact, I know it would be. The script I use right now pauses and waits for the user to replace the drive and press <Enter>. It would be trivial to have the script continue with a different device ID instead of pausing. Iterating through a list of IDs is hardly any more difficult.

	Hmm.  You have given me an idea.  Thanks!
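A rough sketch of that loop as a plain shell script (the device IDs, mount point, and the DAR step itself are placeholders, not the actual script):

    #!/bin/sh
    # Hypothetical leap-frog: walk a list of archive drives by device ID.
    for ID in ID-AAA ID-BBB ID-CCC; do
        DEV=/dev/disk/by-id/$ID            # or the appropriate -part1 entry
        while [ ! -b "$DEV" ]; do
            echo "Insert drive $ID and press <Enter>"
            read DUMMY
        done
        mount "$DEV" /mnt/archive
        # ... have DAR write its next slice(s) to /mnt/archive here ...
        umount /mnt/archive
        echo "Drive $ID is done -- safe to pull"
    done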

If you have many HDD hot-swap bays, can DAR write in parallel??? With leap-frog?

No, I don't think so, at least not in general. I suppose one could create a front-end process which divides up the source and passes the individual chunks to multiple DAR processes. A Python script should be able to handle it pretty well.
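A crude version of that front-end in plain shell rather than Python (the subtrees, destinations, and dar options shown are assumptions):

    #!/bin/sh
    # Hypothetical: one dar process per destination drive, each taking a
    # disjoint chunk of the source tree.
    dar -c /mnt/archive1/part1 -R /RAID -g media  &
    dar -c /mnt/archive2/part2 -R /RAID -g photos &
    wait    # block until both archives finish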


In my experience, HDDs that are stored for long periods have the bad habit of failing within hours of being put back into service.  Does this concern you?

No, not really. If a target drive fails during a backup, I can just isolate the existing portion and then start a new backup on the isolate. A failed drive during a restore could be a bitch, but that's pretty unlikely. Something like dd_rescue could be a great help.
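E.g., with GNU ddrescue (a close relative of dd_rescue), cloning a failing archive drive onto a fresh one, device names being placeholders:

`ddrescue /dev/sdX /dev/sdY /root/rescue.map`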

What is your data destruction policy?

You mean for live data? I don't have one. Do you mean for the backups? There is no formal one.

One design pattern for ZFS is a pool of striped virtual devices (VDEV), each VDEV being two or more mirrored drives of the same size and type (e.g. SSD, SAS, SATA, etc.).  Cache, intent log, and spare devices can be added for performance and/or reliability.  To add capacity, you insert another pair of drives and add them into the pool as a VDEV mirror.  The top-level file system is automatically resized.  File systems without size restrictions can use the additional capacity.  Performance increases.  For backup, choices include replication to another pool and mirror tricks (add one drive to each VDEV mirror, allow it to resilver, remove one drive from each mirror in rotation).

Oh, yes. For an enterprise system, ZFS is the top contender, in my book. These are for my own use, and my business is small, however. If I ever get to the point where I have more than 10 employees, I will no doubt switch to ZFS.

Let me put it this way: if a business has the need for a separate IT manager, his filesystem of choice for the file server(s) is pretty much without question ZFS. For a small business or for personal use the learning curve may be a bit more than the non-IT user might want to tackle.

Or not. I certainly would not discourage anyone who wants to take on the challenge.
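For reference, the VDEV pattern David describes looks roughly like this (pool and device names are hypothetical):

`zpool create tank mirror sda sdb mirror sdc sdd`   (striped pool of two mirror VDEVs)
`zpool add tank mirror sde sdf`                     (add capacity: another mirrored pair)
`zpool attach tank sda sdg`                         (mirror trick: a third disk resilvers into a VDEV)
`zpool detach tank sdg`                             (...then rotate it out as a backup copy)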

