
Re: DRBD sync speed



On Thu, Oct 11, 2018 at 08:54:42AM +0100, Adam Weremczuk wrote:
> Hi Dan,
> 
> Yes, I tried tweaking config following that link but for some reason the
> sync progress is not showing any more.
> I guess I need to fiddle with it more.
> 
> I have 16 x 500 GB disks in each server and my layout is as below:
> 
> 1-4: VD0: RAID10: 2 spans of 2 disks -> 1TB for Proxmox containers and VMs
> 5-14: VD1: RAID50: 2 spans of 5 disks -> 4TB for storage (which I'm trying
> to sync for redundancy using DRBD)
> 15-16: global hot spares
> 
> It appears to provide the best balance of performance, resiliency and
> space utilisation.
> I've been referring to this chart:
> https://www.datarecovery.net/articles/raid-level-comparison.aspx
> 
> Is there anything fundamentally wrong with my architecture?

It depends on what you value most.

Is it random I/O performance? Sequential read? Sequential write?
Resiliency against disk loss? Uptime? Data safety?

For your VD0, it's a choice between RAID10 and RAID6 (assuming that's
available to you). With RAID10, you get excellent random I/O performance,
good streaming write performance, excellent read performance, and you can
lose up to two disks, but only if they're the right ones: with mirror pairs
A1/A2 and B1/B2, losing one disk from each pair is fine, but losing both As
or both Bs means data loss and a dead filesystem. With RAID6 you give up
some read and write speed, but you can survive the loss of any two disks.
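
Back-of-the-envelope for your four 500 GB disks (my numbers, ignoring the
hot spares and controller overhead):

    RAID10 (2x2):    ~1 TB usable; survives any one disk, or two disks
                     only if they sit in different mirror pairs
    RAID6 (4 disks): ~1 TB usable (2 data + 2 parity); survives any two
                     disks, at the cost of slower writes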

For your VD1, you can get excellent sequential read speeds, but write
speed will be comparatively terrible: roughly 25% of the disks' combined
write performance. RAID50 in that 2x5 layout can survive the loss of two
disks, but only if they're the right ones: losing one disk from each RAID5
span is fine, while losing two disks from the same span kills the array.
RAID50 is also very slow to rebuild after a disk loss, and it is
unfortunately common for the stress of the rebuild to finish off other
disks that were already on the edge.
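
The 25% write figure above is just the usual RAID5 small-write penalty,
roughly:

    one small random write = read old data + read old parity
                           + write new data + write new parity = 4 disk I/Os
    10 spindles / 4 I/Os per write ~= 2.5 disks' worth of random write
    throughput, i.e. about 25% of the raw combined speed

Large sequential writes that fill whole stripes avoid most of that penalty.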

DRBD slows down everything in sync mode (protocol C); you'll be limited
to your available network speed. Do you really need all of your storage to
be instantly available on the second server at all times? You didn't
mention a clustered filesystem, and you did mention failover scenarios, so
it sounds like you're planning on restarting VMs from the second server
with, hopefully, up-to-the-second recency. But if writes are frequent, you
might overwhelm the DRBD link in sync mode, and you'll face a choice
between low performance all the time and a higher risk of data loss.
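
For illustration only, this is roughly where that protocol choice lives in
a DRBD 8.4-style resource file (resource, host and device names below are
made up; check drbd.conf(5) for your version):

    resource r1 {
      net {
        protocol C;    # synchronous: a write completes only once the peer
                       # has it on stable storage -- safest, network-bound
        # protocol A;  # asynchronous: a write completes once it is on the
                       # local disk and queued for the peer -- faster, but
                       # a failover can lose the most recent writes
      }
      disk {
        resync-rate 100M;   # background resync throttle (DRBD 8.4 syntax)
      }
      on pve1 {
        device    /dev/drbd1;
        disk      /dev/sdb1;        # whatever your 4 TB VD1 is on that box
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on pve2 {
        device    /dev/drbd1;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }

Protocol A trades the synchronous guarantee for speed: a failover can lose
the last few seconds of writes, which may or may not be acceptable for
your VMs.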

The problem that I see is that, assuming dual or triple power supplies
for each server, your most likely threats are things that will affect
both servers: a power outage, a switch failure, a routing problem,
environmental issues, other things beyond your control.

If you don't actually have a realtime sync requirement, you might be
happier with something like hourly ZFS send/recv jobs. I don't know your
exact scenario.
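
A minimal sketch of that, assuming a dataset called tank/storage and a
second host called pve2 (both names made up), run hourly from cron on the
primary; tools like syncoid or zrepl handle this bookkeeping for you:

    #!/bin/sh
    # hourly ZFS replication sketch -- names and paths are illustrative
    DS=tank/storage
    PEER=pve2
    STATE=/var/lib/zfs-last-snap      # scratch file to remember the last
    PREV=$(cat "$STATE")              # snapshot that was sent successfully
    NOW=hourly-$(date +%Y%m%d%H)

    zfs snapshot "$DS@$NOW"
    # incremental send against the previous snapshot; the very first run
    # needs a full "zfs send $DS@$NOW" instead of the -i form
    zfs send -i "$DS@$PREV" "$DS@$NOW" | ssh "$PEER" zfs recv -F "$DS"
    echo "$NOW" > "$STATE"

Worst case you lose up to an hour of changes, but the replication link
stops being in the path of every single write.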

-dsr-

