
Re: LVM write performance



On 08/09/2011 07:13 PM, Stan Hoeppner wrote:
> On 8/9/2011 9:12 AM, Dion Kant wrote:
>
>> Thanks for your remarks. The disk info is given below. Writing to the
>> disk is OK when mounted, so I think it is not a hardware/alignment
>> issue.  However, your remarks made me do some additional investigations:
>>
>> 1. dd of=/dev/sdb4 if=/dev/zero gives similar results, so it has nothing
>> to do with LVM;
>> 2. My statement about writing like this on an openSUSE kernel is wrong.
>> Also with openSUSE and the same hardware I get similar (slow) results
>> when writing to the disk using dd via the device file.
>>
>> So now the issue has shifted to the asymmetric behaviour when
>> reading/writing with dd directly through the (block) device file.
>>
>> Reading with dd if=/dev/sdb4 of=/dev/null gives disk limited performance
>> Writing with dd of=/dev/sdb4 if=/dev/zero gives about a factor 10 less
>> performance.
> Run:
> /$ dd of=/dev/sdb4 if=/dev/zero bs=4096 count=500000
>
> Then run again with bs=512 count=2000000
>
> That will write 2GB in 4KB blocks and will prevent dd from trying to
> buffer everything before writing it.  You don't break out of this--it
> finishes on its own due to 'count'.  The second run will use a block
> size of 512B, which is the native sector size of the Seagate disk.
> Either of these should improve your actual dd performance dramatically.
>
> When you don't specify a block size with dd, dd attempts to "buffer" the
> entire input stream, or huge portions of it, into memory before writing
> it out.  If you look at RAM, swap usage, and disk IO while running your
> 'raw' dd test, you'll likely see both memory, and IO to the swap device,
> are saturated, with little actual data being written to the target disk
> partition.
>
> I attempted to nudge you into finding this information on your own, but
> you apparently did not.  I explained all of this not long ago, either
> here or on the linux-raid list.  It should be in Google somewhere.
> Never use dd without specifying the proper block size of the target
> device--never.  For a Linux filesystem this will be 4096 and for a raw
> hard disk device it will be 512, optimally anyway.  Other values may
> give better performance, depending on the system, the disk controller,
> and device driver, etc.
>
> That Seagate isn't an AF model, so sector alignment isn't the issue here,
> just improper use of dd.
>
Stan,

You are right, with bs=4096 the write performance improves
significantly. From the man page of dd I concluded that not specifying
bs selects ibs=512 and obs=512, and indeed bs=512 gives similar
performance to not specifying bs at all.
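
For reference, these are roughly the invocations I compared (the device
name is of course specific to my box; counts as you suggested):

  dd if=/dev/zero of=/dev/sdb4 bs=512  count=2000000   # as slow as with no bs
  dd if=/dev/zero of=/dev/sdb4 bs=4096 count=500000    # significantly faster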

When observing the system with vmstat, I see the same (strange) behaviour
whether no bs is specified or bs=512:

root@dom0-2:~# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 6314620 125988  91612    0    0     0     3    5    5  0  0 100  0
 1  1      0 6265404 173744  91444    0    0 23868    13 18020 12290  0  0 86 14
 2  1      0 6214576 223076  91704    0    0 24666     1 18596 12417  0  0 90 10
 0  1      0 6163004 273172  91448    0    0 25046     0 18867 12614  0  0 89 11
 1  0      0 6111308 323252  91592    0    0 25042     0 18861 12608  0  0 92  8
 0  1      0 6059860 373220  91648    0    0 24984     0 18821 12578  0  0 85 14
 0  1      0 6008164 423304  91508    0    0 25040     0 18863 12611  0  0 95  5
 2  1      0 5956344 473468  91604    0    0 25084     0 18953 12630  0  0 95  5
 0  1      0 5904896 523548  91532    0    0 25038     0 18867 12607  0  0 87 13
 0  1      0 5896068 528680  91520    0    0  2558 99597 2431 1373  0  0 92  8
 0  2      0 5896088 528688  91520    0    0     0 73736  535  100  0  0 86 13
 0  1      0 5896128 528688  91520    0    0     0 73729  545   99  0  0 88 12
 1  0      0 6413920  28712  91612    0    0    54  2996  634  372  0  0 95  4
 0  0      0 6413940  28712  91520    0    0     0     0   78   80  0  0 100  0
 0  0      0 6413940  28712  91520    0    0     0     0   94   97  0  0 100  0

Remarkable behaviour, in the sense that there is a lot of block input (bi)
at the beginning, and only at the end do I see block output (bo) at about
75 MB/s.
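
If it helps, I can repeat the run with per-device statistics in a second
terminal, e.g. (iostat from sysstat; I have not captured that output yet):

  iostat -x sdb 2    # should show whether the block input really hits sdb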

With obs=4096 it looks like this:

root@dom0-2:~# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 6413600  28744  91540    0    0     0     3    5    5  0  0 100  0
 1  0      0 6413724  28744  91540    0    0     0     0  103   96  0  0 100  0
 1  0      0 6121616 312880  91208    0    0     0    18  457  133  1  2 97  0
 0  1      0 5895588 528756  91540    0    0     0 83216  587   88  1  3 90  6
 0  1      0 5895456 528756  91540    0    0     0 73728  539   98  0  0 92  8
 0  3      0 5895400 528760  91536    0    0     0 73735  535   93  0  0 86 14
 1  0      0 6413520  28788  91436    0    0    54 19359  783  376  0  0 93  6
 0  0      0 6413544  28788  91540    0    0     0     2  100   84  0  0 100  0
 0  0      0 6413544  28788  91540    0    0     0     0   86   87  0  0 100  0
 0  0      0 6413552  28796  91532    0    0     0    10  110  113  0  0 100  0

As soon as I select a bs that is not a whole multiple of 4096, I see a lot
of block input and poor performance when writing data to the disk.
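
One thing I still want to try is taking the page cache out of the picture
with O_DIRECT, to see whether the extra block input disappears (GNU dd,
not yet tested on this box):

  dd if=/dev/zero of=/dev/sdb4 bs=512  count=2000000 oflag=direct
  dd if=/dev/zero of=/dev/sdb4 bs=4096 count=500000  oflag=direct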

I'll try to Google the thread(s) you mentioned. I am still not entirely
satisfied with the explanation, though.

Thanks so far,

Dion

