System: 4-way Opteron, generic Debian Sarge AMD64
RAID controller: LSI Logic MegaRAID 320-1, 64MB cache
RAID config: Three 146GB 15K SCSI/320 disks, RAID-5
Kernel: 2.6.14 SMP, includes megaraid driver
The above system is very fast under almost all conditions, except
when writing very large files (hundreds of MB, or even several GB). When
writing such files, the system effectively locks up for many seconds:
typically, for as long as it takes to finish writing/flushing the file
to disk. This lock-up affects all other processes: local text editor
sessions freeze, workstations with /home NFS-mounted hang, and the web
server stops serving. (I guess the affected processes are actually
those contending for disk write access.) In particular, the
workstations which have /home NFS-mounted experience a *workstation*
hang (if trying to write) during the *server* disk flush, which is very
frustrating. Given that a 'write' may simply involve updating a web
browser history stored in /home, this is an extremely serious problem.
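
For anyone wanting to put a number on the stall, here is a minimal sketch (Python, not something I have run on the server itself) that times small synchronous writes in a directory. The function name and defaults are made up for illustration. The idea is to run it once while the system is idle and once while a large file is being copied, then compare the worst-case latencies.

```python
import os
import tempfile
import time


def probe_small_write_latency(directory, samples=5, payload=b"x" * 512):
    """Return the seconds taken by each small write+fsync in `directory`."""
    latencies = []
    for _ in range(samples):
        # Create a scratch file on the filesystem under test.
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            start = time.monotonic()
            os.write(fd, payload)
            # Force the data to disk rather than leaving it in the page cache.
            os.fsync(fd)
            latencies.append(time.monotonic() - start)
        finally:
            os.close(fd)
            os.unlink(path)
    return latencies
```

Calling `max(probe_small_write_latency("/home"))` during a large copy should show roughly how long a tiny write (such as a browser history update) is held up.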
Example, taken while the system was otherwise idle, out of work hours:
while creating a 1GB file (copying an existing file, already cached in
RAM), 'vmstat 3' shows the following:
procs -----------memory------------ ---swap-- -----io---- --system-- ----cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in    cs us sy  id wa
 0  0      0 4280268  55888 3366484    0    0     0     0  260    49  0  0 100  0
 1  1      0 3777432  56364 3847652    0    0     0  6419  319    55  0 23  77  0
 1  4      0 3258408  56852 4342476    0    0     0 10243  407    45  0 25  75  0
 0  3      0 3070296  57028 4520868    0    0     0  8856  403    99  0  9  75 16
 0  4      0 3068152  57044 4520852    0    0     5  9561  417   153  0  1  72 27
 0  3      0 3069316  57044 4520852    0    0     0 10240  429   144  0  0  75 25
 0  3      0 3069356  57044 4520852    0    0     0 10219  411    85  0  0  75 25
 0  3      0 3069368  57044 4520852    0    0     0  8876  391    78  0  0  75 25
[...]
 0  2      0 3077856  57044 4520852    0    0     0  9557  409    44  0  0  75 25
 0  2      0 3077856  57044 4520852    0    0     0  8875  384    41  0  0  75 25
 0  1      0 3097748  57044 4520852    0    0     0  7704  421    42  0  2  73 25
 0  0      0 3100096  57048 4520848    0    0     0    56  259    20  0  0  99  1
 0  0      0 3100112  57052 4520844    0    0     0   552  362    32  0  0  97  3
 0  0      0 3100112  57052 4520844    0    0     0     0  270    63  0  0 100  0
 0  0      0 3100384  57052 4520844    0    0     0     5  260    39  0  0 100  0
I see that the 'bo' column ("blocks written to block device") kicks in
and that it takes approximately two minutes to finish flushing this file
to disk (a disk write rate of less than 10MB/sec, which strikes me as
very slow). I also see that the CPU IO-Wait column ('wa') shows 25%
while this is happening: this corresponds to one of our four CPUs,
presumably meaning that one CPU is waiting for the file to flush to
disk. Once the flush finishes, the disk and CPU state returns to idle.
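
The arithmetic behind those two figures can be checked with a short sketch (Python). It assumes that vmstat reports 'bo' in 1024-byte blocks per second, which is my reading of the man page rather than something I have verified here, and it uses only the steady-state 'bo' values visible in the listing above (the elided rows are left out). The function names are made up for illustration.

```python
# Steady-state 'bo' samples copied from the vmstat listing (elided rows omitted).
BO_SAMPLES = [10243, 8856, 9561, 10240, 10219, 8876, 9557, 8875, 7704]


def mb_per_sec_from_duration(file_mb, seconds):
    """Average write rate if `file_mb` megabytes take `seconds` to flush."""
    return file_mb / seconds


def mb_per_sec_from_bo(bo_samples, block_bytes=1024):
    """Average write rate implied by vmstat 'bo' values.

    Assumption: each 'bo' value is a per-second count of `block_bytes` blocks.
    """
    mean_blocks = sum(bo_samples) / len(bo_samples)
    return mean_blocks * block_bytes / (1024 * 1024)


def iowait_percent(waiting_cpus, total_cpus):
    """Share of total CPU time shown as 'wa' if `waiting_cpus` sit in IO-wait."""
    return 100.0 * waiting_cpus / total_cpus


print(f"from duration: {mb_per_sec_from_duration(1024, 120):.1f} MB/sec")
print(f"from 'bo':     {mb_per_sec_from_bo(BO_SAMPLES):.1f} MB/sec")
print(f"one CPU of four waiting: {iowait_percent(1, 4):.0f}% 'wa'")
```

Both estimates come out below 10MB/sec, and one CPU out of four in IO-wait matches the 25% in the 'wa' column.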
I have already tried:
- a couple of different kernels: the stock Sarge kernel
  2.6.8-11-amd64-k8-smp and a custom-compiled 2.6.14 kernel. I
  configured the custom kernel to use the pre-emptible features designed
  for desktop use, in the hope that the other interactive processes
  would benefit from this. The choice of kernel doesn't seem to affect
  the behaviour I describe above.
Should I expect this kind of performance when writing large files?
If not, then what can be done to improve this kind of write performance?
The RAID controller is currently set to "write-through". I understand
that, in theory, better write performance may be obtained by using
"write-back", although I don't see how that would help for files that
are many times larger than the RAID controller cache (64MB vs. files of
100s of MB). I understand the potential data-loss implications of using
write-back. Thoughts/comments on changing to "write-back" in these
circumstances?
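
To put the size mismatch in numbers, here is a small sketch (Python; the function name is made up) of how much of a file the 64MB controller cache could hold at any one time. It deliberately ignores the cache draining to disk while the file is still being written, so it is only a rough guide and not a model of how the controller behaves.

```python
def cache_fraction(cache_mb, file_mb):
    """Fraction of the file that fits in the controller cache at one time."""
    return min(1.0, cache_mb / file_mb)


# 64MB cache against a few of the file sizes mentioned above.
for file_mb in (100, 500, 1024):
    share = cache_fraction(64, file_mb)
    print(f"{file_mb:>5} MB file: {share:.1%} fits in a 64 MB cache")
```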
Any other suggestions or reports of similar experiences?
Cheers,
Dave.
--
Dave Ewart
davee@ceu.ox.ac.uk
Computing Manager, Cancer Epidemiology Unit
Cancer Research UK / Oxford University
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370
Get key from http://www.ceu.ox.ac.uk/~davee/davee-ceu-ox-ac-uk.asc
N 51.7518, W 1.2016