
Re: Bug#775014: nfs-common: Degraded performance on nfs4 clients after upgrade to Jessie



As I am interested in NFS performance issues because of my work, I have 
copied in my work address.

On Sunday, 11 January 2015, 01:16:03, you wrote:
> On Saturday, 10 January 2015, at 19:30:12, Martin Steigerwald wrote:
> [...]
> 
> > I suggest you upgrade to 3.16 bpo kernel. Maybe that already makes a
> > difference. And additionally there is greater chance you get security
> > updates on that one, cause AFAIK older bpo kernels are not maintained
> > anymore.
> 
> It's on one of the main servers of my institute and it's not easy. Anyway, I
> have scheduled it to be done soon. Thanks for the suggestion.
> 
> [...]
> 
> > > cami:/recursos /home/recursos nfs4
> > > rw,sync,nodev,noatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.8,local_lock=none,addr=192.168.1.254 0 0
> > 
> > What is this on the wheezy machine?
> 
> all the servers are wheezy. The clients are Jessie.
> 
> > Look in grep nfs /proc/mounts or nfsstat -m
> > 
> > I suggest not to manually tune this. Current kernels use good default
> > values. I have seen rsize and wsize of up to 1048576.
> > 
> > Additionally are they all the same hardware, same network cards? If not,
> > what is the fast system using and what is the slow system using?
> 
> Well, I have worked on this issue and I have reached some conclusions:
> 
> - Now, with modern kernels (and nfs4), it makes no sense to set rsize and
> wsize. The negotiation between client and server picks the best values. So
> all the pages on the net are outdated.

Hehe, my own training slides were outdated as well, for years. We found out 
about it in one of my Linux performance analysis & tuning trainings, when 
participants measured NFS performance with dd with and without tuning, and it 
was actually better without tuning. cat /proc/mounts (/etc/mtab wasn't a 
symlink to it back then) revealed the difference. That's where I found that 
1048576 value for both rsize and wsize, instead of the 8192 or 32768 I had 
recommended setting.
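
As a rough sketch of that check, the snippet below prints the negotiated rsize 
and wsize for each NFS mount listed in /proc/mounts. The function name 
nfs_sizes is made up for illustration, and the awk parsing assumes the usual 
/proc/mounts column layout (device, mount point, type, options).

```shell
#!/bin/sh
# nfs_sizes [FILE]: print "mountpoint rsize wsize" for every nfs/nfs4 entry
# in a /proc/mounts-style file (defaults to /proc/mounts).
# nfs_sizes is a hypothetical helper name, not an existing tool.
nfs_sizes() {
    awk '$3 ~ /^nfs4?$/ {
        rsize = "-"; wsize = "-"
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^rsize=/) rsize = substr(opts[i], 7)
            if (opts[i] ~ /^wsize=/) wsize = substr(opts[i], 7)
        }
        print $2, rsize, wsize
    }' "${1:-/proc/mounts}"
}

# Only read the live mount table where it exists (Linux).
if [ -r /proc/mounts ]; then
    nfs_sizes
fi
```

Running it once with rsize/wsize set in fstab and once without shows whether 
the manual values are smaller than what client and server would negotiate on 
their own.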

> - The sync parameter works totally differently (client side) in a 3.2 kernel
> than in 3.16 (also 3.12 or 3.8). I'm talking about similar hardware (Gigabyte
> NIC, similar CPU, etc.) on a 100 Mb network: ~7 MB/s (kernel 3.2) vs
> 63.8 kB/s (kernel 3.8 and later). In another environment with a Gigabit
> network the rates are ~145 kB/s vs ~70.3 MB/s.

Interesting. As far as I know, "sync" should be quite okay with NFSv3 and 
upwards. With NFSv2 it was very slow.

> - No significant differences with wsize, at least none that I have found. But
> the default autonegotiation works very well.
> 
> - I still don't understand why, although I have good hardware at work,
> when my clients do:
> 
> $ dd if=/dev/zero of=myrandom bs=4096 count=70000
> 
> they obtain about 70 MB/s
> 
> and if I execute the same on the server I obtain 167 MB/s. The clients are
> about 2.5 times slower.

Do you mean a local dd versus a dd on the client on the NFS mount?

Also note that dd may not be the best measurement tool, depending on your 
workload, and that you will see caching effects with dd unless you use 
oflag=direct, which I never tested with NFS mounts, or at least correct the 
measured time with conv=fsync (I think).
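
To make the caching point concrete, here is a minimal sketch of a dd wrapper. 
write_test is a hypothetical helper name; conv=fsync and oflag=direct are GNU 
dd operands, and as said above I have not tested the oflag=direct variant on 
NFS mounts.

```shell
#!/bin/sh
# write_test FILE COUNT [extra dd operands...]
# Writes COUNT blocks of 4096 bytes of zeros to FILE; dd reports the
# elapsed time and throughput on stderr.
# write_test is a hypothetical helper name for illustration.
write_test() {
    file=$1
    count=$2
    shift 2
    # "$@" passes through optional operands such as conv=fsync
    # (fsync the output file before dd exits, so the flush is timed)
    # or oflag=direct (bypass the page cache; untested on NFS here).
    dd if=/dev/zero of="$file" bs=4096 count="$count" "$@"
}
```

For example, with ddtest as a placeholder file name on the NFS mount:

    write_test /home/recursos/ddtest 70000
    write_test /home/recursos/ddtest 70000 conv=fsync

If the second run reports a clearly lower rate, the first number mostly 
measured how fast the client could fill its page cache.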

There are some nice slides from Greg Banks of SGI about NFS performance. I 
can dig out the URL next week at work, if they are still available online. 
They are from 2008, but were much more up to date than many other things I 
found on the net. Most of that is not only outdated, but plain wrong by now.

> I still consider this issue important. I think that a lot of people who
> upgrade to Jessie with nfs mounts will run into some problem. But this is
> just MHO.

I wonder though whether there is a high probability of a "fix" for wheezy, as 
jessie is shortly before release and it's not a bug in itself, but a 
performance issue. And: it may be fixed by just upgrading the kernel to the 
3.16 bpo one.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
