
Re: Bug#775014: nfs-common: Degraded performance on nfs4 clients after upgrade to Jessie



On Sunday, 11 January 2015 at 10:17:29, Martin Steigerwald wrote:
> As I am interested in NFS performance issues due to my work I copied my work
> address in.

Me too.

> On Sunday, 11 January 2015 at 01:16:03, you wrote:
> > On Saturday, 10 January 2015 at 19:30:12, Martin Steigerwald wrote:
> > [...]
> > 
> > > I suggest you upgrade to the 3.16 bpo kernel. Maybe that already makes a
> > > difference. Additionally, there is a greater chance you get security
> > > updates on that one, because AFAIK older bpo kernels are not maintained
> > > anymore.
> > 
> > It's one of the main servers of my institute, so it's not easy. Anyway, I
> > have scheduled it to be done soon. Thanks for the suggestion.
> > 
> > [...]
> > 
> > > > cami:/recursos /home/recursos nfs4 rw,sync,nodev,noatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.8,local_lock=none,addr=192.168.1.254 0 0
> > > 
> > > What is this on the wheezy machine?
> > 
> > All the servers are wheezy. The clients are Jessie.
> > 
> > > Look at the output of grep nfs /proc/mounts or nfsstat -m.
> > > 
> > > I suggest not to manually tune this. Current kernels use good default
> > > values. I have seen rsize and wsize of up to 1048576.
> > > 
> > > Additionally, are they all the same hardware, same network cards? If not,
> > > what is the fast system using and what is the slow system using?
> > 
> > Well, I have worked on this issue and I have reached some conclusions:
> > 
> > - Now, with modern kernels (and NFSv4) it makes no sense to set rsize and
> > wsize manually. The negotiation between client and server picks the best
> > values, so all the tuning pages on the net are outdated.
> 
> Hehe, my own training slides were outdated as well – for years. We found
> out about it on one of my Linux performance analysis & tuning trainings, as
> participants of the training measured NFS performance with dd with and
> without tuning and it was actually better without tuning. cat /proc/mounts
> – /etc/mtab wasn't a symlink to it back then – revealed the difference.
> That's where I found that 1048576 value for both rsize and wsize, instead of
> the 8192 or 32768 I recommended to set.
> 
> > - The sync parameter behaves totally differently (client side) on a 3.2
> > kernel than on 3.16 (also 3.12 or 3.8). I'm talking about similar hardware
> > (Gigabit NIC, similar CPU, etc.) but on a 100 Mb network: from ~7 MB/s
> > (kernel 3.2) vs ~63.8 kB/s (>3.8). In another environment with a Gigabit
> > network the rates are ~145 kB/s vs ~70.3 MB/s.
> 
> Interesting. For all I know "sync" should be quite okay with NFSv3 and
> upwards. With NFSv2 it was very slow.

Not anymore. With NFSv4 and modern kernels it works much worse, at least in my
tests.
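
For example, a quick way to see how much the sync flag itself costs on one of
the Jessie clients (the export and mount point are the ones from my
/proc/mounts line above; a remount does not always pick up changed NFS
options, so I unmount first):

$ sudo umount /home/recursos
$ sudo mount -t nfs4 -o rw,async,noatime cami:/recursos /home/recursos
$ dd if=/dev/zero of=/home/recursos/testfile bs=4096 count=70000 conv=fsync

and then the same dd again after mounting with -o rw,sync,noatime.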

> > - No significant differences with wsize, at least none that I have found.
> > The default autonegotiation works very well.
> > 
> > - I still don't understand why, although I have good hardware at work,
> > when my clients do:
> > 
> > $ dd if=/dev/zero of=myrandom bs=4096 count=70000
> > 
> > they obtain about ~70 MB/s,
> > 
> > while if I execute the same on the server I obtain 167 MB/s. About 2.5
> > times slower.
> 
> Do you mean a local dd versus a dd on the client on the NFS mount?

Yes. I mean that the users' home directories are shared over NFS. So, if I log
in on a client and do:

dd if=/dev/zero of=myrandom bs=4096 count=70000

I obtain ~70 MB/s.

If I ssh to the NFS server (the disks are then local) and do the same, I
obtain about 167 MB/s.

I have a Gigabit network, with a server with four NICs bonded together. iperf
shows me transfer rates of about 937 Mbits/sec (on the client). However, I
cannot claim that my network is perfect: ifconfig shows me a lot of dropped
packets.


> Also note that dd may not be the best measurement tool depending on your
> workload, and that you will have caching effects with dd unless you use
> oflag=direct, which I never tested with NFS mounts, or at least correct the
> measured time with conv=fsync (I think).

Well, with this parameter dd falls to 79.8 kB/s :-( . However, a more
sophisticated tool, like iozone, should be used to get trustworthy numbers.
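
Something along these lines is what I would run with iozone on the NFS home
(file size, record size and path are just examples; -e and -c include fsync
and close times so the page cache does not hide the real transfer rate):

$ iozone -e -c -i 0 -i 1 -s 1g -r 64k -f /home/recursos/iozone.tmp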

 
> There are some nice slides from Greg Banks from SGI about NFS performance. I
> can dig out the URL next week at work if they are still available online.
> They are from 2008, but were much more up to date than many other things I
> found on the net. Most of it is not only outdated, but plain wrong by now.

My feeling is that there's a lot of black magic around this, and that it all
comes down to trial and error.

> > I still consider this issue important. I think that a lot of people who
> > upgrade to Jessie with NFS mounts will run into problems, but this is just
> > IMHO.
> 
> I wonder though whether there is a high probability for a "fix" for wheezy,
> as jessie is shortly before release and it's not a bug in itself, but a
> performance issue. And: it may be fixed by just upgrading the kernel to the
> 3.16 bpo one.

If someone (like me) has an environment with a lot of machines, with some
mount parameters chosen deliberately (sync, for instance), and does an
upgrade, it will make their environment not very usable anymore.
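
For what it's worth, after these tests the fstab entry I would leave on the
clients is basically a minimal one, letting client and server negotiate
rsize/wsize (server and paths are the ones from the /proc/mounts line above;
whether to keep sync is exactly the question discussed here):

cami:/recursos  /home/recursos  nfs4  rw,noatime,vers=4.0,sec=sys  0  0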



-- 
Linux User 152692     GPG: 05F4A7A949A2D9AA
Catalonia
-------------------------------------
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
