[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#517122: marked as done (linux-2.6: very slow access/open/... syscalls on NFS mounted files)



Your message dated Tue, 4 Jun 2013 00:30:38 +0200
with message-id <20130603223038.GE5604@pisco.westfalen.local>
and subject line Closing
has caused the Debian Bug report #517122,
regarding linux-2.6: very slow access/open/... syscalls on NFS mounted files
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
517122: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=517122
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: libc6
Version: 2.7-18
Severity: normal


I am sorry if I should have originally filed it against kernel but I am not
sure if it is actually an NFS client issue.. It seems to be not nfs server
issue either since other clients of the same server perform fine. It shouldn't
be issue bonded interface on this box, since other nodes in the cluster with
bonded interfaces perform fine.

2 days ago I upgraded our cluster from etch to lenny.

Yesterday I've mentioned (it could be that it wasn't that slow before) that it
takes too long to start bash/zsh. strace showed that syscalls dealing with NFS
mounted files:

13:10:46.301746 access("/home/yoh/.zsh/sourcedir", F_OK) = 0 <0.019635>

whenever on other nodes it takes just 0.0001 or so

I've done obvious checks of nfsstat and netstat -- no obvious reasons for
slow-down.

Here is an excerpt from my notes in tiddlywiki. I would appreciate any hints on
what else to look at, since nfs client traffic seems to be going quite fast,
but there is significant delays between transactions, which probably what boils
down to slow syscalls.


examples from main node
{{{
strace -fF -o /tmp/zsh2.strace -T -ttt zsh-beta

1235583796.011574 access("/home/yoh/.zshenv", F_OK) = 0 <0.023661>
1235583796.035416 munmap(0x7fd834c96000, 2100856) = 0 <0.000053>
1235583796.035562 stat("/home/yoh/.zshenv.zwc", 0x7fff3e262f00) = -1 ENOENT (No such file or directory) <0.009246>
}}}

has corresponding tcpdump:
{{{
 tcpdump -i bond0 -tt -vvv -n "(dst host raider or src host raider)" -s 192 >| /tmp/zsh-beta.tcpdump

1235583796.020630 IP (tos 0x0, ttl 64, id 43841, offset 0, flags [DF], proto TCP (6), length 224) 10.0.0.1.514335490 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.020777 IP (tos 0x0, ttl 64, id 46892, offset 0, flags [DF], proto TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.514335490: reply ok 124 access attr: DIR 40755 ids 47521/47522 sz 16384 nlink 221 rdev 0/0 fsid 7d6310991d5a808d fileid 51855823 a/m/ctime 1235582942.843330957 1235583781.787331514 1235583781.787331514 c 001f
1235583796.020857 IP (tos 0x0, ttl 64, id 43842, offset 0, flags [DF], proto TCP (6), length 232) 10.0.0.1.531112706 > 10.0.0.252.2049: 180 lookup [|nfs]
1235583796.021001 IP (tos 0x0, ttl 64, id 46893, offset 0, flags [DF], proto TCP (6), length 292) 10.0.0.252.2049 > 10.0.0.1.531112706: reply ok 240 lookup fh Unknown/010006013CC4464909BF4398B1441C5490AF20E507AF8051ED1A000000000001 LNK 120777 ids 47521/47522 sz 16 [|nfs]
1235583796.025559 IP (tos 0x0, ttl 64, id 43843, offset 0, flags [DF], proto TCP (6), length 228) 10.0.0.1.547889922 > 10.0.0.252.2049: 176 lookup [|nfs]
1235583796.025709 IP (tos 0x0, ttl 64, id 46894, offset 0, flags [DF], proto TCP (6), length 292) 10.0.0.252.2049 > 10.0.0.1.547889922: reply ok 240 lookup fh Unknown/010006013CC4464909BF4398B1441C5490AF20E513906E780700000000000001 DIR 40700 ids 47521/47522 sz 90 [|nfs]
1235583796.030339 IP (tos 0x0, ttl 64, id 43844, offset 0, flags [DF], proto TCP (6), length 224) 10.0.0.1.564667138 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.030489 IP (tos 0x0, ttl 64, id 46895, offset 0, flags [DF], proto TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.564667138: reply ok 124 access attr: DIR 40700 ids 47521/47522 sz 90 nlink 10 rdev 0/0 fsid 7d6310991d5a808d fileid 786e9013 a/m/ctime 1231731824.355110521 1231737665.659793897 1231737665.659793897 c 001f
1235583796.035048 IP (tos 0x0, ttl 64, id 43845, offset 0, flags [DF], proto TCP (6), length 224) 10.0.0.1.581444354 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.035191 IP (tos 0x0, ttl 64, id 46896, offset 0, flags [DF], proto TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.581444354: reply ok 124 access attr: DIR 40775 ids 47521/47522 sz 66 nlink 6 rdev 0/0 fsid 7d6310991d5a808d fileid 18005733 a/m/ctime 1218207520.461796280 1218207523.890125418 1218207523.890125418 c 001f
1235583796.044623 IP (tos 0x0, ttl 64, id 43846, offset 0, flags [DF], proto TCP (6), length 236) 10.0.0.1.598221570 > 10.0.0.252.2049: 184 lookup [|nfs]
1235583796.044773 IP (tos 0x0, ttl 64, id 46897, offset 0, flags [DF], proto TCP (6), length 172) 10.0.0.252.2049 > 10.0.0.1.598221570: reply ok 120 lookup ERROR: No such file or directory post dattr: DIR 40755 ids 47521/47522 sz 16384 nlink 221 rdev 0/0 fsid 7d6310991d5a808d fileid 51855823 a/m/ctime 1235582942.843330957 1235583781.787331514 1235583781.787331514
}}}

so nfs queries processed ok -- around 0.0001 sec (like on other nodes) but dt between transactions is high (up to 0.005sec), which leads to total 0.02 for a syscall... too high.


on another example used NFS client debugging:
{{{
13:10:45.714835 access("/home/yoh/.zshenv", F_OK) = 0 <0.018355> 
13:10:45.733420 munmap(0x7f789cc36000, 2100856) = 0 <0.000063>
13:10:45.733623 stat("/home/yoh/.zshenv.zwc", 0x7fffa6202ea0) = -1 ENOENT (No such file or directory) <0.006971
}}}

{{{

# used sudo rpcdebug -v -m nfs -s all to enable and -c to disable debugging

Feb 25 13:10:45 ravana kernel: [222679.726216] NFS: nfs_lookup_revalidate(yoh/.zshenv) is valid
Feb 25 13:10:45 ravana kernel: [222679.729475] NFS: permission(0:18/1367693347), mask=0x1, res=0
Feb 25 13:10:45 ravana kernel: [222679.729495] NFS call  lookup .etc
Feb 25 13:10:45 ravana kernel: [222679.729827] NFS: nfs_update_inode(0:18/1367693347 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.729835] NFS reply lookup: 0
Feb 25 13:10:45 ravana kernel: [222679.729842] NFS: nfs_update_inode(0:18/2020511763 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.729852] NFS: nfs_lookup_revalidate(yoh/.etc) is valid
Feb 25 13:10:45 ravana kernel: [222679.733126] NFS call  access
Feb 25 13:10:45 ravana kernel: [222679.733563] NFS: nfs_update_inode(0:18/2020511763 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.733572] NFS reply access: 0
Feb 25 13:10:45 ravana kernel: [222679.733579] NFS: permission(0:18/2020511763), mask=0x1, res=0
Feb 25 13:10:45 ravana kernel: [222679.733592] NFS: nfs_lookup_revalidate(.etc/zsh) is valid
}}}

so major delay is between
{{{
Feb 25 13:10:45 ravana kernel: [222679.729852] NFS: nfs_lookup_revalidate(yoh/.etc) is valid
Feb 25 13:10:45 ravana kernel: [222679.733126] NFS call  access
}}}

-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                      1:4.3.2-1.1 GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  glibc-doc                     2.7-18     GNU C Library: Documentation
ii  locales                       2.7-18     GNU C Library: National Language (

-- debconf information:
* glibc/upgrade: true
  glibc/restart-failed:
* glibc/restart-services: ssh snmpd rsync openbsd-inetd nis exim4 cupsys cron atd apache-ssl



--- End Message ---
--- Begin Message ---
Hi,
your bug has been filed against the "linux-2.6" source package and was filed for
a kernel older than the recently released Debian 7.0 / Wheezy with a severity
less than important.

We don't have the ressources to reproduce the complete backlog of all older kernel
bugs, so we're closing this bug for now. If you can reproduce the bug with Debian Wheezy
or a more recent kernel from testing or unstable, please reopen the bug by sending
a mail to control@bugs.debian.org with the following three commands included in the
mail:

reopen BUGNUMBER
reassign BUGNUMBER src:linux
thanks

Cheers,
        Moritz

--- End Message ---

Reply to: