[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#517122: libc6: very slow access/open/... syscalls on NFS mounted files



Package: libc6
Version: 2.7-18
Severity: normal


I am sorry if I should have originally filed it against kernel but I am not
sure if it is actually an NFS client issue.. It seems to be not nfs server
issue either since other clients of the same server perform fine. It shouldn't
be issue bonded interface on this box, since other nodes in the cluster with
bonded interfaces perform fine.

2 days ago I upgraded our cluster from etch to lenny.

Yesterday I've mentioned (it could be that it wasn't that slow before) that it
takes too long to start bash/zsh. strace showed that syscalls dealing with NFS
mounted files:

13:10:46.301746 access("/home/yoh/.zsh/sourcedir", F_OK) = 0 <0.019635>

whenever on other nodes it takes just 0.0001 or so

I've done obvious checks of nfsstat and netstat -- no obvious reasons for
slow-down.

Here is an excerpt from my notes in tiddlywiki. I would appreciate any hints on
what else to look at, since nfs client traffic seems to be going quite fast,
but there is significant delays between transactions, which probably what boils
down to slow syscalls.


examples from main node
{{{
strace -fF -o /tmp/zsh2.strace -T -ttt zsh-beta

1235583796.011574 access("/home/yoh/.zshenv", F_OK) = 0 <0.023661>
1235583796.035416 munmap(0x7fd834c96000, 2100856) = 0 <0.000053>
1235583796.035562 stat("/home/yoh/.zshenv.zwc", 0x7fff3e262f00) = -1 ENOENT (No such file or directory) <0.009246>
}}}

has corresponding tcpdump:
{{{
 tcpdump -i bond0 -tt -vvv -n "(dst host raider or src host raider)" -s 192 >| /tmp/zsh-beta.tcpdump

1235583796.020630 IP (tos 0x0, ttl 64, id 43841, offset 0, flags [DF], proto TCP (6), length 224) 10.0.0.1.514335490 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.020777 IP (tos 0x0, ttl 64, id 46892, offset 0, flags [DF], proto TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.514335490: reply ok 124 access attr: DIR 40755 ids 47521/47522 sz 16384 nlink 221 rdev 0/0 fsid 7d6310991d5a808d fileid 51855823 a/m/ctime 1235582942.843330957 1235583781.787331514 1235583781.787331514 c 001f
1235583796.020857 IP (tos 0x0, ttl 64, id 43842, offset 0, flags [DF], proto TCP (6), length 232) 10.0.0.1.531112706 > 10.0.0.252.2049: 180 lookup [|nfs]
1235583796.021001 IP (tos 0x0, ttl 64, id 46893, offset 0, flags [DF], proto TCP (6), length 292) 10.0.0.252.2049 > 10.0.0.1.531112706: reply ok 240 lookup fh Unknown/010006013CC4464909BF4398B1441C5490AF20E507AF8051ED1A000000000001 LNK 120777 ids 47521/47522 sz 16 [|nfs]
1235583796.025559 IP (tos 0x0, ttl 64, id 43843, offset 0, flags [DF], proto TCP (6), length 228) 10.0.0.1.547889922 > 10.0.0.252.2049: 176 lookup [|nfs]
1235583796.025709 IP (tos 0x0, ttl 64, id 46894, offset 0, flags [DF], proto TCP (6), length 292) 10.0.0.252.2049 > 10.0.0.1.547889922: reply ok 240 lookup fh Unknown/010006013CC4464909BF4398B1441C5490AF20E513906E780700000000000001 DIR 40700 ids 47521/47522 sz 90 [|nfs]
1235583796.030339 IP (tos 0x0, ttl 64, id 43844, offset 0, flags [DF], proto TCP (6), length 224) 10.0.0.1.564667138 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.030489 IP (tos 0x0, ttl 64, id 46895, offset 0, flags [DF], proto TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.564667138: reply ok 124 access attr: DIR 40700 ids 47521/47522 sz 90 nlink 10 rdev 0/0 fsid 7d6310991d5a808d fileid 786e9013 a/m/ctime 1231731824.355110521 1231737665.659793897 1231737665.659793897 c 001f
1235583796.035048 IP (tos 0x0, ttl 64, id 43845, offset 0, flags [DF], proto TCP (6), length 224) 10.0.0.1.581444354 > 10.0.0.252.2049: 172 access [|nfs]
1235583796.035191 IP (tos 0x0, ttl 64, id 46896, offset 0, flags [DF], proto TCP (6), length 176) 10.0.0.252.2049 > 10.0.0.1.581444354: reply ok 124 access attr: DIR 40775 ids 47521/47522 sz 66 nlink 6 rdev 0/0 fsid 7d6310991d5a808d fileid 18005733 a/m/ctime 1218207520.461796280 1218207523.890125418 1218207523.890125418 c 001f
1235583796.044623 IP (tos 0x0, ttl 64, id 43846, offset 0, flags [DF], proto TCP (6), length 236) 10.0.0.1.598221570 > 10.0.0.252.2049: 184 lookup [|nfs]
1235583796.044773 IP (tos 0x0, ttl 64, id 46897, offset 0, flags [DF], proto TCP (6), length 172) 10.0.0.252.2049 > 10.0.0.1.598221570: reply ok 120 lookup ERROR: No such file or directory post dattr: DIR 40755 ids 47521/47522 sz 16384 nlink 221 rdev 0/0 fsid 7d6310991d5a808d fileid 51855823 a/m/ctime 1235582942.843330957 1235583781.787331514 1235583781.787331514
}}}

so nfs queries processed ok -- around 0.0001 sec (like on other nodes) but dt between transactions is high (up to 0.005sec), which leads to total 0.02 for a syscall... too high.


on another example used NFS client debugging:
{{{
13:10:45.714835 access("/home/yoh/.zshenv", F_OK) = 0 <0.018355> 
13:10:45.733420 munmap(0x7f789cc36000, 2100856) = 0 <0.000063>
13:10:45.733623 stat("/home/yoh/.zshenv.zwc", 0x7fffa6202ea0) = -1 ENOENT (No such file or directory) <0.006971
}}}

{{{

# used sudo rpcdebug -v -m nfs -s all to enable and -c to disable debugging

Feb 25 13:10:45 ravana kernel: [222679.726216] NFS: nfs_lookup_revalidate(yoh/.zshenv) is valid
Feb 25 13:10:45 ravana kernel: [222679.729475] NFS: permission(0:18/1367693347), mask=0x1, res=0
Feb 25 13:10:45 ravana kernel: [222679.729495] NFS call  lookup .etc
Feb 25 13:10:45 ravana kernel: [222679.729827] NFS: nfs_update_inode(0:18/1367693347 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.729835] NFS reply lookup: 0
Feb 25 13:10:45 ravana kernel: [222679.729842] NFS: nfs_update_inode(0:18/2020511763 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.729852] NFS: nfs_lookup_revalidate(yoh/.etc) is valid
Feb 25 13:10:45 ravana kernel: [222679.733126] NFS call  access
Feb 25 13:10:45 ravana kernel: [222679.733563] NFS: nfs_update_inode(0:18/2020511763 ct=1 info=0x6)
Feb 25 13:10:45 ravana kernel: [222679.733572] NFS reply access: 0
Feb 25 13:10:45 ravana kernel: [222679.733579] NFS: permission(0:18/2020511763), mask=0x1, res=0
Feb 25 13:10:45 ravana kernel: [222679.733592] NFS: nfs_lookup_revalidate(.etc/zsh) is valid
}}}

so major delay is between
{{{
Feb 25 13:10:45 ravana kernel: [222679.729852] NFS: nfs_lookup_revalidate(yoh/.etc) is valid
Feb 25 13:10:45 ravana kernel: [222679.733126] NFS call  access
}}}

-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1                      1:4.3.2-1.1 GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  glibc-doc                     2.7-18     GNU C Library: Documentation
ii  locales                       2.7-18     GNU C Library: National Language (

-- debconf information:
* glibc/upgrade: true
  glibc/restart-failed:
* glibc/restart-services: ssh snmpd rsync openbsd-inetd nis exim4 cupsys cron atd apache-ssl



Reply to: