
Problem with SMB mounts and Kernel 2.6.x



Hi!
 
I haven't found a current bug report yet, but before filing one myself,
I would like to confirm that it's not some exotic misconfiguration on my
part. I am running several Debian boxes which access SMB shares on
other Linux servers and on one Win2k server. Under a certain amount of
load, sooner or later (mostly sooner, i.e. within a few minutes,
sometimes seconds) the mounted SMB shares appear to hang; when that
happens, neither top nor ps ax can run without hanging as well. When I
try to restart smbd, I am told that a process with a given ID cannot be
terminated. Running kill -9 on this process doesn't help; I have to
reboot the system in order to unmount the share. Reading files on the
affected shares works without any problems, but writing involves a wait
of precisely 30 seconds. When I try to overwrite a file, I get an I/O
error and the resulting file is empty:

server-01:/path/to/share-01# date ; echo 1234 > test.txt ; date; cat test.txt ; date ; echo 2345 > test.txt ; date; cat test.txt; date
Di Jan 18 12:07:34 CET 2005
Di Jan 18 12:08:04 CET 2005
1234
Di Jan 18 12:08:04 CET 2005
-bash: test.txt: Eingabe-/Ausgabefehler
Di Jan 18 12:08:34 CET 2005
Di Jan 18 12:08:34 CET 2005

touch also gives me an I/O error (the "Eingabe-/Ausgabefehler" shown
above), again after a 30-second wait.
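For anyone who wants to repeat the write test without the shell one-liner, here is a minimal Python sketch of the same two-step check (create the file, then overwrite it, timing each write). The names timed_write and run_check are made up for this example, and the fsync call is my assumption about how to force the data out to the server; it is not something the shell test above does.

```python
import os
import time


def timed_write(path, data):
    """Write bytes to path; return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    error = None
    try:
        with open(path, "wb") as handle:
            handle.write(data)
            handle.flush()
            # Assumption: fsync pushes the data to the server right away.
            os.fsync(handle.fileno())
    except OSError as exc:
        error = exc
    return time.monotonic() - start, error


def file_size(path):
    """Return the size of path in bytes, or None if it cannot be read."""
    try:
        return os.path.getsize(path)
    except OSError:
        return None


def run_check(directory, name="test.txt"):
    """Create the file, then overwrite it, recording time, error and size."""
    path = os.path.join(directory, name)
    results = []
    for payload in (b"1234\n", b"2345\n"):
        elapsed, error = timed_write(path, payload)
        results.append((elapsed, error, file_size(path)))
    return results
```

Calling run_check("/path/to/share-01") on an affected mount should show the delay in the first element of each tuple and, if the overwrite fails, an OSError plus a size of 0 in the second tuple.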

The wait and the I/O errors occur with any share hosted on a Linux box
(tried with Samba 2.2.7a-SuSE and Samba 3.0.10-Debian as hosts); there
is no such problem when accessing shares on the Win2k server. The
problem of processes hanging under load, however, does affect the
Win2k-hosted share as well. I think that all of these issues are
related.

In syslog I find the following entries:
localhost kernel: smb_add_request: request [ce132e60, mid=6328] timed out!
(lots of these)
localhost kernel: smb_trans2: invalid data, disp=0, cnt=0, tot=0, ofs=0
(lots of these as well)
localhost kernel: smb_get_length: Invalid NBT packet, code=fe
localhost kernel: smb_get_length: Invalid NBT packet, code=ff
localhost kernel: smb_receive_header: short packet: 0
localhost kernel: smb_receive_header: long packet: 65628
localhost kernel: smb_proc_readX_data: offset is larger than SMB_READX_MAX_PAD or negative!
localhost kernel: smb_proc_readX_data: -59 > 64 || -59 < 0
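Since "lots of these" is rather vague, a small Python sketch like the following could put numbers on it by counting the smbfs kernel messages per reporting function. The function name tally_smbfs_messages and the regular expression are assumptions for illustration; the pattern only expects the "kernel: smb_...:" shape seen in the entries above.

```python
import re
from collections import Counter

# Matches the reporting function in lines such as
# "localhost kernel: smb_add_request: request [...] timed out!"
SMBFS_MESSAGE = re.compile(r"kernel: (smb_\w+):")


def tally_smbfs_messages(lines):
    """Count smbfs kernel messages per reporting function."""
    counts = Counter()
    for line in lines:
        match = SMBFS_MESSAGE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open log file directly, for example tally_smbfs_messages(open("/var/log/syslog")), assuming that is where the kernel messages end up on the system in question.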

I have tested with kernel 2.6.8-1-686-smp from sarge; the test system
was a fresh sarge install. Downgrading to kernel-image-2.4.27-2-686-smp
(via apt-get install) resolved the issue completely. The same applies
to upgrading to 2.6.10-1-686-smp from unstable (but I don't feel
comfortable enough with an "unstable" kernel on a production system).
I tested with both smbd version 3.0.7-Debian and 3.0.10-Debian.

I have googled for the timeout issue to some extent. Some people
suggested that the CIFS code in 2.6 up to 2.6.9 was broken with regard
to the unix extensions and that a fix would be included in 2.6.10;
setting "unix extensions = no" in smb.conf was suggested as a
workaround. I tried this smb.conf setting, but the problem persisted.
Finally I got fed up with 2.6.8 and downgraded to 2.4 - and that
resolved it.

I am still occasionally getting 
Jan 21 13:43:06 localhost kernel: smb_trans2_request: result=-104, setting invalid
Jan 21 13:43:06 localhost kernel: smb_retry: successful, new pid=1109, generation=2
Jan 21 14:01:56 localhost kernel: smb_trans2_request: result=-104, setting invalid
Jan 21 14:01:56 localhost kernel: smb_retry: successful, new pid=1109, generation=3
in syslog, which worries me a bit, but I haven't noticed anything wrong
in the actual operation of the servers since the downgrade. I have yet
to see a single "smb_add_request: request [whatever] timed out!" with
kernel 2.4.27-2-686-smp.
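The result=-104 in the smb_trans2_request lines looks like a negated errno value, which is how kernel code commonly reports errors; on Linux, errno 104 is ECONNRESET ("Connection reset by peer"). That reading is an assumption on my side, not something the log states. A minimal Python sketch for looking up such codes (the helper name describe_kernel_result is made up) could be:

```python
import errno
import os


def describe_kernel_result(result):
    """Map a negative kernel-style result code to (errno name, message)."""
    code = abs(result)
    # errno numbering is platform-specific; run this on a Linux box
    # to interpret values taken from a Linux kernel log.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)
```

On a Linux system, describe_kernel_result(-104) can be used to check the interpretation above. Note that the -59 in the smb_proc_readX_data message is a data offset, not a result code, so it should not be decoded this way.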

Here are my smb.conf and fstab entries; I have replaced any
identifiable information with dummy entries.

smb.conf:
[global]
        workgroup = MYWRKGRP
        netbios name = DEBIAN-01
        server string = Debian Testserver
        security = SHARE
        encrypt passwords = Yes
        map to guest = Bad User
        null passwords = Yes
        log level = 1
        syslog = 0
        time server = Yes
        unix extensions = no
        socket options = SO_KEEPALIVE IPTOS_LOWDELAY TCP_NODELAY
        os level = 2
        default service = www
        guest account = myuser
[www]
        path = /var/www
        read only = Yes
        guest only = Yes
        guest ok = Yes
        hosts allow = All
        nt acl support = No
        hide dot files = No

fstab:
//WINBOX/d$ /var/www/WINBOX smbfs password=xxxx,username=Winuser,workgroup=MYWRKGRP,uid=500,gid=100,fmask=666,dmask=777,rw 0 0

I haven't yet found the time for more thorough tests with kernel 2.6.10,
but I shall be happy to do some more testing following your instructions
if necessary. For my needs, this bug (if it doesn't turn out to be a
misconfiguration, that is) is quite critical, i.e. I think it should
definitely be fixed before sarge becomes stable.

Thank you very much for your help!

Kind regards

   Markus


