Bug#770479: linux-image-3.16.0-4-amd64: nbd timeout option is not working



Package: src:linux
Version: 3.16.7-2
Severity: normal
Tags: patch

Dear Maintainer,

the nbd timeout setting passed from nbd-client to the kernel is broken in the jessie kernel.
This renders raid1 on top of nbd devices useless: when the network connection or
the nbd-server fails, the device simply hangs until the connection to the
nbd-server is brought back to life, which is clearly not what is intended when
using a raid1.

This worked in lenny and has been broken since wheezy.

Appended is a patch obtained from the nbd subsystem maintainers' list. With it
applied, I rebuilt the jessie kernel package and the timeout works again.

If you need further information or references, please let me know.
Many thanks,
 greetings
  Hermann

-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.3 (Debian 4.8.3-13) ) #1 SMP Debian 3.16.7-2 (2014-11-06)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=UUID=7695ce6d-8761-4755-b460-8e0bcd26f176 ro quiet

** Not tainted

** Kernel log:
[    0.533678] Freeing unused kernel memory: 940K (ffff880001515000 - ffff880001600000)
[    0.534261] Freeing unused kernel memory: 228K (ffff8800017c7000 - ffff880001800000)
[    0.588448] systemd-udevd[52]: starting version 215
[    0.589295] random: systemd-udevd urandom read with 2 bits of entropy available
[    0.636304] SCSI subsystem initialized
[    0.645829] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[    0.646259] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
[    0.652879] ACPI: bus type USB registered
[    0.652904] usbcore: registered new interface driver usbfs
[    0.652925] usbcore: registered new interface driver hub
[    0.654698] 8139cp 0000:00:03.0 eth0: RTL-8139C+ at 0xffffc90000002000, 54:52:00:af:12:a2, IRQ 10
[    0.656747] FDC 0 is a S82078B
[    0.658026] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.658696] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.660542] usbcore: registered new device driver usb
[    0.661896] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.663074] uhci_hcd: USB Universal Host Controller Interface driver
[    0.663357] uhci_hcd 0000:00:01.2: UHCI Host Controller
[    0.663365] uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
[    0.663401] uhci_hcd 0000:00:01.2: detected 2 ports
[    0.663507] uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c140
[    0.665457] 8139too: 8139too Fast Ethernet driver 0.9.28
[    0.666046] libata version 3.00 loaded.
[    0.673509] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
[    0.673513] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    0.673515] usb usb1: Product: UHCI Host Controller
[    0.673517] usb usb1: Manufacturer: Linux 3.16.0-4-amd64 uhci_hcd
[    0.673518] usb usb1: SerialNumber: 0000:00:01.2
[    0.674124] hub 1-0:1.0: USB hub found
[    0.674132] hub 1-0:1.0: 2 ports detected
[    0.675157] ata_piix 0000:00:01.1: version 2.13
[    0.684687] scsi0 : ata_piix
[    0.693987] scsi1 : ata_piix
[    0.694036] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc1a0 irq 14
[    0.694039] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc1a8 irq 15
[    0.698493] virtio-pci 0000:00:04.0: irq 40 for MSI/MSI-X
[    0.698522] virtio-pci 0000:00:04.0: irq 41 for MSI/MSI-X
[    0.698550] virtio-pci 0000:00:04.0: irq 42 for MSI/MSI-X
[    0.703134]  vda: vda1 vda2 < vda5 >
[    0.885202] nbd: registered device at major 43
[    0.926591] PM: Starting manual resume from disk
[    0.926596] PM: Hibernation image partition 254:5 present
[    0.926598] PM: Looking for hibernation image.
[    0.926745] PM: Image not found (code -22)
[    0.926748] PM: Hibernation image not present or could not be loaded.
[    0.959121] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
[    1.161669] systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory.
[    1.344269] systemd-udevd[137]: starting version 215
[    1.437657] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[    1.437663] ACPI: Power Button [PWRF]
[    1.493314] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[    1.496327] tsc: Refined TSC clocksource calibration: 2659.670 MHz
[    1.531351] [drm] Initialized drm 1.1.0 20060810
[    1.549336] input: PC Speaker as /devices/platform/pcspkr/input/input4
[    1.610390] Adding 138236k swap on /dev/vda5.  Priority:-1 extents:1 across:138236k FS
[    1.685921] ppdev: user-space parallel port driver
[    1.838506] EXT4-fs (vda1): re-mounted. Opts: errors=remount-ro
[    2.259620] 8139cp 0000:00:03.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[    2.271080] systemd-journald[128]: Received request to flush runtime journal from PID 1
[    2.370535] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[    2.402626] RPC: Registered named UNIX socket transport module.
[    2.402630] RPC: Registered udp transport module.
[    2.402631] RPC: Registered tcp transport module.
[    2.402632] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    2.413994] FS-Cache: Loaded
[    2.424204] FS-Cache: Netfs 'nfs' registered for caching
[    2.441518] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[  123.184988] block nbd0: NBD_SET_TIMEOUT: 0 -> 5
[  124.198368] random: nonblocking pool is initialized
[  124.201614]  nbd0: unknown partition table
[  554.016041] nbd: killing hung xmit (nbd-client, pid: 745)
[  554.016928] nbd (pid 745: nbd-client) got signal 9
[  554.016934] block nbd0: shutting down socket
[  554.017044] block nbd0: Receive control failed (result -4)
[  554.017480] end_request: I/O error, dev nbd0, sector 29514736
[  554.017861] Buffer I/O error on device nbd0, logical block 14757368
[  554.018259] Buffer I/O error on device nbd0, logical block 14757369
[  554.018656] Buffer I/O error on device nbd0, logical block 14757370
[  554.019053] Buffer I/O error on device nbd0, logical block 14757371
[  554.019452] Buffer I/O error on device nbd0, logical block 14757372
[  554.019849] Buffer I/O error on device nbd0, logical block 14757373
[  554.020028] Buffer I/O error on device nbd0, logical block 14757374
[  554.020028] Buffer I/O error on device nbd0, logical block 14757375
[  554.020028] Buffer I/O error on device nbd0, logical block 14757376
[  554.020028] Buffer I/O error on device nbd0, logical block 14757377
[  554.021951] end_request: I/O error, dev nbd0, sector 29514992
[  554.022368] block nbd0: queue cleared
[  554.023825] block nbd0: Attempted send on closed socket
[  554.024221] end_request: I/O error, dev nbd0, sector 29514736
[  554.024921] block nbd0: Attempted send on closed socket
[  554.025297] end_request: I/O error, dev nbd0, sector 29514738
[  554.025678] block nbd0: Attempted send on closed socket
[  554.026034] end_request: I/O error, dev nbd0, sector 29514740
[  554.026413] block nbd0: Attempted send on closed socket
[  554.026768] end_request: I/O error, dev nbd0, sector 29514742
[  786.935693] block nbd0: NBD_DISCONNECT
[  786.948625] nbd: unregistered device at major 43
[  801.888818] nbd: registered device at major 43
[  801.952479] block nbd0: NBD_SET_TIMEOUT: 0 -> 5
[  801.968453]  nbd0: unknown partition table

** Model information
sys_vendor: Bochs
product_name: Bochs
product_version: 
chassis_vendor: Bochs
chassis_version: 
bios_vendor: Bochs
bios_version: Bochs

** Loaded modules:
nbd
nfsd
auth_rpcgss
oid_registry
nfs_acl
nfs
lockd
fscache
sunrpc
md_mod
ppdev
ttm
pcspkr
drm_kms_helper
evdev
psmouse
serio_raw
virtio_balloon
i2c_piix4
drm
i2c_core
parport_pc
parport
processor
button
thermal_sys
autofs4
ext4
crc16
mbcache
jbd2
ata_generic
virtio_blk
virtio_net
ata_piix
uhci_hcd
ehci_hcd
8139too
virtio_pci
virtio_ring
virtio
8139cp
mii
floppy
libata
usbcore
usb_common
scsi_mod

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010] (prog-if 80 [Master])
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
	Region 4: I/O ports at c1a0 [size=16]
	Kernel driver in use: ata_piix

00:01.2 USB controller [0c03]: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] [8086:7020] (rev 01) (prog-if 00 [UHCI])
	Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin D routed to IRQ 11
	Region 4: I/O ports at c140 [size=32]
	Kernel driver in use: uhci_hcd

00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 9
	Kernel driver in use: piix4_smbus

00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8] (prog-if 00 [VGA controller])
	Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
	Physical Slot: 2
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Region 1: Memory at febfd000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled]

00:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter [10ec:8139] (rev 20)
	Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
	Physical Slot: 3
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c000 [size=256]
	Region 1: Memory at febfe000 (32-bit, non-prefetchable) [size=256]
	Kernel driver in use: 8139cp

00:04.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
	Subsystem: Red Hat, Inc Device [1af4:0001]
	Physical Slot: 4
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at c160 [size=32]
	Region 1: Memory at febff000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:05.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]
	Subsystem: Red Hat, Inc Device [1af4:0002]
	Physical Slot: 5
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c100 [size=64]
	Kernel driver in use: virtio-pci

00:06.0 RAM memory [0500]: Red Hat, Inc Virtio memory balloon [1af4:1002]
	Subsystem: Red Hat, Inc Device [1af4:0005]
	Physical Slot: 6
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at c180 [size=32]
	Kernel driver in use: virtio-pci


** USB devices:
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub


-- System Information:
Debian Release: jessie/sid
  APT prefers testing-updates
  APT policy: (500, 'testing-updates'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.53
ii  initramfs-tools [linux-initramfs-tool]  0.116
ii  kmod                                    18-3
ii  linux-base                              3.5

Versions of packages linux-image-3.16.0-4-amd64 recommends:
ii  firmware-linux-free  3.3
ii  irqbalance           1.0.6-3

Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 2.02~beta2-15
pn  linux-doc-3.16          <none>

Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn  firmware-atheros        <none>
pn  firmware-bnx2           <none>
pn  firmware-bnx2x          <none>
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:
  linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64: false
  linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:
  linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64: true
commit 6b5f5a68e8da4bc8d948f25b21dcd6eeeb16ae7d
Author: Michal Belczyk <belczyk@bsd.krakow.pl>
Date:   Tue Nov 18 10:50:19 2014 +0100

    nbd: improve request timeouts handling
    
    The main idea behind it is to be able to quickly detect broken replica
    and switch over to another when used with any sort of mirror type device
    built on top of any number of nbd devices.
    
    Before this change a request would time out causing the socket to be
    shut down and the device to fail in case of a dead server or removed
    network cable only if:
    
      a) either the timer around kernel_sendmsg() kicked in
      b) or the TCP failures on retransmission finally caused an error
         on the socket, likely blocked on kernel_recvmsg() at this time,
         waiting for replies from the server
    
    Case a) depends mostly on the size of requests issued and on the maximum
    size of the socket buffer -- a lot of read request headers or small
    write requests could be "sent" without triggering the requested timeout
    
    Case b) timeout is independent of nbd-client -t <timeout> option
    as there is no TCP_USER_TIMEOUT set on the client socket by default.
    And even if such timeout was set it would not solve the problem of
    an nbd-client hung on receiving replies for much longer time without
    setting TCP keep-alives... and that would be the third, independent
    timeout setting required to make it work "almost" as expected...
    
    So, instead, take the big hammer approach and:
    
      *) trace the number of outstanding requests sent to the server
         (nbd->inflight)
      *) enable the timer (nbd->req_timer) before the first request
         is submitted and leave it enabled
      *) when sending next request do not touch the timer (it is up
         to the receiving side to control it at this point)
      *) on receive side update the timer every time a response
         is collected but there are more to read from the server
      *) disable the timer whenever the inflight counter drops
         to zero or an error (leading to the socket shutdown)
         is returned
    
    This patch does NOT prevent the server from processing a request for
    longer than the timeout specified, as long as it replies to some other
    request submitted within the timeout (the server still may reply to a
    batch of requests in any order).
    
    Only the nbd->xmit_timeout != 0 code path is changed so the patch should
    not affect nbd connections running without an explicit timeout set
    on the nbd-client command line.
    
    There is also no way to enable or disable the timeout on an active
    (nbd->pid != 0) nbd device, it is however possible to change its value.
    Otherwise the inflight request counter would have to affect the nbd
    devices enabled without nbd-client -t <timeout>.
    
    Also move nbd->pid modifications behind nbd->tx_lock wherever possible
    to avoid races between the concurrent nbd-client invocations.
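
To illustrate the accounting described in the commit message above, here is a
minimal user-space sketch in C. It rests on several assumptions: the names
timeout_model, model_init, model_send, model_reply and model_expired are
hypothetical and do not appear in the patch, time is an abstract tick counter
rather than jiffies, and locking, signal delivery and socket shutdown are left
out. It models only the rule "arm on first request, re-arm on each reply while
requests are outstanding, disarm when none are left"; it is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-device timeout state (not the kernel struct). */
struct timeout_model {
	long timeout;   /* timeout in ticks; 0 means no timeout configured */
	int inflight;   /* requests sent but not yet answered */
	bool armed;     /* true while the request timer is pending */
	long expires;   /* tick at which the timer fires, valid if armed */
};

static void model_init(struct timeout_model *m, long timeout)
{
	m->timeout = timeout;
	m->inflight = 0;
	m->armed = false;
	m->expires = 0;
}

/* A request is submitted at tick `now`. */
static void model_send(struct timeout_model *m, long now)
{
	if (!m->timeout)
		return;                 /* untimed connections are unaffected */
	if (m->inflight == 0) {
		/* First outstanding request: arm the timer. */
		m->armed = true;
		m->expires = now + m->timeout;
	}
	/* Later requests leave the timer alone; the receive side owns it. */
	m->inflight++;
}

/* A reply is collected at tick `now`. */
static void model_reply(struct timeout_model *m, long now)
{
	if (!m->timeout)
		return;
	assert(m->inflight > 0);        /* stands in for BUG_ON() */
	m->inflight--;
	if (m->inflight > 0)
		m->expires = now + m->timeout;  /* more replies due: push deadline */
	else
		m->armed = false;               /* nothing outstanding: disarm */
}

/* Would the timer have fired by tick `now`? */
static bool model_expired(const struct timeout_model *m, long now)
{
	return m->armed && now >= m->expires;
}
```

For example, with a timeout of 5 ticks, sending requests at ticks 0 and 3 and
collecting one reply at tick 4 moves the deadline from tick 5 to tick 9; the
second reply then disarms the timer. This also shows the caveat from the commit
message: a single slow request is not detected as long as other replies keep
arriving within the timeout.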

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4bc2a5c..cc4a98a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -140,11 +140,24 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 
 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task_ary[2];
+	unsigned long flags;
+	int i;
 
-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->timer_lock, flags);
+	nbd->timedout = 1;
+	task_ary[0] = nbd->sender;
+	task_ary[1] = nbd->receiver;
+	for (i = 0; i < 2; i++) {
+		if (task_ary[i] == NULL)
+			continue;
+		printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
+			task_ary[i]->comm, task_ary[i]->pid);
+		force_sig(SIGKILL, task_ary[i]);
+		break;
+	}
+	spin_unlock_irqrestore(&nbd->timer_lock, flags);
 }
 
 /*
@@ -158,7 +171,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	struct msghdr msg;
 	struct kvec iov;
 	sigset_t blocked, oldset;
-	unsigned long pflags = current->flags;
+	unsigned long flags, pflags = current->flags;
 
 	if (unlikely(!sock)) {
 		dev_err(disk_to_dev(nbd->disk),
@@ -183,23 +196,39 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 		msg.msg_controllen = 0;
 		msg.msg_flags = msg_flags | MSG_NOSIGNAL;
 
-		if (send) {
-			struct timer_list ti;
-
-			if (nbd->xmit_timeout) {
-				init_timer(&ti);
-				ti.function = nbd_xmit_timeout;
-				ti.data = (unsigned long)current;
-				ti.expires = jiffies + nbd->xmit_timeout;
-				add_timer(&ti);
+		if (nbd->xmit_timeout) {
+			spin_lock_irqsave(&nbd->timer_lock, flags);
+			if (nbd->timedout) {
+				spin_unlock_irqrestore(&nbd->timer_lock, flags);
+				printk(KERN_WARNING
+					"nbd (pid %d: %s) timed out\n",
+					task_pid_nr(current), current->comm);
+				result = -EINTR;
+				sock_shutdown(nbd, !send);
+				break;
 			}
+			if (send)
+				nbd->sender = current;
+			else
+				nbd->receiver = current;
+			spin_unlock_irqrestore(&nbd->timer_lock, flags);
+		}
+
+		if (send)
 			result = kernel_sendmsg(sock, &msg, &iov, 1, size);
-			if (nbd->xmit_timeout)
-				del_timer_sync(&ti);
-		} else
+		else
 			result = kernel_recvmsg(sock, &msg, &iov, 1, size,
 						msg.msg_flags);
 
+		if (nbd->xmit_timeout) {
+			spin_lock_irqsave(&nbd->timer_lock, flags);
+			if (send)
+				nbd->sender = NULL;
+			else
+				nbd->receiver = NULL;
+			spin_unlock_irqrestore(&nbd->timer_lock, flags);
+		}
+
 		if (signal_pending(current)) {
 			siginfo_t info;
 			printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
@@ -226,12 +255,12 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 }
 
 static inline int sock_send_bvec(struct nbd_device *nbd, struct bio_vec *bvec,
-		int flags)
+		int msg_flags)
 {
 	int result;
 	void *kaddr = kmap(bvec->bv_page);
 	result = sock_xmit(nbd, 1, kaddr + bvec->bv_offset,
-			   bvec->bv_len, flags);
+			   bvec->bv_len, msg_flags);
 	kunmap(bvec->bv_page);
 	return result;
 }
@@ -239,9 +268,9 @@ static inline int sock_send_bvec(struct nbd_device *nbd, struct bio_vec *bvec,
 /* always call with the tx_lock held */
 static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 {
-	int result, flags;
+	int result, msg_flags;
 	struct nbd_request request;
-	unsigned long size = blk_rq_bytes(req);
+	unsigned long flags, size = blk_rq_bytes(req);
 
 	memset(&request, 0, sizeof(request));
 	request.magic = htonl(NBD_REQUEST_MAGIC);
@@ -253,6 +282,19 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 	}
 	memcpy(request.handle, &req, sizeof(req));
 
+	if (nbd->xmit_timeout) {
+		spin_lock_irqsave(&nbd->timer_lock, flags);
+		if (!nbd->inflight) {
+			nbd->req_timer.function = nbd_xmit_timeout;
+			nbd->req_timer.data = (unsigned long)nbd;
+			nbd->req_timer.expires = jiffies + nbd->xmit_timeout;
+			add_timer(&nbd->req_timer);
+		}
+		nbd->inflight++;
+		BUG_ON(nbd->inflight <= 0);
+		spin_unlock_irqrestore(&nbd->timer_lock, flags);
+	}
+
 	dprintk(DBG_TX, "%s: request %p: sending control (%s@%llu,%uB)\n",
 			nbd->disk->disk_name, req,
 			nbdcmd_to_ascii(nbd_cmd(req)),
@@ -274,12 +316,12 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 		 * whether to set MSG_MORE or not...
 		 */
 		rq_for_each_segment(bvec, req, iter) {
-			flags = 0;
+			msg_flags = 0;
 			if (!rq_iter_last(bvec, iter))
-				flags = MSG_MORE;
+				msg_flags = MSG_MORE;
 			dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
 					nbd->disk->disk_name, req, bvec.bv_len);
-			result = sock_send_bvec(nbd, &bvec, flags);
+			result = sock_send_bvec(nbd, &bvec, msg_flags);
 			if (result <= 0) {
 				dev_err(disk_to_dev(nbd->disk),
 					"Send data failed (result %d)\n",
@@ -291,6 +333,14 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 	return 0;
 
 error_out:
+	if (nbd->xmit_timeout) {
+		spin_lock_irqsave(&nbd->timer_lock, flags);
+		nbd->inflight--;
+		BUG_ON(nbd->inflight < 0);
+		if (!nbd->inflight)
+			del_timer_sync(&nbd->req_timer);
+		spin_unlock_irqrestore(&nbd->timer_lock, flags);
+	}
 	return -EIO;
 }
 
@@ -412,24 +462,41 @@ static struct device_attribute pid_attr = {
 static int nbd_do_it(struct nbd_device *nbd)
 {
 	struct request *req;
+	unsigned long flags;
 	int ret;
 
 	BUG_ON(nbd->magic != NBD_MAGIC);
 
 	sk_set_memalloc(nbd->sock->sk);
-	nbd->pid = task_pid_nr(current);
 	ret = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
 	if (ret) {
 		dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
-		nbd->pid = 0;
 		return ret;
 	}
 
-	while ((req = nbd_read_stat(nbd)) != NULL)
-		nbd_end_request(req);
+	for (;;) {
+		req = nbd_read_stat(nbd);
+		if (nbd->xmit_timeout) {
+			spin_lock_irqsave(&nbd->timer_lock, flags);
+			if (req != NULL) {
+				nbd->inflight--;
+				BUG_ON(nbd->inflight < 0);
+			}
+			if (req != NULL && nbd->inflight)
+				mod_timer(&nbd->req_timer,
+					  jiffies + nbd->xmit_timeout);
+			else
+				del_timer_sync(&nbd->req_timer);
+			spin_unlock_irqrestore(&nbd->timer_lock, flags);
+		}
+		if (req != NULL) {
+			nbd_end_request(req);
+			continue;
+		}
+		break;
+	}
 
 	device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
-	nbd->pid = 0;
 	return 0;
 }
 
@@ -669,9 +736,20 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		set_capacity(nbd->disk, nbd->bytesize >> 9);
 		return 0;
 
-	case NBD_SET_TIMEOUT:
-		nbd->xmit_timeout = arg * HZ;
+	case NBD_SET_TIMEOUT: {
+		int xt;
+
+		xt = arg * HZ;
+		if (xt < 0)
+			return -EINVAL;
+		if (nbd->pid &&
+		    ((!nbd->xmit_timeout && xt) || (nbd->xmit_timeout && !xt)))
+			return -EBUSY;
+		dev_info(disk_to_dev(nbd->disk), "NBD_SET_TIMEOUT: %d -> %d\n",
+			nbd->xmit_timeout / HZ, xt / HZ);
+		nbd->xmit_timeout = xt;
 		return 0;
+	}
 
 	case NBD_SET_FLAGS:
 		nbd->flags = arg;
@@ -694,6 +772,11 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		if (!nbd->sock)
 			return -EINVAL;
 
+		nbd->pid = task_pid_nr(current);
+		nbd->inflight = 0;
+		nbd->timedout = 0;
+		nbd->sender = NULL;
+		nbd->receiver = NULL;
 		mutex_unlock(&nbd->tx_lock);
 
 		if (nbd->flags & NBD_FLAG_READ_ONLY)
@@ -710,6 +793,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 					nbd->disk->disk_name);
 		if (IS_ERR(thread)) {
 			mutex_lock(&nbd->tx_lock);
+			nbd->pid = 0;
 			return PTR_ERR(thread);
 		}
 		wake_up_process(thread);
@@ -717,6 +801,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		kthread_stop(thread);
 
 		mutex_lock(&nbd->tx_lock);
+		nbd->pid = 0;
 		if (error)
 			return error;
 		sock_shutdown(nbd, 0);
@@ -731,6 +816,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 			sockfd_put(sock);
 		nbd->flags = 0;
 		nbd->bytesize = 0;
+		nbd->xmit_timeout = 0;
 		bdev->bd_inode->i_size = 0;
 		set_capacity(nbd->disk, 0);
 		if (max_part > 0)
@@ -874,6 +960,8 @@ static int __init nbd_init(void)
 		init_waitqueue_head(&nbd_dev[i].waiting_wq);
 		nbd_dev[i].blksize = 1024;
 		nbd_dev[i].bytesize = 0;
+		spin_lock_init(&nbd_dev[i].timer_lock);
+		init_timer(&nbd_dev[i].req_timer);
 		disk->major = NBD_MAJOR;
 		disk->first_minor = i << part_shift;
 		disk->fops = &nbd_fops;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index f62f78a..c1280ca 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -41,6 +41,12 @@ struct nbd_device {
 	pid_t pid; /* pid of nbd-client, if attached */
 	int xmit_timeout;
 	int disconnect; /* a disconnect has been requested by user */
+	spinlock_t timer_lock;
+	struct timer_list req_timer;
+	int inflight;
+	int timedout;
+	struct task_struct *sender;
+	struct task_struct *receiver;
 };
 
 #endif