
Bug#805252: linux-image-4.2.0-1-amd64: I/O errors when writing to iSCSI volumes



Control: tags -1 + fixed-upstream patch jessie
Control: found -1 3.16.7-ckt20-1+deb8u3
Control: fixed -1 4.1.1-1~exp1

Hi,

so I've found out that this is actually a bug in LIO (the target), not
in the initiator.

The problem is: LIO in Linux up to 4.0 uses vfs_writev to write out
blocks when using the fileio backend, so it is subject to a limit of
1024 iovecs (UIO_MAXIOV in include/uapi/linux/uio.h), each at most a
page in size. At a 4 kiB page size, that gives a maximum of 4 MiB of
data that can be processed per request.

Older versions of the Linux software initiator had a hard limit of
512 kiB per request, which means that at most 128 iovec entries were
used, which fits comfortably. Newer versions of the Linux iSCSI
initiator don't have this hard-coded limit but instead use the value
supplied by the target. (This is correct behavior on the initiator's
part, so there's no bug there, contrary to what I initially assumed.)

The problem now is that LIO with the fileio backend assumes that 8 MiB
may be transferred at a time, because (according to comments in
drivers/target/target_core_file.h) it assumes, for reasons unknown to
me, that the maximum number of iovecs is 2048.
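To make the mismatch concrete (assuming 4 kiB pages; the variable names
below are just for illustration):

```shell
# Actual kernel limit: UIO_MAXIOV = 1024 iovecs, each at most one page.
PAGE_SIZE=4096
ACTUAL_MAX=$((1024 * PAGE_SIZE))    # 4194304 bytes = 4 MiB
# LIO's fileio backend instead assumes 2048 iovecs:
ASSUMED_MAX=$((2048 * PAGE_SIZE))   # 8388608 bytes = 8 MiB (= FD_MAX_BYTES)
echo "actual=$ACTUAL_MAX assumed=$ASSUMED_MAX"
```

So any single request between 4 MiB and 8 MiB passes LIO's own size
check but then fails inside vfs_writev.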

Note that this also means that any non-Linux initiator that properly
follows the target-supplied maximum I/O size will run into the same
problem and see I/O errors, even though it behaves correctly.

This problem no longer affects upstream, because LIO has been rewritten
to use vfs_iter_write, which doesn't have this limitation, but that was
only introduced after 3.16. Backporting it would be too invasive, and
probably ABI-incompatible.

Fortunately, there's a much easier way to fix this: just lower the
limit in drivers/target/target_core_file.h to 4 MiB. I've tested that:
the limit is properly advertised by LIO, and newer initiators no longer
choke on it, so that fixes the bug. See the attached patch.
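For reference, a single write larger than 4 MiB is enough to trigger
the failure against an unpatched target (a sketch; /dev/sdX stands for
the iSCSI disk on the initiator, and whether the request stays in one
piece also depends on the initiator's queue limits):

```shell
# Issue one 6 MiB direct write to the iSCSI disk from the initiator.
# Against an unpatched target this fails with an I/O error and the
# target logs "fd_do_rw() write returned -22"; with the patch applied
# the advertised limit drops to 4 MiB, so the block layer splits the
# request and the write succeeds.
dd if=/dev/zero of=/dev/sdX bs=6M count=1 oflag=direct
```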

CAVEAT: there is a slight problem with this change, and I don't know
what the best solution is: on people's existing setups, the
optimal_sectors setting for fileio disks is likely to be 16384, because
that corresponds to 8 MiB at 512-byte sectors (the previous maximum,
which is the default for optimal_sectors if no other value is set).
With this change, the kernel will refuse that setting, because it's now
larger than the maximum allowed value.

If you use targetcli 3 (not part of Jessie, but you can install e.g.
the version from sid), it will fail to set up the target properly,
because it aborts as soon as it notices that it can't apply the
setting. (Leftover targetcli 2 from Wheezy upgrades should not be
affected as badly, as far as I can tell, because the startup scripts
seem to ignore errors. But I haven't tested that.)

So we're left with the following trade-off: without this fix, 3.16
kernels produce I/O errors when used with initiators that respect the
kernel's setting; with the fix, the target configuration needs to be
updated. (Of course, one could also patch the kernel to ignore the
specific value of 16384 for optimal_sectors when fileio is used as a
backend and print a warning.) I don't know what you'd prefer here.
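For existing setups, the stored value can be lowered ahead of time
through configfs; something along these lines should work (the
backstore path and device name below are examples, so treat this as a
sketch rather than a drop-in script):

```shell
# Lower optimal_sectors from 16384 (8 MiB at 512-byte sectors) to at
# most 8192 (4 MiB) for each fileio backstore. "fileio_0/mydisk" is a
# placeholder for the actual backstore name on your system.
dev=/sys/kernel/config/target/core/fileio_0/mydisk
cat "$dev/attrib/optimal_sectors"         # old setups will likely show 16384
echo 8192 > "$dev/attrib/optimal_sectors"
```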

Also note that this likely also affects the kernel in Wheezy, although
I haven't done any tests in that direction.

Regards,
Christian

PS: for reference, the upstream discussion on the initiator mailing
list that led me to discover that the target, not the initiator, was
the problem:
https://groups.google.com/forum/#!topic/open-iscsi/UE2JJfDmQ7w

From: Christian Seiler <christian@iwakd.de>
Date: Sat, 30 Jan 2016 13:48:54 CET
Subject: LIO: assume a maximum of 1024 iovecs

Previously the code assumed that vfs_[read,write]v supports 2048
iovecs, which is incorrect: UIO_MAXIOV is 1024. This caused the target
to advertise a maximum I/O size that was too large, which in turn
caused conforming initiators (most notably recent Linux kernels, which
started to respect the target's maximum I/O size instead of the
hard-coded 512 kiB limit of previous kernel versions) to send write
requests that were too large, resulting in LIO rejecting them
(kernel: fd_do_rw() write returned -22) and hence in data loss.

This patch adjusts the limit to 1024 iovecs and also uses the
PAGE_SIZE macro instead of assuming 4 kiB pages.

Signed-off-by: Christian Seiler <christian@iwakd.de>
---
 drivers/target/target_core_file.c | 2 +-
 drivers/target/target_core_file.h | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

--- a/drivers/target/target_core_file.h
+++ b/drivers/target/target_core_file.h
@@ -1,6 +1,8 @@
 #ifndef TARGET_CORE_FILE_H
 #define TARGET_CORE_FILE_H
 
+#include <asm/page.h>
+
 #define FD_VERSION		"4.0"
 
 #define FD_MAX_DEV_NAME		256
@@ -9,9 +11,9 @@
 #define FD_MAX_DEVICE_QUEUE_DEPTH 128
 #define FD_BLOCKSIZE		512
 /*
- * Limited by the number of iovecs (2048) per vfs_[writev,readv] call
+ * Limited by the number of iovecs (1024) per vfs_[writev,readv] call
  */
-#define FD_MAX_BYTES		8388608
+#define FD_MAX_BYTES		(1024*PAGE_SIZE)
 
 #define RRF_EMULATE_CDB		0x01
 #define RRF_GOT_LBA		0x02
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -674,7 +674,7 @@ fd_execute_rw(struct se_cmd *cmd, struct
 	 */
 	if (cmd->data_length > FD_MAX_BYTES) {
 		pr_err("FILEIO: Not able to process I/O of %u bytes due to"
-		       "FD_MAX_BYTES: %u iovec count limitiation\n",
+		       " FD_MAX_BYTES: %lu iovec count limitation\n",
 			cmd->data_length, FD_MAX_BYTES);
 		return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 	}


