[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#447153: /usr/bin/scp: Fails to notice write errors



# For linux-cifs-client: this paragraph is for the Debian bug tracking
# system control robot. Please ignore it.
reassign 447153 linux-2.6
thanks

On Fri, Oct 19, 2007 at 12:03:01AM +0200, Michal Suchanek wrote:
> On 18/10/2007, Colin Watson <cjwatson@debian.org> wrote:
> > On Thu, Oct 18, 2007 at 03:32:27PM +0200, Hramrach wrote:
> > > When copying to a cifs share scp fails to notice write errors and
> > > happily continues copying when there is no disk space.
> > > Note that cifs probably only reports these errors on close(), not
> > > write().
> 
> cp reports the error:
> 
> cp firefox-2.0.0.4.tar.gz /mnt/
> cp: closing `/mnt/firefox-2.0.0.4.tar.gz': No space left on device
> 
> scp produces files of the same size with different content for small
> files and truncated (although not always zero) files for larger files,
> and notices no problem.

I've reproduced this locally and I believe it's a kernel bug. Here's an
strace excerpt:

[pid 25301] <... read resumed> "C", 1)  = 1
[pid 25301] read(0, "0", 1)             = 1
[pid 25301] read(0, "6", 1)             = 1
[pid 25301] read(0, "4", 1)             = 1
[pid 25301] read(0, "4", 1)             = 1
[pid 25301] read(0, " ", 1)             = 1
[pid 25301] read(0, "1", 1)             = 1
[pid 25301] read(0, "4", 1)             = 1
[pid 25301] read(0, "1", 1)             = 1
[pid 25301] read(0, " ", 1)             = 1
[pid 25301] read(0, "t", 1)             = 1
[pid 25301] read(0, ".", 1)             = 1
[pid 25301] read(0, "p", 1)             = 1
[pid 25301] read(0, "y", 1)             = 1
[pid 25301] read(0, "\n", 1)            = 1
[pid 25301] stat64("cifstest-mount//t.py", {st_mode=S_IFREG|0744, st_size=141, ...}) = 0
[pid 25301] open("cifstest-mount//t.py", O_WRONLY|O_CREAT|O_LARGEFILE, 0644) = 3
[pid 25301] write(1, "\0", 1)           = 1
[pid 25301] fstat64(3,  <unfinished ...>
[pid 25301] <... fstat64 resumed> {st_mode=S_IFREG|0744, st_size=141, ...}) = 0
[pid 25301] read(0,  <unfinished ...>
[pid 25301] <... read resumed> "import gtk\nimport locale\nlocale."..., 141) = 141
[pid 25301] write(3, "import gtk\nimport locale\nlocale."..., 141) = 141
[pid 25301] ftruncate64(3, 141)         = 0
[pid 25301] close(3)                    = 0
[pid 25301] read(0, "\0", 1)            = 1
[pid 25301] write(1, "\0", 1)           = 1
[pid 25301] read(0,  <unfinished ...>
[pid 25301] <... read resumed> "", 1)   = 0
[pid 25301] exit_group(0)               = ?

As you can see, it simply isn't getting any error back from the kernel.

Here's a reduced test program that exhibits the same problem as scp when
run with a filename on a CIFS mount of a full filesystem:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int main(int argc, char **argv)
{
    int fd;
    if (argc <= 1) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    while (write(fd, "x", 1) < 1) {
        if (errno == EINTR)
            continue;
        perror("write");
        return 1;
    }
    if (ftruncate(fd, 1) < 0) {
        perror("ftruncate");
        return 1;
    }
    if (close(fd) < 0) {
        perror("close");
        return 1;
    }
    return 0;
}

No error, but you end up with a one-byte hole rather than either (a) "x"
or (b) an error. If you leave out the ftruncate, then close returns
ENOSPC. In either case, you get "CIFS VFS: Write2 ret -28, written = 0"
in syslog, but the error code doesn't make it to userspace if there's an
ftruncate between write and close. Is this a CIFS bug? I can't find
anything in the ftruncate documentation that suggests it is allowed to
do this; I think that if write claims to have written all the bytes then
userspace ought to be able to assume that ftruncate(fd, st_size) is a
no-op.


To openssh-unix-dev: does anyone think this is worth a workaround? The
ftruncate seems rather unnecessary if we've already written out the
required number of bytes anyway. I've attached a patch which only does
it if that isn't the case (although I have some trouble seeing how we
could ever get to the ftruncate without either writing the required
number of bytes or encountering a write error). If people think it's a
good idea I'll file it in Bugzilla.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]
Index: scp.c
===================================================================
RCS file: /cvs/openssh/scp.c,v
retrieving revision 1.175
diff -p -u -r1.175 scp.c
--- scp.c	26 Oct 2007 05:39:15 -0000	1.175
+++ scp.c	12 Nov 2007 18:31:14 -0000
@@ -849,7 +849,7 @@ sink(int argc, char **argv)
 	size_t j, count;
 	int amt, exists, first, ofd;
 	mode_t mode, omode, mask;
-	off_t size, statbytes;
+	off_t size, statbytes, writtenbytes;
 	int setimes, targisdir, wrerrno = 0;
 	char ch, *cp, *np, *targ, *why, *vect[1], buf[2048];
 	struct timeval tv[2];
@@ -1019,7 +1019,7 @@ bad:			run_err("%s: %s", np, strerror(er
 		cp = bp->buf;
 		wrerr = NO;
 
-		statbytes = 0;
+		statbytes = writtenbytes = 0;
 		if (showprogress)
 			start_progress_meter(curfile, size, &statbytes);
 		set_nonblock(remin);
@@ -1047,7 +1047,8 @@ bad:			run_err("%s: %s", np, strerror(er
 					    count) != count) {
 						wrerr = YES;
 						wrerrno = errno;
-					}
+					} else
+						writtenbytes += (off_t)count;
 				}
 				count = 0;
 				cp = bp->buf;
@@ -1060,8 +1061,10 @@ bad:			run_err("%s: %s", np, strerror(er
 		    atomicio(vwrite, ofd, bp->buf, count) != count) {
 			wrerr = YES;
 			wrerrno = errno;
-		}
+		} else
+			writtenbytes += (off_t)count;
 		if (wrerr == NO && (!exists || S_ISREG(stb.st_mode)) &&
+		    writtenbytes < size &&
 		    ftruncate(ofd, size) != 0) {
 			run_err("%s: truncate: %s", np, strerror(errno));
 			wrerr = DISPLAYED;

Reply to: