Bug#629994: sendfile returns early without user-visible reason

To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Marc Lehmann <debian-reportbug@plan9.de>, 629994@bugs.debian.org
Subject: Bug#629994: sendfile returns early without user-visible reason
From: Marc Lehmann <schmorp@schmorp.de>
Date: Fri, 10 Jun 2011 12:15:44 +0200
Message-id: <20110610101544.GA12349@schmorp.de>
Reply-to: Marc Lehmann <schmorp@schmorp.de>, 629994@bugs.debian.org
In-reply-to: <20110610082138.GA29517@elie>
References: <20110610051915.8964.11599.reportbug@cerebro.laendle> <20110610082138.GA29517@elie>

On Fri, Jun 10, 2011 at 03:21:38AM -0500, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Indeed, read(2) does the same thing (truncates to 7ffff000) and has done

What the fuck, it's buggy, indeed:

   read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3298534883328) = 2147479552
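
The return value above matches the 7ffff000 figure from the quoted text. As
a small sanity check of the arithmetic, the sketch below computes INT_MAX
rounded down to a page boundary, assuming 4096-byte pages as on amd64. The
helper name max_rw_count is made up for this illustration and is not a
kernel or libc interface.

```c
#include <limits.h>

/* Assumption: 4096-byte pages, as on amd64. */
#define ASSUMED_PAGE_SIZE 4096L

/* Hypothetical helper: largest page-aligned byte count that still fits
 * in a signed int. page_size must be a power of two. */
static long max_rw_count(long page_size)
{
    return (long)INT_MAX & ~(page_size - 1);
}
```

With ASSUMED_PAGE_SIZE this yields 2147479552 (0x7ffff000), the same count
read() returned for the much larger request.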

> so for five years, though it's a little harder to notice (I had to use
> mmap to create a large file-backed buffer to read into.)

Well, for read, the situation is a bit different, because that's a clear
posix violation. While this is obviously not relevant for sendfile, it of
course makes sense to use posix (or simply traditional unix) I/O semantics
for sendfile as well *iff* read implements posix behaviour.

> Background: even with read(2) and write(2), partial progress does not
> necessarily represent an error.

That's true _in general_, but posix/unix/sus have clearly defined and
user-visible semantics for that, which require that the success case
transfers as many bytes as can be transferred, and does not stop a random
amount earlier unless there is an error condition (signal => EINTR if not
restarted, and easily controllable by applications - for example, not
doing anything with signals makes it work):

   The value returned may be less than nbyte if the number of bytes left
   in the file is less than nbyte, if the read() request was interrupted
   by a signal, or if the file is a pipe or FIFO or special file and has
   fewer than nbyte bytes immediately available for reading.

In this case, the size of the file was also the sendfile transfer size. If
read() were used, and read were posix-compliant, then a partial read would
necessarily indicate an error of some kind.

> For example, on "slow" devices like a terminal, pipe, or socket, a
> partial success can indicate interruption by a signal, and on a named
> or unnamed pipe it can indicate that fewer than the requested number of
> bytes were immediately available.

A file is not a terminal, pipe or socket - I specifically reported a
file->file problem, not a socket->file or file->socket problem (the
former is probably not supported and the latter has a whole lot of different
error modes/behaviour). For files, unless you do signal stuff, a partial
(posix-) read indicates error or end of file.
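
A minimal sketch of the application-side assumption described here: for a
regular file, with no signal handling involved and a count known to be
available, anything short of the requested count is treated as failure. The
helper name read_all_or_fail and the choice of EIO for the short-read case
are assumptions made for illustration, not taken from an existing program.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly n bytes in a single read() call.
 * The caller is assumed to know that n bytes are available, so a short
 * count (including end of file) is reported as an error. */
static int read_all_or_fail(int fd, void *buf, size_t n)
{
    ssize_t got = read(fd, buf, n);

    if (got < 0)
        return -1;          /* errno was set by read() */
    if ((size_t)got != n) {
        errno = EIO;        /* assumption: map a short read to EIO */
        return -1;
    }
    return 0;
}
```

A program written this way aborts on the 2147479552-byte result shown
earlier, even though no error was reported by the kernel.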

Applications are well aware of the differences between sockets and files
(set nonblocking mode, for example, to see very different behaviour).

The biggest difference between your example and my example, however, is that
the posix requirements are so strong, and the unix behaviour has been working
for so long, that applications that see short writes or reads when they know
there is more data can rightfully abort the process.

With sockets, posix semantics are different, so the normal behaviour of
applications is to retry.

This works fine as long as the OS follows posix.

> So I am somewhat curious about these many programs --- why are they
> expecting this from sendfile?

Because that's how every unix works now and has in the past, and that's what
the unix standard requires.

I think it is reasonable for programs to expect sendfile to behave like a
synthesized read+write, as opposed to "weird" semantics, and I think it is
reasonable for read() to follow the posix semantics nowadays. It shouldn't
be that hard to implement, and the portability gain from having posix
behaviour is immense.

That linux apparently fails to implement this for read too makes it
consistent (which is kind of good), but creates a portability problem for
unix programs.

> The manpage is outdated and does not even indicate that sendfile can be
> used to copy a file.

My manpage clearly allows it:

      sendfile()  copies  data between one file descriptor and another.

There are no other hard requirements listed for sendfile. It mentions that
in 2.6.9, there are extra requirements, but that's obviously not relevant
to (and untrue for) 2.6.39.

That's no different to read or write, both of which also work on file
descriptors and put no other requirements on them.

Since files are accessible via file descriptors, the sendfile manpage
clearly says it can be used to copy a file (or more correctly, to transfer
data from one file to another).

However, the manpage says:

       Applications may wish to fall back to read(2)/write(2) in the case
       where sendfile() fails with EINVAL or ENOSYS.

And this is in fact what many applications do: try it, and then fall back.

It's also common sense, and the rationale behind the design (cf. Linus's mails
on that topic) - sendfile should implement what the kernel can do more
efficiently, and otherwise signal the application that it should do it itself
(EINVAL). The expected application behaviour is just that: fall back to
read/write, and this worked in the past.
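
The fallback pattern suggested by the manpage could be sketched as follows.
This is only an illustration: the helper name copy_with_fallback and the
64 KiB buffer size are assumptions, and a short count from sendfile() is
simply passed back to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to count bytes from in_fd to out_fd.
 * Try sendfile() once; on EINVAL or ENOSYS fall back to read/write. */
static ssize_t copy_with_fallback(int in_fd, int out_fd, size_t count)
{
    ssize_t sent = sendfile(out_fd, in_fd, NULL, count);

    if (sent >= 0 || (errno != EINVAL && errno != ENOSYS))
        return sent;    /* success (possibly short) or a real error */

    char buf[65536];    /* assumption: arbitrary buffer size */
    size_t total = 0;

    while (total < count) {
        size_t want = count - total < sizeof buf ? count - total : sizeof buf;
        ssize_t r = read(in_fd, buf, want);

        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;      /* end of input */

        /* write() may be short as well, so drain the buffer in a loop */
        for (ssize_t off = 0; off < r; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(r - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += (size_t)r;
    }
    return (ssize_t)total;
}
```

Note that the fallback is only taken when sendfile() reports EINVAL or
ENOSYS. Once sendfile() accepts the descriptors and returns a short count,
this code has no way to tell that apart from an error or end of file.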

The problem is precisely that sendfile changed semantics.

> Has the size allowed for a single sendfile(2) call changed over time?

Implementations following the manpage worked in the past, yes, because
read or write emulation usually uses smaller than 2gb buffers (and if
read(2) were fixed, it would even work with larger buffer sizes).

The size allowed for a single sendfile was about 0 in earlier versions,
because they returned EINVAL.

> Is this a regression or a request for a new feature?

It seems there are two regressions: read no longer being posix compliant
and sendfile no longer telling applications to use a (working) read/write
loop but instead attempting the copy itself.

> If an application wants to print a useful error message, it has to try
> again until sendfile returns -1 so errno can be set.

That's clearly just an opinion. The authors of gnu tar and many existing
applications apparently disagree, as do I.

It's widespread behaviour to expect posix semantics nowadays, and quite
reasonable to expect similar behaviour by sendfile.

In fact, so much peril has been brought over the world by introducing
horribly misdesigned interfaces such as epoll() (and, to a lesser extent,
similar mechanisms in other kernels), that creating consistency in the
form of using posix semantics for any file I/O is clearly a good thing.

But that's just my opinion :)

> Anyway, I agree that it would be better for sendfile to return partial
> results less often,

I think sendfile should follow the same semantics as unix read(), and
further, linux should follow both de facto historical unix behaviour as
well as posix/sus behaviour and not return partial results in cases not
allowed by posix.

> to make one-off programs easier to write and to decrease the number of
> syscalls made, but that doesn't seem worth

The whole *point* of sendfile is to decrease the number of syscalls, for
high-performance programs. If overhead isn't an issue, then read+write are
much more portable, and typically easier to use.

As such, if sendfile requires extra unnecessary syscalls, this is clearly
a design violation.
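
For comparison, this is roughly what a caller has to do if sendfile() may
return short counts for no visible reason: loop until the requested amount
is transferred, paying one more syscall per short return. The helper name
sendfile_fully and the calls counter are hypothetical and only there to make
the extra syscalls visible.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call sendfile() until count bytes are transferred,
 * end of input is reached, or an error occurs. If calls is non-NULL, it
 * is incremented once per sendfile() invocation. */
static ssize_t sendfile_fully(int out_fd, int in_fd, size_t count,
                              unsigned *calls)
{
    size_t total = 0;

    while (total < count) {
        ssize_t n = sendfile(out_fd, in_fd, NULL, count - total);

        if (calls)
            ++*calls;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;  /* errno was set by sendfile() */
        }
        if (n == 0)
            break;      /* end of input */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

With a cap of 2147479552 bytes per call, a file larger than that needs at
least two calls here, where a read+write style contract would need one.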

> write more than fits in an "int" at a given moment.  So I'm marking this
> wontfix for now.

Should I open a separate bug for read(2) then, or will posix compliance
also be a wontfix (a valid position)?

> An obvious possible improvement would be to update the manpages to
> include information about this.  Would you be interested in that, and if
> so, can you suggest a wording?

I guess something like that would be fine:

   sendfile is not the same as a read+write combination - it may transfer
   and return fewer bytes than requested for no user-visible reason.

That would require read(2) to be fixed. A warning that read doesn't
implement posix semantics for file I/O and also erroneously might return
partial results might be very useful, too.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\
