libdpkg: buffered I/O API
Hi,
I finally found some time to work on moving the compression and
decompression code to the buffer library. It seems like a good idea and
an interesting one, but first I ran into a couple of questions I’d like
to get your feedback on.
This patch series reorganizes the libdpkg buffer functions to make it a
little easier to add new buffer types. I am sending it because it puts
the signatures for the functions a buffer type might provide in one place,
so even though the series does not do much, it makes API questions a
little clearer.
1. Currently, error reporting uses the usual convention of returning -1 and
putting a more specific error code in errno. Obviously, this is not a
good way to report errors from backends that do not use the same
convention. For example, if the error flag is already set on a
stream before a call to stream_md5(), then the result would be an
error message something like the following:
failed to read on buffer copy for <whatever>: Success
It would be more friendly for write_stream() to check the error flag
itself and report that with a more meaningful error result.
Of course, dpkg itself never makes such a mistake AFAICT. A more
important problem is that it is hard to fit other backends (e.g., zlib)
in to the “set errno, return -1” scheme.
I suggest adding a pointer argument to return either:
a. a universally valid error code like those used in libcom_err; or
b. an error string. In this case, there is an ugly associated
question: who is responsible for freeing the error strings? The
simplest answer: the caller, except for a special "Out of memory"
string (ick).
Despite the ugliness, I’m inclined towards b.
2. Incomplete I/O is hard to handle rationally. The current code will
busy-loop if a caller provides a non-blocking file descriptor to
fd_fd_copy(); it might make sense to error out on EAGAIN instead
(though this is also something dpkg will never run into).
Complicating this is that Linux 2.2 used to return EAGAIN on 0-length
reads from a nonempty pipe [1], but we already work around that in
other ways.
If read_stream() performs a partial read and then receives an I/O
error, then the number of bytes successfully read is thrown away.
Do we want the caller to be able to recover? Practically speaking,
this is only relevant if read_stream() is something callers should
be able to use from a select() loop (i.e., with file handles
fdopen()'d from non-blocking file descriptors, which can return
EAGAIN).
Options:
a. busy-loop on EAGAIN (which punishes callers who try this);
a. transparently poll() on EAGAIN (bad idea IMHO);
b. consider EAGAIN an unrecoverable error; or
c. report number of bytes consumed/produced on errors.
I’m inclined towards c.
Thoughts?
Jonathan Nieder (4):
libdpkg: Collect MD5-related functions in one place
libdpkg: Refactor generic I/O code into backend-specific functions
libdpkg: Make buffer_init() and buffer_done() return void
libdpkg: Reorganize buffer module around new buffer_type struct
lib/dpkg/buffer.c | 274 +++++++++++++++++++++++++++++++++++++++--------------
lib/dpkg/buffer.h | 69 ++++++++-----
2 files changed, 243 insertions(+), 100 deletions(-)
[1] <http://bugs.debian.org/81881>
Reply to: