[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Libfuse interoperability/ABI broken.



On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi all,
>
> this is certainly not kind of the mail I was hoping for as a new libfuse
> maintainer.
>
> As you can see from the title and from discussion below (sorry this is
> not typical ML discussion style), we have a bit of of problem with
> libfuse ABI compatibility.
>
> While scanning through git history, Ashley found a commit that was adding
> members in the middle of struct - doesn't break API, but binary
> compatibility. That commit already landed a year ago in these releases:
>
> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
> fuse-3.14.1
> fuse-3.15.0
> fuse-3.15.1
> fuse-3.16.1
> fuse-3.16.2
>
>
> Obviously this needs improved testing for, but right now we wonder what
> would be the preferred action to avoid issues.
>
> a) We could fix it and move bits to the right place. Fixes everything
> compiled with old versions, but breaks everything compiled with the
> versions above
>
> b) Increase the .so version - enforces recompilation of all binaries.
> Intrusive, especially for distributions, but probably safest.
>
> c) Other ideas?
>

Heuristically, you can detect most of the shifted flags at runtime
because...

>
>
> I don't think there is anything in libfuse that would allow us to
> detect which version of libfuse a library was linked to.
>

I think we do know for sure if fs was linked with libfuse < 3.12
without fuse_loop_mt_312?
so only left to detect  3.12..3.14 vs. 3.14..3.16

>
> The commit shifted these members in struct fuse_file_info {
>
> struct fuse_file_info {
> ...
>         /** Can be filled by open/create, to allow parallel direct writes on this
>          *  file */
>         unsigned int parallel_direct_writes : 1; --> introduced the shift

Not expected in flush/release, so can be heuristically interpreted as flush

>
>         /** Indicates a flush operation.  Set in flush operation, also
>             maybe set in highlevel lock operation and lowlevel release
>             operation. */
>         unsigned int flush : 1;
>

Not expected in open/create, so can be heuristically interpreted as nonseekable

>         /** Can be filled in by open, to indicate that the file is not
>             seekable. */
>         unsigned int nonseekable : 1;
>

Not expected in release, so can be heuristically interpreted as flock_release

>         /* Indicates that flock locks for this file should be
>            released.  If set, lock_owner shall contain a valid value.
>            May only be set in ->release(). */
>         unsigned int flock_release : 1;
>

Not expected in opendir, so can be heuristically interpreted as cache_readdir

>         /** Can be filled in by opendir. It signals the kernel to
>             enable caching of entries returned by readdir().  Has no
>             effect when set in other contexts (in particular it does
>             nothing when set by open()). */
>         unsigned int cache_readdir : 1;
>

Ignored in open, but based on the comment above, it may be
implied that some fs may set it in open() reply

>         /** Can be filled in by open, to indicate that flush is not needed
>             on close. */
>         unsigned int noflush : 1;

noflush is just an optimization, which the kernel ignores anyway
when writeback cache is enabled, so no harm done if it is wrongly
interpreted as readdir_cache in open() and ignored.
It is also quite recent (3.11) so not very likely used.

> };
>
> I.e. setting flush would actually set parallel_direct_writes
> (note: with binaries compiled against older libfuse versions)
>

It would be suspicious if fs sets parallel_direct_writes flush at
flush() or release(), so that can be used as a hint of old ABI
interpreted as flush and issue a warning.

It would be pretty ugly to use these heuristics in the library forever,
but perhaps as safety measure for binaries built with a range of
libfuse version that is not likely to be found in the wild (3.14..3.16)
it is an acceptable compromise?

and perhaps the next libfuse version can pass the header version
fs was compiled with as argument to fuse_loop_mt_317()?

Thanks,
Amir.


Reply to: