[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Libfuse interoperability/ABI broken.



On Wed, Mar 13, 2024 at 1:57 AM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Amir,
>
> thanks for your help!
>
> On 3/9/24 03:46, Amir Goldstein wrote:
> > On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> >>
> >> Hi all,
> >>
> >> this is certainly not kind of the mail I was hoping for as a new libfuse
> >> maintainer.
> >>
> >> As you can see from the title and from discussion below (sorry this is
> >> not typical ML discussion style), we have a bit of of problem with
> >> libfuse ABI compatibility.
> >>
> >> While scanning through git history, Ashley found a commit that was adding
> >> members in the middle of struct - doesn't break API, but binary
> >> compatibility. That commit already landed a year ago in these releases:
> >>
> >> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
> >> fuse-3.14.1
> >> fuse-3.15.0
> >> fuse-3.15.1
> >> fuse-3.16.1
> >> fuse-3.16.2
> >>
> >>
> >> Obviously this needs improved testing for, but right now we wonder what
> >> would be the preferred action to avoid issues.
> >>
> >> a) We could fix it and move bits to the right place. Fixes everything
> >> compiled with old versions, but breaks everything compiled with the
> >> versions above
> >>
> >> b) Increase the .so version - enforces recompilation of all binaries.
> >> Intrusive, especially for distributions, but probably safest.
> >>
> >> c) Other ideas?
> >>
> >
> > Heuristically, you can detect most of the shifted flags at runtime
> > because...
> >
> >>
> >>
> >> I don't think there is anything in libfuse that would allow us to
> >> detect which version of libfuse a library was linked to.
> >>
> >
> > I think we do know for sure if fs was linked with libfuse < 3.12
> > without fuse_loop_mt_312?
> > so only left to detect  3.12..3.14 vs. 3.14..3.16
>
> Hmm, I guess I miss something, but how would I know if it was linked
> with fuse_loop_mt_312? That needs an elf reader? Assuming we would put
> this into the library, somehow, how does this work with stripped binaries?
>
> bschubert2@imesrv6 example>nm passthrough_ll | head -n1
> 00000000000003b4 r __abi_tag
> bschubert2@imesrv6 example>strip passthrough_ll
> bschubert2@imesrv6 example>nm passthrough_ll | head -n1
> nm: passthrough_ll: no symbols
>
>

I thought that invoking fuse_loop_mt_*() indicated the lib version
that binary was built with.
I wasn't taking FUSE_USE_VERSION into account.

>
> >
> >>
> >> The commit shifted these members in struct fuse_file_info {
> >>
> >> struct fuse_file_info {
> >> ...
> >>         /** Can be filled by open/create, to allow parallel direct writes on this
> >>          *  file */
> >>         unsigned int parallel_direct_writes : 1; --> introduced the shift
> >
> > Not expected in flush/release, so can be heuristically interpreted as flush
> >
> >>
> >>         /** Indicates a flush operation.  Set in flush operation, also
> >>             maybe set in highlevel lock operation and lowlevel release
> >>             operation. */
> >>         unsigned int flush : 1;
> >>
> >
> > Not expected in open/create, so can be heuristically interpreted as nonseekable
> >
> >>         /** Can be filled in by open, to indicate that the file is not
> >>             seekable. */
> >>         unsigned int nonseekable : 1;
> >>
> >
> > Not expected in release, so can be heuristically interpreted as flock_release
> >
> >>         /* Indicates that flock locks for this file should be
> >>            released.  If set, lock_owner shall contain a valid value.
> >>            May only be set in ->release(). */
> >>         unsigned int flock_release : 1;
> >>
> >
> > Not expected in opendir, so can be heuristically interpreted as cache_readdir
> >
> >>         /** Can be filled in by opendir. It signals the kernel to
> >>             enable caching of entries returned by readdir().  Has no
> >>             effect when set in other contexts (in particular it does
> >>             nothing when set by open()). */
> >>         unsigned int cache_readdir : 1;
> >>I am not sure I know how versioned symbols work, but
doesn't new lib always have all the functions
fuse_loop_mt_*() and old binary will be invoking the
No, I was not looking closely.

> >
> > Ignored in open, but based on the comment above, it may be
> > implied that some fs may set it in open() reply
> >
> >>         /** Can be filled in by open, to indicate that flush is not needed
> >>             on close. */
> >>         unsigned int noflush : 1;
> >
> > noflush is just an optimization, which the kernel ignores anyway
> > when writeback cache is enabled, so no harm done if it is wrongly
> > interpreted as readdir_cache in open() and ignored.
> > It is also quite recent (3.11) so not very likely used.
> >
> >> };
> >>
> >> I.e. setting flush would actually set parallel_direct_writes
> >> (note: with binaries compiled against older libfuse versions)
> >>
> >
> > It would be suspicious if fs sets parallel_direct_writes flush at
> > flush() or release(), so that can be used as a hint of old ABI
> > interpreted as flush and issue a warning.
> >
> > It would be pretty ugly to use these heuristics in the library forever,
> > but perhaps as safety measure for binaries built with a range of
> > libfuse version that is not likely to be found in the wild (3.14..3.16)
> > it is an acceptable compromise?
>
> I really like your idea about heuristics, I just don't know how to limit
> the libfuse version range.
>
> >
> > and perhaps the next libfuse version can pass the header version
> > fs was compiled with as argument to fuse_loop_mt_317()?
>
>
> I think adding it to fuse_loop_mt_317 would be easy and could be
> auto-added with a macro. I think more difficult is to add it to the old
> api/abi. I.e. it would take a few years until that api version gets
> mostly used?
>

I guess we would add __fuse_loop_mt_*() that take an auto
lib version argument.
Old binaries are old binaries.
But binaries rebuilt with new lib will always end up calling
__fuse_loop_mt_*() functions with the lib version. No?

Thanks,
Amir.


Reply to: