[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Libfuse interoperability/ABI broken.



Hi Amir,

thanks for your help!

On 3/9/24 03:46, Amir Goldstein wrote:
> On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi all,
>>
>> this is certainly not kind of the mail I was hoping for as a new libfuse
>> maintainer.
>>
>> As you can see from the title and from discussion below (sorry this is
>> not typical ML discussion style), we have a bit of of problem with
>> libfuse ABI compatibility.
>>
>> While scanning through git history, Ashley found a commit that was adding
>> members in the middle of struct - doesn't break API, but binary
>> compatibility. That commit already landed a year ago in these releases:
>>
>> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
>> fuse-3.14.1
>> fuse-3.15.0
>> fuse-3.15.1
>> fuse-3.16.1
>> fuse-3.16.2
>>
>>
>> Obviously this needs improved testing for, but right now we wonder what
>> would be the preferred action to avoid issues.
>>
>> a) We could fix it and move bits to the right place. Fixes everything
>> compiled with old versions, but breaks everything compiled with the
>> versions above
>>
>> b) Increase the .so version - enforces recompilation of all binaries.
>> Intrusive, especially for distributions, but probably safest.
>>
>> c) Other ideas?
>>
> 
> Heuristically, you can detect most of the shifted flags at runtime
> because...
> 
>>
>>
>> I don't think there is anything in libfuse that would allow us to
>> detect which version of libfuse a library was linked to.
>>
> 
> I think we do know for sure if fs was linked with libfuse < 3.12
> without fuse_loop_mt_312?
> so only left to detect  3.12..3.14 vs. 3.14..3.16

Hmm, I guess I miss something, but how would I know if it was linked
with fuse_loop_mt_312? That needs an elf reader? Assuming we would put
this into the library, somehow, how does this work with stripped binaries?

bschubert2@imesrv6 example>nm passthrough_ll | head -n1
00000000000003b4 r __abi_tag
bschubert2@imesrv6 example>strip passthrough_ll
bschubert2@imesrv6 example>nm passthrough_ll | head -n1
nm: passthrough_ll: no symbols



> 
>>
>> The commit shifted these members in struct fuse_file_info {
>>
>> struct fuse_file_info {
>> ...
>>         /** Can be filled by open/create, to allow parallel direct writes on this
>>          *  file */
>>         unsigned int parallel_direct_writes : 1; --> introduced the shift
> 
> Not expected in flush/release, so can be heuristically interpreted as flush
> 
>>
>>         /** Indicates a flush operation.  Set in flush operation, also
>>             maybe set in highlevel lock operation and lowlevel release
>>             operation. */
>>         unsigned int flush : 1;
>>
> 
> Not expected in open/create, so can be heuristically interpreted as nonseekable
> 
>>         /** Can be filled in by open, to indicate that the file is not
>>             seekable. */
>>         unsigned int nonseekable : 1;
>>
> 
> Not expected in release, so can be heuristically interpreted as flock_release
> 
>>         /* Indicates that flock locks for this file should be
>>            released.  If set, lock_owner shall contain a valid value.
>>            May only be set in ->release(). */
>>         unsigned int flock_release : 1;
>>
> 
> Not expected in opendir, so can be heuristically interpreted as cache_readdir
> 
>>         /** Can be filled in by opendir. It signals the kernel to
>>             enable caching of entries returned by readdir().  Has no
>>             effect when set in other contexts (in particular it does
>>             nothing when set by open()). */
>>         unsigned int cache_readdir : 1;
>>
> 
> Ignored in open, but based on the comment above, it may be
> implied that some fs may set it in open() reply
> 
>>         /** Can be filled in by open, to indicate that flush is not needed
>>             on close. */
>>         unsigned int noflush : 1;
> 
> noflush is just an optimization, which the kernel ignores anyway
> when writeback cache is enabled, so no harm done if it is wrongly
> interpreted as readdir_cache in open() and ignored.
> It is also quite recent (3.11) so not very likely used.
> 
>> };
>>
>> I.e. setting flush would actually set parallel_direct_writes
>> (note: with binaries compiled against older libfuse versions)
>>
> 
> It would be suspicious if fs sets parallel_direct_writes flush at
> flush() or release(), so that can be used as a hint of old ABI
> interpreted as flush and issue a warning.
> 
> It would be pretty ugly to use these heuristics in the library forever,
> but perhaps as safety measure for binaries built with a range of
> libfuse version that is not likely to be found in the wild (3.14..3.16)
> it is an acceptable compromise?

I really like your idea about heuristics, I just don't know how to limit
the libfuse version range.

> 
> and perhaps the next libfuse version can pass the header version
> fs was compiled with as argument to fuse_loop_mt_317()?


I think adding it to fuse_loop_mt_317 would be easy and could be
auto-added with a macro. I think more difficult is to add it to the old
api/abi. I.e. it would take a few years until that api version gets
mostly used?


Reply to: