[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Libfuse interoperability/ABI broken.



Hi all,

this is certainly not kind of the mail I was hoping for as a new libfuse
maintainer.

As you can see from the title and from discussion below (sorry this is 
not typical ML discussion style), we have a bit of of problem with 
libfuse ABI compatibility. 

While scanning through git history, Ashley found a commit that was adding
members in the middle of struct - doesn't break API, but binary
compatibility. That commit already landed a year ago in these releases:

git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
fuse-3.14.1
fuse-3.15.0
fuse-3.15.1
fuse-3.16.1
fuse-3.16.2


Obviously this needs improved testing for, but right now we wonder what
would be the preferred action to avoid issues.

a) We could fix it and move bits to the right place. Fixes everything
compiled with old versions, but breaks everything compiled with the
versions above

b) Increase the .so version - enforces recompilation of all binaries.
Intrusive, especially for distributions, but probably safest.

c) Other ideas?



I don't think there is anything in libfuse that would allow us to
detect which version of libfuse a library was linked to.


The commit shifted these members in struct fuse_file_info {

struct fuse_file_info {
...
	/** Can be filled by open/create, to allow parallel direct writes on this
         *  file */
        unsigned int parallel_direct_writes : 1; --> introduced the shift

	/** Indicates a flush operation.  Set in flush operation, also
	    maybe set in highlevel lock operation and lowlevel release
	    operation. */
	unsigned int flush : 1;

	/** Can be filled in by open, to indicate that the file is not
	    seekable. */
	unsigned int nonseekable : 1;

	/* Indicates that flock locks for this file should be
	   released.  If set, lock_owner shall contain a valid value.
	   May only be set in ->release(). */
	unsigned int flock_release : 1;

	/** Can be filled in by opendir. It signals the kernel to
	    enable caching of entries returned by readdir().  Has no
	    effect when set in other contexts (in particular it does
	    nothing when set by open()). */
	unsigned int cache_readdir : 1;

	/** Can be filled in by open, to indicate that flush is not needed
	    on close. */
	unsigned int noflush : 1;
};

I.e. setting flush would actually set parallel_direct_writes
(note: with binaries compiled against older libfuse versions)


For the high level API it is probably less critical, in struct fuse_config
these fields are shifted:

struct fuse_config {
...
	/**
	 *  Allow parallel direct-io writes to operate on the same file.
	 *
	 *  FUSE implementations which do not handle parallel writes on
	 *  same file/region should NOT enable this option at all as it
	 *  might lead to data inconsistencies.
	 *
	 *  For the FUSE implementations which have their own mechanism
	 *  of cache/data integrity are beneficiaries of this setting as
	 *  it now open doors to parallel writes on the same file (without
	 *  enabling this setting, all direct writes on the same file are
	 *  serialized, resulting in huge data bandwidth loss).
	 */
	int parallel_direct_writes;

	/**
	 * The remaining options are used by libfuse internally and
	 * should not be touched.
	 */
	int show_help;
	char *modules;
	int debug;
};


I don't think there is a security concern, but probably more a
data safety issue. So I included open mailing lists.


Thanks,
Bernd


On 3/7/24 19:02, Ashley Pittman wrote:
> 
> Simply bumping the .so number and forcing a rebuild would certainly work.  It would probably be the safest option but also put the highest cost on users, although I think for this kind of bug then a manual step of verifying the versions are correct is needed so forcing a build failure isn’t a bad option.  It would massively increase the visibility if this if nothing else.
> 
> Perhaps we should bring in the distro package maintainers here for their guidance/input?  Like I say I’ve not had experience of this type of issue before but I’m sure the distros will have.
> 
> Ashley.
> 
>> On 7 Mar 2024, at 16:05, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi Ashley,
>>
>> thanks for spotting and embarrassing as I had been involved in
>> development that feature.
>>
>> Hard to decide if revert it right - issue is, if we revert, everything
>> linked with the new version would broken as well. And it is now already
>> a year...
>>
>> I think we can only increase the .so version and send out a warning
>> (mailing list, distributions).
>>
>> In order to avoid it in the future, we can probably add some compile
>> time asserts about struct member positions.
>>
>>
>> Bernd
>>
>> On 3/7/24 16:43, Ashley Pittman wrote:
>>>
>>> I’ve spotted an issue with the linked commit, the fuse_file_info struct should have been modified by adding new entries just before the padding, with this commit then members after the new entry will be moved creating a change in the ABI for members after this.
>>>
>>> https://github.com/libfuse/libfuse/commit/a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
>>>
>>> This affects the flush, nonseekable, flock_release, cache_readdir and noflush flags, each one of which could be set or cleared accidentally with one of the flags before or after it depending on what version of libfuse the application is compiled and linked with.
>>>
>>> This isn’t a failure mode that I’ve experience before when using linux so I don’t have a playbook to work from in how to correct this but essentially fuse3 releases up to and including 3.13 have one ABI and 3.13 to 3.16 have a different ABI.
>>>
>>> I think the best course of action would be to revert to the previous ABI, move the entry to the correct place in the structure, make a new 3.17 release and then publicise that releases 3.13-3.16 should not be used.  Of course this fix is part technical (we can simply change the code) however there’s no obvious way of enforcing or detecting this incompatibility at run-time so a fuse-users announcement will need to be made, and probably check with maintainers for various distros etc to see if they’re on 3.13 or later already, essentially part of the resolution is going to be a PR exercise to retire broken releases.
>>>
>>> One thing we could and perhaps should do is add a extra placeholder to the fuse_lowlevel_ops struct, this would allow the library at runtime to compare the size passed to fuse_session_new() to at least detect if a potentially incompatible set of headers had been used, this is far from perfect though as the last change to this structure was five years ago so the closest we could work out is if a binary was compiled using versions 3.8 to 3.16 inclusive, a start but not ideal.  Of course if the size matches or doesn’t include the lseek entry then there the API change would not be an issue.
>>>
>>>
>>> Thoughts? Looking at the flags affected then there are several cases for surprising or unexpected behaviour and intended caching or data loss so I think this is going to be a big issue, at least theoretically if not in practice and we do need to take this seriously.
>>>
>>> Ashley.
> 


Reply to: