
Re: Extending ar format to support large member sizes



On Tue, Sep 2, 2025 at 4:37 AM Guillem Jover <guillem@debian.org> wrote:
>
> Hi!
>
> On Thu, 2025-08-21 at 01:01:32 -0700, Fangrui Song wrote:
> > On Wed, Aug 20, 2025 at 11:38 PM Jan Beulich <jbeulich@suse.com> wrote:
> > > Hasn't there been an extension to cover that for many years, using "!<arch64>\n"
> > > as file signature? I do not know, however, how well formalized that extension is,
> > > which solely differs from traditional archives by having a 20-byte size field (in
> > > place of the 10-byte one).
>
> If this variant of the format only covers the length (although that's
> pretty much what would be needed for .deb support), that seems a bit
> limiting given that at least the uid/gid and potentially the mode
> might not be big enough either.
>
> I think if this was to be considered (but where I'm tending to think
> this is really not my preferred path forward, see below) then something
> like this struct…
>
>   ```
>   #define AR64MAG "!<arch64>\n"
>   #define SAR64MAG 10
>
>   struct ar64_hdr {
>     char ar_name[16];   /* Member file name, may be '/'-terminated. */
>     char ar_time[12];   /* File seconds, ASCII decimal since Epoch. */
>     char ar_uid[10];    /* User ID, in ASCII decimal.  */
>     char ar_gid[10];    /* Group ID, in ASCII decimal.  */
>     char ar_mode[10];   /* File mode, in ASCII octal.  */
>     char ar_size[20];   /* File size, in ASCII decimal.  */
>     char ar_fmag[2];    /* File magic terminator. */
>   };
>   ```
>
> …might be better, but if that is not even going to be potentially
> compatible with a pre-existing format, then it might not be worth it?
> (Also going from the original 60 bytes, to this new 80 bytes seems
> like a nice round bump. :)
>
> > Is there an !<arch64>\n extension? I can't find !<arch64>\n in
> > binutils, libarchive, FreeBSD's elftoolchain, or LLVM.
> > AIX has a big archive extension that supports a larger size field, but
> > we likely don't want to use an AIX extension.
>
> I also tried a search on codesearch.debian.net and also on DuckDuckGo,
> Google and github.com, but nothing relevant seems to pop up. Checked
> file(1) and it didn't have any knowledge of that format either.
>
> > The /SYM64/ extension supports 64-bit symbol table offsets, and the
> > 10-byte decimal size field in the header could be easily expanded (for
> > parser, bfd/archive.c:538 ` scan = sscanf (hdr.ar_size, "%" SCNu64,
> > &parsed_size);` already supports larger size IIRC)
>
> I don't think this can currently handle anything larger than the current
> 10-byte decimal size though (~ 9536 MiB), as the sscanf ends up using
> something like "%llu" or similar? (But maybe I misunderstood your
> parenthetical comment.)

My point is that the reader is likely already compatible with 64-bit
sizes, as it parses the field with SCNu64.
We just need to allow 64-bit sizes in the writer. Of course, large
archives can only be read by newer archive readers.
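
To illustrate the reader side, here is a minimal sketch (not the bfd
code; parse_ar_size is a hypothetical helper, and the NUL-terminated
copy is my own addition for safety) of how an SCNu64 conversion accepts
decimal values wider than 10 digits when the field provides them:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the bfd code: parse an unsigned decimal
   ar-style size field of `width` bytes (space padded, not NUL
   terminated).  Returns 1 on success, 0 if no number was found.
   Values above UINT64_MAX are not detected by sscanf and must be
   rejected by other means.  */
static int
parse_ar_size (const char *field, size_t width, uint64_t *size)
{
  char buf[32];

  assert (width < sizeof buf);
  memcpy (buf, field, width);
  buf[width] = '\0';            /* Make the field safe to hand to sscanf.  */
  return sscanf (buf, "%" SCNu64, size) == 1;
}
```

With the traditional 10-byte field the largest representable value is
9999999999 bytes (about 9536 MiB); the same call handles a wider field
unchanged, e.g. parse_ar_size (hdr_size, 20, &n) for a 20-byte one.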

ar_date/ar_uid/ar_gid are not very useful nowadays, as we prefer build
determinism.

> On Thu, 2025-08-21 at 10:41:23 +0200, Jan Beulich wrote:
> > On 21.08.2025 10:01, Fangrui Song wrote:
> > > Is there an !<arch64>\n extension?
> >
> > 15 or more years ago, when I came across this, I didn't write down its
> > origin. It may be a Windows world extension.
>
> It would be nice to know though, otherwise we might be breaking an
> existing format variant, if we ended up wanting to go into that
> direction.
>
> > > I can't find !<arch64>\n in
> > > binutils, libarchive, FreeBSD's elftoolchain, or LLVM.
> >
> > Right, that's what may need adding there. Or whatever else extension we
> > may want to use.
>
> I've been pondering about the base-256 extension vs the "!<arch64>"
> format, and I think I'm leaning towards the base-256 extension,
> because although the field parsing might be slightly more complex (but
> not too much really), it ends up being overall a way less intrusive
> modification to existing code bases, where you only need to hook into
> whatever is parsing the field, and do not need to touch much else.
> In contrast adding a new "!<arch64>" variant might imply new entire
> parsing functions, or refactoring them to support the different struct
> sizes, and also the detection of the new magic value and its length.
> It would also imply that things like file(1) would be completely
> unaware of this new format.
>
> For the base-256 extension I've implemented extraction support already
> in dpkg-deb (need creation support and testing whether it works,
> although it's based on its existing tar base-256 support :).
>
> (See for example:
> <https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?h=next/libdpkg-ar-large-meta-base256>)
>
> Thanks,
> Guillem
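
For comparison, a rough sketch of what decoding a base-256 field might
look like. This assumes the convention of GNU tar's base-256 numeric
fields (a set high bit in the first byte marks a big-endian binary
value); whether the ar extension uses exactly this encoding is an
assumption on my part, and parse_base256 is a hypothetical name, not
taken from the dpkg branch linked above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder modelled on tar-style base-256 fields: the first
   byte has bit 0x80 set as a marker, and the remaining bits form a
   big-endian binary number.  Only non-negative values are handled.
   Returns 1 on success, 0 if the field is not base-256, is negative,
   or does not fit in 64 bits.  */
static int
parse_base256 (const unsigned char *field, size_t width, uint64_t *value)
{
  uint64_t v;
  size_t i;

  assert (width > 0);
  if ((field[0] & 0x80) == 0)
    return 0;                   /* Plain ASCII field, not base-256.  */
  if (field[0] & 0x40)
    return 0;                   /* Sign bit set; unsupported here.  */

  v = field[0] & 0x3f;
  for (i = 1; i < width; i++)
    {
      if (v >> 56)
        return 0;               /* The next shift would overflow.  */
      v = (v << 8) | field[i];
    }
  *value = v;
  return 1;
}
```

A caller would try this first and fall back to the existing decimal
parser when it returns 0, which is why the change stays local to the
field-parsing code.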

