
Re: Extending ar format to support large member sizes



Hi!

On Thu, 2025-08-21 at 01:01:32 -0700, Fangrui Song wrote:
> On Wed, Aug 20, 2025 at 11:38 PM Jan Beulich <jbeulich@suse.com> wrote:
> > Hasn't there been an extension to cover that for many years, using "!<arch64>\n"
> > as file signature? I do not know, however, how well formalized that extension is,
> > which solely differs from traditional archives by having a 20-byte size field (in
> > place of the 10-byte one).

If this variant of the format only covers the length (although that's
pretty much what would be needed for .deb support), that seems a bit
limiting given that at least the uid/gid and potentially the mode
might not be big enough either.

I think if this were to be considered (although I'm leaning against
this path as the way forward, see below), then something like this
struct…

  ```
  #define AR64MAG "!<arch64>\n"
  #define SAR64MAG 10

  struct ar64_hdr {
    char ar_name[16];   /* Member file name, may be '/'-terminated. */
    char ar_time[12];   /* File seconds, ASCII decimal since Epoch. */
    char ar_uid[10];    /* User ID, in ASCII decimal.  */
    char ar_gid[10];    /* Group ID, in ASCII decimal.  */
    char ar_mode[10];   /* File mode, in ASCII octal.  */
    char ar_size[20];   /* File size, in ASCII decimal.  */
    char ar_fmag[2];    /* File magic terminator. */
  };
  ```

…might be better, but if that is not even going to be potentially
compatible with a pre-existing format, then it might not be worth it?
(Also, going from the original 60 bytes to 80 bytes seems like a nice
round bump. :)
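
To make the size arithmetic and the magic detection concrete, here is a
rough sketch, not taken from any existing implementation. The
classic_ar_hdr struct mirrors the traditional 60-byte layout, and the
names CLASSIC_ARMAG, CLASSIC_SARMAG, classic_ar_hdr and ar_header_size()
are made up for illustration; the "!<arch64>\n" layout itself is only
the hypothetical proposal above, not a known on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CLASSIC_ARMAG  "!<arch>\n"    /* traditional archive magic */
#define CLASSIC_SARMAG 8
#define AR64MAG        "!<arch64>\n"  /* hypothetical magic from this thread */
#define SAR64MAG       10

/* Traditional member header: 16+12+6+6+8+10+2 = 60 bytes. */
struct classic_ar_hdr {
  char ar_name[16];
  char ar_date[12];
  char ar_uid[6];
  char ar_gid[6];
  char ar_mode[8];
  char ar_size[10];
  char ar_fmag[2];
};

/* Proposed wider header: 16+12+10+10+10+20+2 = 80 bytes. */
struct ar64_hdr {
  char ar_name[16];
  char ar_time[12];
  char ar_uid[10];
  char ar_gid[10];
  char ar_mode[10];
  char ar_size[20];
  char ar_fmag[2];
};

/* Both structs hold only char arrays, so no padding is expected. */
static_assert(sizeof(struct classic_ar_hdr) == 60, "classic header is 60 bytes");
static_assert(sizeof(struct ar64_hdr) == 80, "wide header is 80 bytes");

/* Return the member header size implied by the archive magic at the
 * start of buf, or 0 if the magic is not recognized. */
static size_t
ar_header_size(const char *buf, size_t len)
{
  if (len >= SAR64MAG && memcmp(buf, AR64MAG, SAR64MAG) == 0)
    return sizeof(struct ar64_hdr);
  if (len >= CLASSIC_SARMAG && memcmp(buf, CLASSIC_ARMAG, CLASSIC_SARMAG) == 0)
    return sizeof(struct classic_ar_hdr);
  return 0;
}
```

A reader would have to carry the returned header size through every
place that currently assumes 60 bytes.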

> Is there an !<arch64>\n extension? I can't find !<arch64>\n in
> binutils, libarchive, FreeBSD's elftoolchain, or LLVM.
> AIX has a big archive extension that supports a larger size field, but
> we likely don't want to use an AIX extension.

I also tried a search on codesearch.debian.net and also on DuckDuckGo,
Google and github.com, but nothing relevant seems to pop up. Checked
file(1) and it didn't have any knowledge of that format either.

> The /SYM64/ extension supports 64-bit symbol table offsets, and the
> 10-byte decimal size field in the header could be easily expanded (for
> parser, bfd/archive.c:538 ` scan = sscanf (hdr.ar_size, "%" SCNu64,
> &parsed_size);` already supports larger size IIRC)

I don't think this can currently handle anything larger than the current
10-byte decimal size though (~ 9536 MiB), as the sscanf ends up using
something like "%llu" or similar? (But maybe I misunderstood your
parenthetical comment.)
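
For reference, the limit comes from the field width rather than from
the integer type. A minimal sketch of a bounded parser for such a
space-padded decimal field follows; ar_parse_decimal() is a made-up
helper for illustration, not what bfd or dpkg actually do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse a fixed-width, space-padded ASCII decimal field that is not
 * NUL-terminated. Returns 0 on success and stores the value in *out,
 * or -1 on an empty field, stray characters or 64-bit overflow. */
static int
ar_parse_decimal(const char *field, size_t width, uint64_t *out)
{
  uint64_t value = 0;
  size_t i = 0;

  assert(field != NULL && out != NULL);

  while (i < width && field[i] >= '0' && field[i] <= '9') {
    unsigned int digit = (unsigned int)(field[i] - '0');

    /* Reject values that would not fit in 64 bits. */
    if (value > (UINT64_MAX - digit) / 10)
      return -1;
    value = value * 10 + digit;
    i++;
  }
  if (i == 0)
    return -1;

  /* Anything after the digits must be space padding. */
  for (; i < width; i++)
    if (field[i] != ' ')
      return -1;

  *out = value;
  return 0;
}
```

With a 10-byte field the largest representable value is 9999999999
bytes, which is where the ~9536 MiB figure comes from, no matter how
wide the integer behind the parser is.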

On Thu, 2025-08-21 at 10:41:23 +0200, Jan Beulich wrote:
> On 21.08.2025 10:01, Fangrui Song wrote:
> > Is there an !<arch64>\n extension?
> 
> 15 or more years ago, when I came across this, I didn't write down its
> origin. It may be a Windows world extension.

It would be nice to know though, otherwise we might be breaking an
existing format variant, if we ended up wanting to go into that
direction.

> > I can't find !<arch64>\n in
> > binutils, libarchive, FreeBSD's elftoolchain, or LLVM.
> 
> Right, that's what may need adding there. Or whatever else extension we
> may want to use.

I've been pondering the base-256 extension vs the "!<arch64>" format,
and I'm leaning towards the base-256 extension. Although the field
parsing might be slightly more complex (but not by much, really), it
ends up being a far less intrusive modification to existing code
bases: you only need to hook into whatever is parsing the field, and
do not need to touch much else.
In contrast, adding a new "!<arch64>" variant might imply entirely new
parsing functions, or refactoring them to support the different struct
sizes, plus detection of the new magic value and its length.
It would also mean that things like file(1) would be completely
unaware of this new format.

For the base-256 extension I've already implemented extraction support
in dpkg-deb (creation support is still needed, as is testing whether it
works, although it's based on the existing tar base-256 support :).

(See for example:
<https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?h=next/libdpkg-ar-large-meta-base256>)
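
To give an idea of how small the field-level hook can be, here is a
sketch of a decoder following the GNU tar base-256 convention as I
understand it: the top bit of the first byte marks the encoding, the
next bit is the sign, and the remaining bits form a big-endian binary
number. That the ar extension would use exactly this layout is an
assumption, and ar_parse_base256() is a made-up name; this is not the
code from the branch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a non-negative base-256 field (tar-style convention, assumed).
 * Returns 0 on success and stores the value in *out, or -1 if the field
 * is not base-256, is negative, or does not fit in 64 bits. */
static int
ar_parse_base256(const unsigned char *field, size_t width, uint64_t *out)
{
  uint64_t value;
  size_t i;

  assert(field != NULL && out != NULL);

  /* The high bit of the first byte flags the base-256 encoding. */
  if (width == 0 || (field[0] & 0x80) == 0)
    return -1;
  /* The next bit is the sign; negative values are rejected here. */
  if (field[0] & 0x40)
    return -1;

  value = field[0] & 0x3f;
  for (i = 1; i < width; i++) {
    /* Shifting would drop non-zero high bits: the value is too large. */
    if (value >> 56)
      return -1;
    value = (value << 8) | field[i];
  }

  *out = value;
  return 0;
}
```

A caller would try this first and fall back to the usual decimal
parsing when the high bit is not set, so existing archives keep
working unchanged.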

Thanks,
Guillem

