
Re: Extending ar format to support large member sizes



On Wed, Aug 20, 2025 at 11:38 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 21.08.2025 04:50, Guillem Jover wrote:
> > As you probably know the deb 2.x packaging format [D] is based on an
> > ar container, with tar members for its format definition, control
> > metadata and filesystem data payloads (among others).
> >
> > A problem that has been known for a while now [S], is that the ar format
> > has a file size limit, which means the deb format is thus limited to
> > filesystem payloads (usually compressed) of at most around 9536.74 MiB.
> >
> > This is starting to become an issue, and it's something that has been
> > bothering me for a bit, because support for a new deb format should be
> > ready way before we need it, as older tools should ideally be able to
> > handle it. And there are multiple tools involved that will need to be
> > updated [T].
>
> Hasn't there been an extension to cover that for many years, using "!<arch64>\n"
> as file signature? I do not know, however, how well formalized that extension is,
> which solely differs from traditional archives by having a 20-byte size field (in
> place of the 10-byte one).
>
> Jan

Is there an !<arch64>\n extension? I can't find !<arch64>\n in
binutils, libarchive, FreeBSD's elftoolchain, or LLVM.
AIX has a big archive extension that supports a larger size field, but
we likely don't want to use an AIX extension.

The size limit stems from two places: the 10-byte decimal size field in
the member header and the offsets in the archive symbol table.
The /SYM64/ extension already supports 64-bit symbol table offsets, and
the size field could easily be widened. On the parser side, IIRC
bfd/archive.c:538 already handles larger values:
`scan = sscanf (hdr.ar_size, "%" SCNu64, &parsed_size);`
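
To make the size-field limit concrete, here is a minimal sketch in C. It is
not the binutils code. The names ar_hdr_sketch, AR_SIZE_MAX_SKETCH,
parse_decimal_field and format_ar_size are made up for illustration, and the
layout is the common 60-byte ar member header as I understand it. The parser
takes the field width as a parameter, so the same routine would read a
hypothetical wider field such as the 20-byte one Jan describes.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative copy of the classic 60-byte ar member header.
   Every field is space-padded ASCII with no NUL terminator. */
struct ar_hdr_sketch {
    char ar_name[16];
    char ar_date[12];
    char ar_uid[6];
    char ar_gid[6];
    char ar_mode[8];
    char ar_size[10];   /* member size in decimal */
    char ar_fmag[2];    /* header terminator */
};

/* Largest value that fits in ten decimal digits. */
#define AR_SIZE_MAX_SKETCH UINT64_C(9999999999)

/* Parse a space-padded decimal field that is `width` bytes long.
   Returns 0 on success, -1 if the field is empty, overflows 64 bits,
   or contains anything other than digits followed by spaces. */
static int parse_decimal_field(const char *field, size_t width, uint64_t *out)
{
    uint64_t value = 0;
    size_t i = 0;

    while (i < width && field[i] >= '0' && field[i] <= '9') {
        unsigned digit = (unsigned)(field[i] - '0');
        if (value > (UINT64_MAX - digit) / 10)
            return -1;              /* would overflow uint64_t */
        value = value * 10 + digit;
        i++;
    }
    if (i == 0)
        return -1;                  /* no digits at all */
    for (; i < width; i++)
        if (field[i] != ' ')
            return -1;              /* trailing garbage */
    *out = value;
    return 0;
}

/* Write `size` into a 10-byte ar_size field, left-justified and
   space-padded. Returns -1 when the value needs more than ten digits,
   which is exactly the limit discussed in this thread. */
static int format_ar_size(uint64_t size, char out[10])
{
    char buf[32];

    if (size > AR_SIZE_MAX_SKETCH)
        return -1;
    snprintf(buf, sizeof buf, "%-10" PRIu64, size);
    memcpy(out, buf, 10);
    return 0;
}
```

With this sketch, format_ar_size accepts 9999999999 bytes (about 9536.74 MiB)
and rejects 10000000000, while parse_decimal_field reads the larger value
fine once it is given a wider field. In other words the writer side and the
on-disk field width are the constraint, not the 64-bit parse.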

(
Regarding /SYM64/:
I'd argue that the archive symbol table is largely unnecessary. mold
and lld's ELF port completely ignore the archive symbol table.
https://maskray.me/blog/2022-01-16-archives-and-start-lib

> With a lower ratio, I have measured 1.01x when linking Clang. With my experience, for projects large enough that the performance matters, the archive utilization ratio is typically high. So I would say the archive symbol table is nearly useless.

)

