
Re: Architecture variants for Debian / Ubuntu





On Tue, 31 Oct 2023 at 09:21, Guillem Jover <guillem@debian.org> wrote:
Hi!

On Thu, 2023-09-21 at 14:43:42 +1200, Michael Hudson-Doyle wrote:
> Thanks for the considered response. And sorry for the very slow reply.

Idem! :)

> On Wed, 6 Sept 2023 at 21:27, Guillem Jover wrote:
> > I'm not sure I entirely agree with the requirements you set forth
> > though:
> >
> >  - I think such optimized builds might need to be done with "special
> >    toolchains" (these could simply be wrappers over the host compiler
> >    passing the appropriate flags via command-line or via specs or
> >    similar, not necessarily full blown toolchains), passing these via
> >    something like dpkg-buildflags seems currently unreliable, as I don't
> >    think we have full coverage in packages (neither for all compilers
> >    available)? Although it would be better as it would centralize the
> >    management. (For reference this is in part how rpm handles this:
> >     https://github.com/rpm-software-management/rpm/blob/master/rpmrc.in)
> >
>
> I agree that it is not completely clear what the best approach here is: do we
> change the defaults of gcc or influence things via default buildflags?
>
> I'm sure there are packages that do not respect dpkg-buildflags during
> build but the consequences of this do not seem all that great -- such
> packages would not be optimized for the variant / ISA but if someone
> manages to notice this, they can fix the bug.
>
> OTOH, having the compiler default change may be a bit of a surprise for
> people who build binaries for deployment not via Debian packages. (Do our
> compilers in general target the same baseline as Debian does for a given
> architecture?).

Right, given that the failure mode would be just "no-optimized-builds",
and should not end up with those packages being broken, at most just
redundant with the baseline ones, then I guess controlling it either
way would seem fine, yes.

Ack.
 
(Also if the packages are reproducible, and end up being not optimized
this might be detectable as producing identical artifacts as on the
baseline.)

This is an interesting idea -- although of course some care would be required to avoid false positives from things that do not use the C/C++ toolchain at all. Anyway... 
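
As a rough illustration of that check, here is a minimal Python sketch that compares a file built for the baseline with the same file built for a variant. It assumes reproducible builds and that the files have already been extracted from the two .debs (the .debs themselves would differ at least in their control metadata if the ISA is recorded there). The function names are made up for this example.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_unoptimized(baseline_file, variant_file):
    """True if the variant build produced a byte-identical file.

    Identical output only *suggests* that the ISA flags were ignored;
    files that never go through the C/C++ toolchain (scripts, data)
    will also be identical and need to be filtered out by the caller.
    """
    return sha256_of(baseline_file) == sha256_of(variant_file)
```

A real check would presumably restrict itself to ELF objects to avoid the false positives mentioned above.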
 
> >  - Perhaps that's a limitation from the archive software side, but
> >    requiring to place the binary packages in the same pool seems
> >    rather restrictive (it forces different filenames for example).

> We are considering supporting multiple variant/ISAs in the primary Ubuntu
> archive, so if we get that far then yes, we want to have all the binary
> packages in the same pool. The first steps don't have to support this I
> guess.

Ok. Just a note that even if served from the primary archive, there
could be multiple pools (like the multi-pool setup on debian-ports),
as the entry points are the (In)Release files.

Oh OK. I don't think Launchpad supports that (but am not sure).
 
But, yes, the other
option would be to use the variant/ISA name as a "fake arch" just in
the binary package name.

> >  - I guess it might be nice for the ISA to be passed down to the
> >    dpkg tools, but I don't think this is strictly necessary? A
> >    frontend like apt could also decide based on metadata in say the
> >    Release file, although not having the actual installed package
> >    metadata on whether it was a different ISA build or not would make
> >    its job more inconvenient. In any case I don't have a big issue
> >    with recording this via dpkg-gencontrol or similar if necessary.

> I agree, I don't think it's /strictly/ required that the target ISA is
> recorded in the deb. But I think adding a field for it reduces scope for
> confusion later.

Yes, agreed.

> > On the specific implementation details:
> >
> >  - As covered in previous discussions, dpkg could (but I don't think
> >    it's necessary) check whether the .deb is runnable on the current
> >    hw, but that's tricky as chrootless installs need to be taken
> >    into account, etc. It should certainly not be part of dependency
> >    resolution.

> I'm sorry, what is a chrootless install? But I think I agree here too:
> tricky and just not really worth it.

https://wiki.debian.org/Teams/Dpkg/Spec/InstallBootstrap

Ah right.

This can be used among other things to set up foreign chroots, by
running the host tools, so disallowing installation could be
problematic. Even though I guess there could be a warning about this,
or maybe it could be controlled through a force option, although both
seem like they could be disruptive.

Of course in such cases dpkg knows that something funny is going on and could suppress the warning itself. 

I spent a few minutes thinking hard about this and I honestly don't think I can predict whether trying to prevent installation of incompatible packages is worth it. After all, one of the ways users could get into trouble is by moving an installed system to a different CPU and having binaries start to fail, and obviously dpkg can't help there.

One result of this thinking: I had been assuming that the choice of which variants to consider would be apt configuration, but maybe dpkg configuration would make more sense (after all, --add-architecture is a parameter to dpkg). In that case, dpkg could object when installing a variant that has not been configured.
 
> >  - I'm not fond of having to change the binary package name format
> >    either for this (name_version_arch.deb) even if at least dpkg
> >    itself does not care (but I know other tools do care), and
> >    depending on the format I'd expect things to break (this goes
> >    back to the shared pool concern).
>
> I don't think this is avoidable in the long run. I must admit I have
> generally thought of the presence of the architecture name in the .deb file
> name to be more a convention than part of the format (and the "real"
> indication of a binary package's architecture is in DEBIAN/control).

Yes and no I guess. In theory the (canonical) information should be
extracted from the DEBIAN/control from inside the .deb, in practice
I think tools (?) (might) try to use heuristics from just the filename
to avoid having to open, uncompress and parse every .deb around, for
performance reasons.

True. In fact it looks like apt-ftparchive does this (when using the --architecture flag at least) so I get to care about this a bit...

If the only change in the package filename format is in the <arch> part
where we'd use a name which would otherwise be valid as an arch name (so,
no weird symbols, or «-» separators that are not intended to split <os>
and <cpu> or similar), then using a name for the variant/ISA would be
fine.

Right. I think that (when possible) pretending e.g. "amd64v3" is a distinct architecture will generally make things easier. E.g. I think britney wouldn't need to know about the relationship between "amd64" and "amd64v3".
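
To make the filename heuristic concrete, here is a small Python sketch of how a tool might split name_version_arch.deb without opening the package. This is not the code of apt-ftparchive or any other real tool; the function name is invented. It relies on package names and versions not containing "_", so a variant name such as "amd64v3" simply shows up in the <arch> slot.

```python
def split_deb_filename(filename):
    """Split 'name_version_arch.deb' into (name, version, arch).

    Raises ValueError if the filename does not follow that convention.
    """
    suffix = ".deb"
    if not filename.endswith(suffix):
        raise ValueError("not a .deb filename: %r" % filename)
    parts = filename[: -len(suffix)].split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError("unexpected filename layout: %r" % filename)
    name, version, arch = parts
    return name, version, arch
```

A tool written this way would treat "amd64v3" as just another architecture string, which is the behaviour described above.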
 
> >  - If dpkg-architecture needs to be aware of this, then this might need
> >    to be auto-detectable from just the current toolchain being used.

> So you are saying to configure a build environment for, say, x86-64-v3 you
> would configure gcc with --with-arch64=x86-64-v3 and then dpkg-architecture
> would parse the output of gcc -Q --help=target to set DEB_HOST_ARCH_VARIANT
> appropriately? (modulo mistakes in details) Or do you mean something else
> entirely?

That would be one solution yes, which could give automatic bijective
mappings, although ideally with a machine-readable way to get at it,
which I'm not sure we have currently.

I think "gcc -Q --help=target | grep -e '^\s*-march'" is about as machine readable as it gets currently, for better or worse (mostly worse).
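
For what it's worth, the same scraping could look roughly like this in Python. This is only a sketch: it assumes the output contains a line of the form "  -march=    <value>", which is what I see locally but is not a stable interface, and the function names are invented. dpkg itself is Perl, so this only shows the parsing idea.

```python
import re
import subprocess

# Match a line such as "  -march=                 x86-64-v3".
# Only spaces and tabs are allowed between "-march=" and the value so
# that an empty value cannot swallow the following line.
MARCH_RE = re.compile(r"^[ \t]*-march=[ \t]+(\S+)", re.MULTILINE)


def parse_march(help_target_output):
    """Return the -march value found in the given text, or None."""
    match = MARCH_RE.search(help_target_output)
    return match.group(1) if match else None


def compiler_march(cc="gcc"):
    """Run `cc -Q --help=target` and return its default -march value."""
    result = subprocess.run(
        [cc, "-Q", "--help=target"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_march(result.stdout)
```

Keeping the parser separate from the subprocess call at least makes it testable against captured output from different compiler versions.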
 
For example code in dpkg-dev
already runs «$CC -dumpmachine» to infer the host architecture to use
during builds.

While using a triplet variation could be a way to do that, that would
require such triplet support for each variant/ISA, which tends to be
very painful to introduce if it's not there already, so I'd not
consider this specific way a viable option.

I admit I'm not an expert on triplet intricacies but I think a new triplet is not appropriate here (a bit like a new Debian architecture for a variant/ISA choice is not the right concept).
 
> > Some of the above problems could perhaps be avoided if we introduced
> > a concept of architecture aliases/ISAs (similar to what rpm has), which
> > would side-step the pool sharing issue, the binary package renaming,
> > etc. One big issue with this is that it requires for dpkg to have an
> > exhaustive table of all such aliases, and if there's ever a new alias
> > added, old dpkg versions need to be updated or they will not understand
> > what they match with. So this does not seem ideal either. So I guess this
> > is a variation over your proposal, but perhaps this could still be used
> > in specific contexts, say only at build-time (but not for dependency
> > relationships), for repo management (say binary-arm64v9/Packages.xz),
> > or binary package names where the field would specify the actual name
> > for the filename, say:
> >
> >   Architecture: arm64
> >   ArchitectureIsa: arm64v9
> >
> > or maybe better:
> >
> >   Architecture: arm64
> >   ArchitectureIsa: v9
> >
> > resulting in dpkg-deb generating:
> >
> >   binpkg_1.0-1_arm64v9.deb
> >
> > but targeting arm64.

> I'm not sure but I think you have talked yourself into suggesting something
> very similar to my proposal here?

Ah sorry, yeah, didn't mean to present it as a new idea,

:-)
 
I was mostly
trying to walk over the issues, and refine upon your initial idea,
with those constraints applied. :)

I'm certainly glad you got to a similar place as me!
 
> > On Fri, 2023-09-01 at 08:43:55 +1200, Michael Hudson-Doyle wrote:
> > > Is there a better way of doing this?
> >
> > I think starting from 5, the rest are probably just details to hammer
> > out, but not insurmountable things.

> Great. The things I see as a bit vague at a base level currently are:
>
> * Should the ISA influence the toolchain via toolchain defaults or
> dpkg-buildflags?
> * How is the default ISA for a buildd chroot selected?

So the clear downsides of either modifying the default toolchain or
having to provide an additional one is that this seems pretty heavy
weight. Also because people might want to build optimized variants
locally w/o having to mess with their already existing toolchains.
(I'm not sure whether something going along the lines of
<https://git.hadrons.org/cgit/debian/fakecross.git> could be an
option, although as mentioned above, if that would imply new triplets,
then probably not.)

So the easiest way might indeed be by controlling this via an envvar,

DEB_HOST_ARCH_ISA?
 
which dpkg-buildpackage could also set up internally via a new option,
say --arch-isa=amd64v3 or similar

--host-arch-isa would be more coherent I think.

I guess one could add support for --target-host-arch-isa to build a toolchain that defaults to a particular ISA. But well.
 
to make this slightly more
discoverable. Which would be easy to use from the buildds too I guess.

I also think that (conceptually) it makes sense that you might want to have a build chroot that *uses* amd64v3 binaries (because your builder is amd64v3) to *produce* boring old amd64 binaries (I mean, I doubt gcc built with a different -march is so much faster that it really matters but...)
 
> There is also the question of whether partial coverage of an ISA is handled
> by the package publisher or client side in apt but that's at least one
> level higher.

Yeah, that would be of no concern to dpkg, I think.

Ack.

So to summarise, here are the generic changes that I think need to be made to src:dpkg to support variant ISAs as a thing:

 * add get_host_arch_isa() to Dpkg::Arch
 * dpkg-gencontrol records DEB_HOST_ARCH_ISA into DEBIAN/control as ArchitectureIsa
 * dpkg-architecture emits DEB_HOST_ARCH_ISA and grows --host-arch-isa flag
 * dpkg-buildpackage passes --host-arch-isa to dpkg-architecture
 * dpkg-genchanges should record the ISA in the changes file somehow I guess?
 * dpkg-deb records the ISA in the file name

Have I missed anything? (Hmm, does anything need to reject unknown values found in DEB_HOST_ARCH_ISA / --host-arch-isa? Probably...)
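
To illustrate the last bullet and the question about rejecting unknown values, here is a Python sketch of the behaviour I have in mind. Nothing here exists in dpkg today: the KNOWN_ISAS table, its contents and the function name are all assumptions for the example, and it ignores details such as epochs in versions.

```python
# Hypothetical table mapping an ISA name to the architecture it targets.
KNOWN_ISAS = {
    "amd64v3": "amd64",
    "arm64v9": "arm64",
}


def deb_filename(fields):
    """Build the .deb filename from DEBIAN/control style fields.

    `fields` is a dict with Package, Version, Architecture and an
    optional ArchitectureIsa. The ISA, when present, replaces the
    architecture in the filename only; the package still targets
    the value of Architecture.
    """
    arch = fields["Architecture"]
    isa = fields.get("ArchitectureIsa")
    if isa is not None:
        base = KNOWN_ISAS.get(isa)
        if base is None:
            raise ValueError("unknown ISA: %s" % isa)
        if base != arch:
            raise ValueError("ISA %s does not belong to %s" % (isa, arch))
    return "%s_%s_%s.deb" % (fields["Package"], fields["Version"], isa or arch)
```

The hard-coded table is exactly the "exhaustive table of aliases" concern raised earlier, so where such a table lives and how it gets updated would still need deciding.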

Conceptually slightly separately, it might make sense to add a build "feature" to Dpkg::Vendor::Debian to allow setting -march (and -mtune?).

Then when we want to add support for an ISA, we add a little thing to set_build_features (in either Vendor::Debian or Vendor::Ubuntu or wherever) that maps get_host_arch_isa() to values for the march-influencing feature.
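
Roughly, the mapping would look like the following. The real thing would be Perl inside Dpkg::Vendor; this Python sketch only shows the shape of the idea. The "amd64v3" ISA name and the function names are assumptions, while "x86-64-v3" is an existing GCC -march value.

```python
# Hypothetical ISA name -> value passed to the compiler's -march option.
ISA_MARCH = {
    "amd64v3": "x86-64-v3",
}


def march_flags(host_arch_isa):
    """Return the extra compiler flags for the given ISA.

    None means "baseline build", which adds no flags. An ISA that is
    not in the table is rejected instead of being silently ignored.
    """
    if host_arch_isa is None:
        return []
    march = ISA_MARCH.get(host_arch_isa)
    if march is None:
        raise ValueError("no -march mapping for ISA %s" % host_arch_isa)
    return ["-march=" + march]


def build_cflags(base_flags, host_arch_isa):
    """Append the ISA-specific flags to an existing list of CFLAGS."""
    return list(base_flags) + march_flags(host_arch_isa)
```

Packages that ignore dpkg-buildflags would not pick this up, which matches the "just not optimized" failure mode discussed at the top.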

Cheers,
mwh 
