Trying Debian/armhf rebootstrap with time64
- To: y2038 Mailman List <y2038@lists.linaro.org>
- Cc: GNU C Library <libc-alpha@sourceware.org>, debian-arm@lists.debian.org, tcwg@linaro.org, Helmut Grohne <helmutg@debian.org>, Wookey <wookey@wookware.org>, Adhemerval Zanella <adhemerval.zanella@linaro.org>, Steve McIntyre <steve@einval.com>, Lukasz Majewski <lukma@denx.de>, Jan Kiszka <jan.kiszka@web.de>, Riku Voipio <riku.voipio@iki.fi>
- Subject: Trying Debian/armhf rebootstrap with time64
- From: Arnd Bergmann <arnd@arndb.de>
- Date: Wed, 11 Mar 2020 13:52:00 +0100
- Message-id: <CAK8P3a0EtmgDRbDzBhOOZk_kyWmCm1aqvSxwUeY0R7tbCSxaKg@mail.gmail.com>
As discussed before, I tried using the rebootstrap tool [1] to see what
problems come up once the entire distro gets rebuilt. Based on Lukasz'
recommendation, I tried the 'y2038_edge' branch with his experimental
glibc patches [2], using commit c2de7ee9461 dated 2020-02-17.
Here is a rough summary of what I tried, what worked, and what problems
I ran into:
* Building a Debian package from this was fairly straightforward, using
the 2.31 branch in the package git repository[3] after replacing the
debian/patches/git-updates.diff file with one generated from [2] and
disabling the hurd patches because of conflicts.
* After installing the modified x86 glibc package, I ran into a runtime
bug in [4], which needs to pass AT_FDCWD instead of 0 to avoid
ENOTDIR errors.
* Bootstrapping a regular time32 Debian armhf with this libc took me
a few days to get right, but that was mostly for getting familiar
with rebootstrap and running into known issues unrelated to time64
or the glibc changes.
* Actually building a time64 version of glibc turned out to be
harder, including some issues discussed on the libc mailing list[5]:
- Always setting -D_TIME_BITS=64 in the global compiler flags for
the distro breaks both the native 64-bit (x86_64) build and the
32-bit build, as glibc itself expects to be built without this.
- Removing the time32 symbols from the glibc shared object did not
work as they are still used (a lot) internally, and by the testsuite.
- I tried converting all the internal symbols to use the time64
variants with the correct types (e.g. __clock_gettime64() instead
of __clock_gettime()), but then ran into a lot of APIs that take
timespec/timeval/... arguments and pass them down into internal
functions. These seem to all be bugs that require adding a time64
version of the external ABI.
- After I abandoned that approach, I continued with a simple
patch to features.h that sets _TIME_BITS/_FILE_OFFSET_BITS based on
'#if !defined _LIBC && __TIMESIZE == 32', which ignores the bugs I
found earlier but got me a lot further.
- Building the i386 glibc with that patch, I ran into over 150
testsuite failures [6]. This looked like a fundamental mistake
on my side, but after I looked into a few of the failures,
most seemed to be either glibc or testsuite bugs that have to be
addressed individually. I considered giving up at this point,
but as Lukasz has said that he had successfully built a working
system using Yocto, I kept going anyway and marked these all as
expected failures in the Debian package.
* There are a couple of noteworthy issues in glibc-y2038 I'd like to
point out in particular, though I'm sure these are not the only
important ones:
- The clock_nanosleep() prototype needed a '__THROW' annotation
to complete the build.
- The nptl and sunrpc portions have numerous interfaces with
'timeval' or 'timespec' arguments that each cause an ABI break.
- stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk()
are some of the other interfaces that take a time_t based
argument and need to grow a time64 version to avoid an ABI mismatch.
- The timeval prototype appears to be broken, as it's missing
padding on architectures without native alignment of __time64
(e.g. i386) and on all big-endian architectures.
- Some test cases hang in futex_wait() or clock_nanosleep()
because of incorrect timeout arguments, presumably from type
mismatches.
* There is an open question regarding the name of the Debian
architecture. For my experiments, I kept using the 'armhf' name
unmodified, though there seems to be a general feeling that using a
different name would be required to address the broad incompatibilities
between time32 and time64 versions of all the libraries in the
distro. Gradually changing them won't work because of the timeline and
the number of affected libraries. However, the new name of the distro
also implies having a distinct target triplet, which must then be known
by glibc along with everything else using config.guess/config.sub. I
expect this topic to require a lot more discussion.
* Continuing with the rebootstrap build despite the known glibc issues
and the open question on the architecture name went surprisingly
well. Only two out of the 152 source packages I built had
compile-time problems:
- building the final gcc failed in libsanitizer, which has
compile-time checks to ensure some libc data structures have the
expected layout. It noticed that 'struct timeb' and 'struct dirent'
are different based on _TIME_BITS and _FILE_OFFSET_BITS. I disabled
the checks to be able to continue. To do this properly, the library
has to learn about the new data structures as well. I opened a
bug report against the library[7].
- libpreludecpp12 failed to build because of checks for changes
in the exported functions, which are different with time64.
I disabled the checks. Once we have agreed on a new Debian
architecture name, the symbols can be made arch specific.
* After everything was built, I tried installing the packages into
a chroot with qemu-debootstrap, which failed because I had
configured glibc to assume it's running on a new kernel
while the qemu-user binary I had lacks the new syscalls.
I believe this is fixed in upstream qemu, but did not try that.
* Trying to install again, I used a clean debian-arm64 installation
running in qemu-system-aarch64, and attempted installing the
armhf packages using a regular debootstrap, running the 32-bit
binaries in compat mode of a recent arm64 kernel. This partially
worked and I could chroot into the system and use a shell, but
ultimately the debootstrap did not complete because of errors.
I saw that 'tar' had failed because of the stat() glibc ABI mismatch
breaking its private gnulib fdutimens() implementation, and this is
where I gave up.
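To illustrate the timeval padding problem mentioned in the glibc-y2038
notes above, here is a minimal sketch. The struct and function names are
hypothetical and not taken from glibc, and the sketch assumes that the
intended time64 layout is equivalent to two 64-bit fields. It is meant
to show why explicit padding matters, not how glibc should define the
type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; none are glibc definitions. */

/* 64-bit seconds plus 32-bit microseconds, no explicit padding.
   sizeof is 12 where int64_t is only 4-byte aligned inside structs
   (e.g. i386) and 16 where it is 8-byte aligned. */
struct timeval64_unpadded {
    int64_t tv_sec;
    int32_t tv_usec;
};

/* Assumed target layout: two 64-bit fields, 16 bytes everywhere. */
struct timeval64_wide {
    int64_t tv_sec;
    int64_t tv_usec;
};

/* Explicit padding, placed so that tv_usec overlays the low half of
   the 64-bit microseconds slot on either byte order. */
struct timeval64_padded {
    int64_t tv_sec;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    int32_t pad;
    int32_t tv_usec;
#else
    int32_t tv_usec;
    int32_t pad;
#endif
};

_Static_assert(sizeof(struct timeval64_padded) ==
               sizeof(struct timeval64_wide),
               "padded layout must match the wide layout");

/* Fill the padded struct, reinterpret its bytes as the wide struct,
   and return the microseconds value a reader of the wide layout sees. */
int64_t usec_seen_as_wide(int32_t usec)
{
    struct timeval64_padded p;
    struct timeval64_wide w;

    memset(&p, 0, sizeof p);   /* padding must be zero */
    p.tv_sec = 1;
    p.tv_usec = usec;
    memcpy(&w, &p, sizeof w);
    return w.tv_usec;
}

/* Print size and tv_usec offset of each variant for this target. */
void print_layouts(void)
{
    printf("unpadded: size %zu, tv_usec at %zu\n",
           sizeof(struct timeval64_unpadded),
           offsetof(struct timeval64_unpadded, tv_usec));
    printf("padded:   size %zu, tv_usec at %zu\n",
           sizeof(struct timeval64_padded),
           offsetof(struct timeval64_padded, tv_usec));
    printf("wide:     size %zu, tv_usec at %zu\n",
           sizeof(struct timeval64_wide),
           offsetof(struct timeval64_wide, tv_usec));
}
```

Calling print_layouts() from a small main() on each target of interest
shows whether the unpadded variant diverges there. The round trip in
usec_seen_as_wide() only holds for non-negative values with zeroed
padding, which is one more reason the padding has to be explicit.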
I have spent more time on this now than I had planned, and don't expect
to do further work on it anytime soon, but I hope my summary is useful
to others that are going to need this later. I can obviously share
my patches and build artifacts if anyone needs them. There are two
additional approaches that would likely get a Debian bootstrap further,
but that I have not tried as they were previously dismissed:
* Adding a time64 armhf as a separate (incompatible) target in glibc
that defines __TIMESIZE==64 and a 64-bit __time_t would avoid
most of the remaining ABI issues and put armhf-time64 in the same
category as riscv32 and arc, but this idea was so far rejected by the
glibc maintainers. Depending on how hard this turns out to be,
it could be used to get to the point of self-hosting though, and
help find time64 related bugs in the rest of the distro.
* Doing the bootstrap using a musleabihf target instead of gnueabihf
would avoid the current issues internal to glibc-y2038, but instead
lead to new problems with packages that do not currently work with
musl. Adelie Linux has shown that it's already possible to build
a useful distro using musl and time64[8], and this would
sidestep the question of the target triplet. While it would also
help find and fix additional bugs in packages, and make an
interesting unofficial Debian target, I don't see it replacing
the existing armhf port any time soon.
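Whichever approach is used, the resulting userland should end up with a
64-bit time_t. A minimal smoke test for that, using only standard C,
could look like the sketch below. The function names are hypothetical,
and this only checks the width of time_t and one conversion; it says
nothing about ABI mismatches between libraries.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper names, for illustration only. */

/* Returns 1 if time_t in this build is wide enough to hold the first
   second after the signed 32-bit rollover, 0 otherwise. */
int time_t_survives_2038(void)
{
    return sizeof(time_t) >= 8;
}

/* Converts 2^31 seconds after the epoch to broken-down UTC time.
   Returns 0 on success, -1 if time_t is too narrow or gmtime() fails. */
int first_second_after_rollover(struct tm *out)
{
    time_t t;
    struct tm *res;

    if (!time_t_survives_2038())
        return -1;

    t = (time_t)INT64_C(2147483648);
    res = gmtime(&t);
    if (res == NULL)
        return -1;

    *out = *res;   /* gmtime() returns a pointer to static storage */
    return 0;
}
```

On a build with a 64-bit time_t, the broken-down time filled in by
first_second_after_rollover() corresponds to 2038-01-19 03:14:08 UTC.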
For additional information about the Debian plans, see the
article on LWN[9] that summarizes the discussion started by
Steve McIntyre [10].
Arnd
[1] https://wiki.debian.org/HelmutGrohne/rebootstrap
[2] https://github.com/lmajewski/y2038_glibc/tree/y2038_edge
[3] https://salsa.debian.org/glibc-team/glibc/-/tree/glibc-2.31
[4] https://github.com/lmajewski/y2038_glibc/commit/2f72ea2b6f6ee
[5] https://sourceware.org/pipermail/libc-alpha/2020-February/111375.html
[6] https://pastebin.com/fJYV2stF
[7] https://bugs.llvm.org/show_bug.cgi?id=45138
[8] https://wiki.adelielinux.org/wiki/Project:Time64
[9] https://lwn.net/Articles/812767/
[10] https://lwn.net/ml/debian-devel/20200204131410.GF3043@tack.einval.com/