
Trying Debian/armhf rebootstrap with time64



As discussed before, I tried using the rebootstrap tool [1] to see what
problems come up once the entire distro gets rebuilt.  Based on Lukasz'
recommendation, I tried the 'y2038_edge' branch with his experimental
glibc patches [2], using commit c2de7ee9461 dated 2020-02-17.

Here is a rough summary of what I tried, what worked, and what problems
I ran into:

* Building a Debian package from this was fairly straightforward, using
  the 2.31 branch in the package git repository[3] after replacing the
  debian/patches/git-updates.diff file with one generated from [2] and
  disabling the hurd patches because of conflicts.

* After installing the modified x86 glibc package, I ran into a runtime
  bug in [4], which needs to pass AT_FDCWD instead of 0 to avoid
  ENOTDIR errors.

* Bootstrapping a regular time32 Debian armhf with this libc took me
  a few days to get right, but that was mostly for getting familiar
  with rebootstrap and running into known issues unrelated to time64
  or the glibc changes.

* Actually building a time64 version of glibc turned out to be
  harder, including some issues discussed on the libc mailing list[5]:

  - Always setting -D_TIME_BITS=64 in the global compiler flags for
    the distro breaks both the native 64-bit (x86_64) build and the
    32-bit build, as glibc itself expects to be built without this.

  - Removing the time32 symbols from the glibc shared object did not
    work as they are still used (a lot) internally, and by the testsuite.

  - I tried converting all the internal symbols to use the time64
    variants with the correct types (e.g. __clock_gettime64() instead
    of __clock_gettime()), but then ran into a lot of APIs that take
    timespec/timeval/... arguments and pass them down into internal
    functions. These seem to all be bugs that require adding a time64
    version of the external ABI.

  - After I abandoned that approach, I continued with a simple
    patch to features.h that sets _TIME_BITS/_FILE_OFFSET_BITS based on
    '#if !defined _LIBC && __TIMESIZE == 32', which ignores the bugs I
    found earlier but got me a lot further.

  - Building the i386 glibc with that patch, I ran into over 150
    testsuite failures [6]. This looked like there was a fundamental
    mistake on my side, but after I looked into a few of the failures,
    most seemed to be either glibc or testsuite bugs that have to be
    addressed individually. I considered giving up at this point,
    but as Lukasz has said that he had successfully built a working
    system using Yocto, I kept going anyway and marked these all as
    expected failures in the Debian package.

* There are a couple of noteworthy issues in glibc-y2038 I'd like to
  point out in particular, though I'm sure these are not the only
  important ones:

  - The clock_nanosleep() prototype needed a '__THROW' annotation
    to complete the build.

  - The nptl and sunrpc portions have numerous interfaces with
    'timeval' or 'timespec' arguments that each cause an ABI break.

  - stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk()
    are some of the other interfaces that take a time_t based
    argument and need to grow a time64 version to avoid an ABI mismatch.

  - The timeval prototype appears to be broken, as it's missing
    padding on architectures without native alignment of __time64
    (e.g. i386) and on all big-endian architectures.

  - Some test cases hang in futex_wait() or clock_nanosleep()
    because of incorrect timeout arguments, presumably from type
    mismatches.

* There is an open question regarding the name of the Debian
  architecture. For my experiments, I kept using the 'armhf' name
  unmodified, though there seems to be a general feeling that using a
  different name would be required to address the broad incompatibilities
  between time32 and time64 versions of all the libraries in the
  distro. Gradually changing them won't work because of the timeline and
  the number of affected libraries. However, the new name of the distro
  also implies having a distinct target triplet, which must then be known
  by glibc along with everything else using config.guess/config.sub. I
  expect this topic to require a lot more discussion.

* Continuing with the rebootstrap build despite the known glibc issues
  and the open question on the architecture name went surprisingly
  well. Only two out of the 152 source packages I built had
  compile-time problems:

  - building the final gcc failed in libsanitizer, which has
    compile-time checks to ensure some libc data structures have the
    expected layout. It noticed that 'struct timeb' and 'struct dirent'
    are different based on _TIME_BITS and _FILE_OFFSET_BITS. I disabled
    the checks to be able to continue. To do this properly, the library
    has to learn about the new data structures as well. I opened a
    bug report against the library[7].

  - libpreludecpp12 failed to build because of checks for changes
    in the exported functions, which are different with time64.
    I disabled the checks. Once we have agreed on a new Debian
    architecture name, the symbols can be made architecture-specific.

* After everything was built, I tried installing the packages into
  a chroot with qemu-debootstrap, which failed because I had
  configured the glibc to assume it's running on a new kernel
  while the qemu-user binary I had lacks the new syscalls.
  I believe this is fixed in upstream qemu, but did not try that.

* For a second installation attempt, I used a clean debian-arm64
  installation running in qemu-system-aarch64 and tried installing the
  armhf packages with a regular debootstrap, running the 32-bit
  binaries in compat mode of a recent arm64 kernel. This partially
  worked and I could chroot into the system and use a shell, but
  ultimately the debootstrap did not complete because of errors.
  I saw that 'tar' had failed because of the stat() glibc ABI mismatch
  breaking its private gnulib fdutimens() implementation, and this is
  where I gave up.
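
To illustrate the general class of ABI mismatch behind failures like the
one in 'tar' (this is a sketch, not the actual tar/gnulib code path), the
following C fragment shows what happens when a time64 caller interprets a
buffer that was filled in using the time32 layout. The struct and function
names (ts32, ts64, misread_as_ts64) are hypothetical stand-ins and not
glibc types; the ts64 layout assumes a little-endian 32-bit target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the two timespec layouts; not glibc types. */
struct ts32 {
    int32_t tv_sec;   /* 32-bit seconds, overflows in 2038 */
    int32_t tv_nsec;
};

struct ts64 {
    int64_t tv_sec;   /* 64-bit seconds */
    int32_t tv_nsec;
    int32_t pad;      /* assumed little-endian placement of the padding */
};

/*
 * Simulates a time64 caller reading a buffer that a time32 callee wrote.
 * Only sizeof(struct ts32) bytes are copied, so nothing is read out of
 * bounds; the remaining bytes of the result stay zero.
 */
static struct ts64 misread_as_ts64(const struct ts32 *in)
{
    struct ts64 out;

    memset(&out, 0, sizeof out);
    memcpy(&out, in, sizeof *in);
    return out;
}
```

Calling misread_as_ts64() on { 1583000000, 500 } yields a tv_sec that
mixes the original seconds and nanoseconds into one 64-bit value and a
tv_nsec of 0, so the caller sees a plausible-looking but wrong timestamp
rather than an immediate crash.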

I have spent more time on this now than I had planned, and don't expect
to do further work on it anytime soon, but I hope my summary is useful
to others that are going to need this later.  I can obviously share
my patches and build artifacts if anyone needs them. There are two
additional approaches that would likely get a Debian bootstrap further,
but that I have not tried as they were previously dismissed:

* Adding a time64 armhf as a separate (incompatible) target in glibc
  that defines __TIMESIZE==64 and a 64-bit __time_t would avoid
  most of the remaining ABI issues and put armhf-time64 in the same
  category as riscv32 and arc, but this idea was so far rejected by the
  glibc maintainers. Depending on how hard this turns out to be,
  it could be used to get to the point of self-hosting though, and
  help find time64 related bugs in the rest of the distro.

* Doing the bootstrap using a musleabihf target instead of gnueabihf
  would avoid the current issues internal to glibc-y2038, but instead
  lead to new problems with packages that do not currently work with
  musl. Adelie Linux has shown that it's already possible to build
  a useful distro using musl and time64[8], and this would
  sidestep the question of the target triplet. While it would also
  help find and fix additional bugs in packages, and make an
  interesting unofficial Debian target, I don't see it replacing
  the existing armhf port any time soon.
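
With any of these approaches, a small check along the following lines
could be used to confirm that a given target really ends up with a 64-bit
time_t. This is only a sketch: it assumes a libc that honors the
_TIME_BITS and _FILE_OFFSET_BITS macros (on targets with a native 64-bit
time_t they have no effect), and the helper names time_t_is_64bit() and
format_utc() are made up for illustration.

```c
/* Feature macros must be set before any libc header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r() */
#endif
#ifndef _TIME_BITS
#define _TIME_BITS 64             /* assumption: libc honors this */
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* First second that no longer fits in a signed 32-bit time_t. */
#define FIRST_POST_2038_SECOND ((int64_t)INT32_MAX + 1)

/* Returns nonzero if time_t is at least 64 bits wide in this build. */
static int time_t_is_64bit(void)
{
    return sizeof(time_t) * 8 >= 64;
}

/*
 * Formats an epoch second as UTC into buf.
 * Returns 0 on success, or -1 if time_t cannot represent the value
 * or the conversion fails.
 */
static int format_utc(int64_t seconds, char *buf, size_t len)
{
    time_t t = (time_t)seconds;
    struct tm tm;

    if ((int64_t)t != seconds)
        return -1;                /* value was truncated by time_t */
    if (gmtime_r(&t, &tm) == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm) ? 0 : -1;
}
```

On a working time64 build, format_utc(FIRST_POST_2038_SECOND, ...)
succeeds and produces "2038-01-19 03:14:08"; on a time32 build it returns
-1 because the value does not survive the conversion to time_t.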

For additional information about the Debian plans, see the
article on LWN[9] that summarizes the discussion started by
Steve McIntyre [10].

      Arnd

[1] https://wiki.debian.org/HelmutGrohne/rebootstrap
[2] https://github.com/lmajewski/y2038_glibc/tree/y2038_edge
[3] https://salsa.debian.org/glibc-team/glibc/-/tree/glibc-2.31
[4] https://github.com/lmajewski/y2038_glibc/commit/2f72ea2b6f6ee
[5] https://sourceware.org/pipermail/libc-alpha/2020-February/111375.html
[6] https://pastebin.com/fJYV2stF
[7] https://bugs.llvm.org/show_bug.cgi?id=45138
[8] https://wiki.adelielinux.org/wiki/Project:Time64
[9] https://lwn.net/Articles/812767/
[10] https://lwn.net/ml/debian-devel/20200204131410.GF3043@tack.einval.com/

