Hi Arnd,

Thank you for the summary.

> As discussed before, I tried using the rebootstrap tool [1] to see
> what problems come up once the entire distro gets rebuilt. Based on
> Lukasz' recommendation, I tried the 'y2038_edge' branch with his
> experimental glibc patches [2], using commit c2de7ee9461 dated
> 2020-02-17.
>
> Here is a rough summary of what I tried, what worked, and what
> problems I ran into:
>
> * Building a Debian package from this was fairly straightforward,
>   using the 2.31 branch in the package git repository[3] after
>   replacing the debian/patches/git-updates.diff file with one generated
>   from [2] and disabling the hurd patches because of conflicts.
>
> * After installing the modified x86 glibc package, I ran into a
>   runtime bug in [4], which needs to pass AT_FDCWD instead of 0 to avoid
>   ENOTDIR errors.

This issue is already fixed in glibc master by Florian:
https://sourceware.org/git/?p=glibc.git;a=commit;h=c10826a3277aa7fc0040c0fa18e60cafbab26edf

>
> * Bootstrapping a regular time32 Debian armhf with this libc took me
>   a few days to get right, but that was mostly for getting familiar
>   with rebootstrap and running into known issues unrelated to time64
>   or the glibc changes.
>
> * Actually building a time64 version of glibc turned out to be
>   harder, including some issues discussed on the libc mailing list[5]:
>
>   - Always setting -D_TIME_BITS=64 in the global compiler flags for
>     the distro breaks both the native 64-bit (x86_64) build and the
>     32-bit build, as glibc itself expects to be built without this.

Yes, correct - this flag needs to be disabled when building glibc
itself.

>
>   - Removing the time32 symbols from the glibc shared object did not
>     work as they are still used (a lot) internally, and by the
>     testsuite.

Replacing the internal calls with their 64-bit counterparts is a work
in progress. For example, __clock_settime needs to be replaced with
__clock_settime64.
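
To make the effect of the flag more concrete, here is a minimal sketch
from an application's point of view. It is illustrative only and not
part of glibc: the helper names (time_t_bits, fits_time32, time32_max)
are made up for this example, and whether time_t is actually widened
on a 32-bit target depends on the toolchain honouring _TIME_BITS.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Width of time_t in bits, as seen by this translation unit.
   On a 32-bit target this is expected to depend on whether the file
   was compiled with -D_TIME_BITS=64 (assumption: the libc headers
   honour that macro). */
int time_t_bits(void)
{
    return (int)(sizeof(time_t) * 8);
}

/* Hypothetical helper: returns 1 if a second count since the epoch
   is representable in a signed 32-bit time_t, 0 otherwise. */
int fits_time32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Hypothetical helper: the last second a time32 ABI can represent. */
int64_t time32_max(void)
{
    assert(fits_time32(INT32_MAX));
    return INT32_MAX;
}
```

Calling fits_time32(time32_max() + 1) returns 0, which is the case the
time64 variants of the internal symbols have to handle.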
>
>   - I tried converting all the internal symbols to use the time64
>     variants with the correct types (e.g. __clock_gettime64() instead
>     of __clock_gettime()), but then ran into a lot of APIs that take
>     timespec/timeval/... arguments and pass them down into internal
>     functions. These seem to all be bugs that require adding a time64
>     version of the external ABI.
>
>   - After I abandoned that approach, I continued with a simple
>     patch to features.h that sets _TIME_BITS/_FILE_OFFSET_BITS based
>     on '#if !defined _LIBC && __TIMESIZE == 32', which ignores the bugs I
>     found earlier but got me a lot further.
>
>   - Building the i386 glibc with that patch, I ran into over 150
>     testsuite failures [6]. This looked like there was a fundamental
>     mistake on my side, but after I looked into a few of the failures,
>     most seemed to be either glibc or testsuite bugs that have to be
>     addressed individually. I considered giving up at this point,
>     but as Lukasz has said that he had successfully built a working
>     system using Yocto, I kept going anyway and marked these all as
>     expected failures in the debian package.
>
> * There are a couple of noteworthy issues in glibc-y2038 I'd like to
>   point out in particular, though I'm sure these are not the only
>   important ones:
>
>   - The clock_nanosleep() prototype needed a '__THROW' annotation
>     to complete the build.

OK. This might have been overlooked.

>
>   - The nptl and sunrpc portions have numerous interfaces with
>     'timeval' or 'timespec' arguments that each cause an ABI break.

The nptl and sunrpc code has not been converted yet.

>
>   - stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk()
>     are some of the other interfaces that take a time_t based
>     argument and need to grow a time64 version to avoid an ABI
>     mismatch.

stat() and friends will use statx internally, which has supported
64-bit time from the outset. Unfortunately, they have not been
converted yet.
As statx was added in 4.11 (IIRC), all of this will be fixed once the
minimal supported Linux kernel version is bumped to that version (from
3.2 as of now).

>
>   - The timeval prototype appears to be broken, as it's missing
>     padding on architectures without native alignment of __time64
>     (e.g. i386) and on all big-endian architectures.
>

Do you mean the one "exported" to the system, or the one internal to
glibc (from ./include/time.h)?

>   - some testcases hang in futex_wait() or clock_nanosleep()
>     because of incorrect timeout arguments, presumably from type
>     mismatches.

OK.

>
> * There is an open question regarding the name of the Debian
>   architecture. For my experiments, I kept using the 'armhf' name
>   unmodified, though there seems to be a general feeling that using a
>   different name would be required to address the broad
>   incompatibilities between time32 and time64 versions of all the
>   libraries in the distro. Gradually changing them won't work because
>   of the timeline and the number of affected libraries. However, the
>   new name of the distro also implies having a distinct target triplet,
>   which must then be known by glibc along with everything else using
>   config.guess/config.sub. I expect this topic to require a lot more
>   discussion.
>
> * Continuing with the rebootstrap build despite the known glibc issues
>   and the open question on the architecture name went surprisingly
>   well, only two out of the 152 source packages I built had
>   compile-time problems:
>

Nice to hear that.

>   - building the final gcc failed in libsanitizer, which has
>     compile-time checks to ensure some libc data structures have the
>     expected layout. It noticed that 'struct timeb' and 'struct
>     dirent' are different based on _TIME_BITS and _FILE_OFFSET_BITS. I
>     disabled the checks to be able to continue. To do this properly, the
>     library has to learn about the new data structures as well. I opened a
>     bug report against the library[7].
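
For readers unfamiliar with this kind of check, the idea can be
sketched roughly as below. This is not the libsanitizer code: the
struct names are hypothetical stand-ins that only mirror the field
order of 'struct timeb', with the time field fixed at 32 or 64 bits so
the difference shows up regardless of how the file is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct timeb' with a 32-bit time field. */
struct demo_timeb32 {
    int32_t  time;
    uint16_t millitm;
    int16_t  timezone;
    int16_t  dstflag;
};

/* Hypothetical stand-in for the same struct with a 64-bit time field. */
struct demo_timeb64 {
    int64_t  time;
    uint16_t millitm;
    int16_t  timezone;
    int16_t  dstflag;
};

/* Compile-time layout checks: a tool that hard-codes the time32
   offsets fails to build once the time64 layout is in effect. */
_Static_assert(offsetof(struct demo_timeb32, millitm) == 4,
               "time32 layout: millitm follows a 4-byte time field");
_Static_assert(offsetof(struct demo_timeb64, millitm) == 8,
               "time64 layout: millitm follows an 8-byte time field");

/* Small accessors so the sizes can also be compared at run time. */
size_t demo_timeb32_size(void) { return sizeof(struct demo_timeb32); }
size_t demo_timeb64_size(void) { return sizeof(struct demo_timeb64); }
```

A library that wants to support both ABIs would need one set of
expected offsets per layout, selected by whichever variant the libc
headers expose.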
>
>   - libpreludecpp12 failed to build because of checks for changes
>     in the exported functions, which are different with time64.
>     I disabled the checks. Once we have agreed on a new debian
>     architecture name, the symbols can be made arch specific.
>
> * After everything was built, I tried installing the packages into
>   a chroot with qemu-debootstrap, which failed because I had
>   configured the glibc to assume it's running on a new kernel
>   while the qemu-user binary I had lacks the new syscalls.
>   I believe this is fixed in upstream qemu, but did not try that.
>
> * Trying to install again I used a clean debian-arm64 installation
>   running in qemu-system-aarch64, and attempted installing the
>   armhf packages using a regular debootstrap, running the 32-bit
>   binaries in compat mode of a recent arm64 kernel. This partially
>   worked and I could chroot into the system and use a shell, but
>   ultimately the debootstrap did not complete because of errors.
>   I saw that 'tar' had failed because of the stat() glibc ABI mismatch
>   breaking its private gnulib fdutimens() implementation, and this is
>   where I gave up.
>
> I have spent more time on this now than I had planned, and don't
> expect to do further work on it anytime soon, but I hope my summary
> is useful to others that are going to need this later. I can
> obviously share my patches and build artifacts if anyone needs them.

Could you upload them to a server (kernel.org or GitHub)?

> There are two additional approaches that would likely get a Debian
> bootstrap further, but that I have not tried as they were previously
> dismissed:
>
> * Adding a time64 armhf as a separate (incompatible) target in glibc
>   that defines __TIMESIZE==64 and a 64-bit __time_t would avoid
>   most of the remaining ABI issues and put armhf-time64 in the same
>   category as riscv32 and arc, but this idea was so far rejected by
>   the glibc maintainers.

As far as I know, riscv32 and arc will use the generic syscall
interface.
32-bit ARM doesn't support it, so the code from those two ports will
not be used.

>   Depending on how hard this turns out to be,
>   it could be used to get to the point of self-hosting though, and
>   help find time64 related bugs in the rest of the distro.
>
> * Doing the bootstrap using a musleabihf target instead of gnueabihf
>   would avoid the current issues internal to glibc-y2038, but instead
>   lead to new problems with packages that do not currently work with
>   musl. Adelie Linux has shown that it's already possible to build
>   a useful distro using musl and time64[8], and this would
>   sidestep the question of the target triplet. While it would also
>   help find and fix additional bugs in packages, and make an
>   interesting unofficial Debian target, I don't see it replacing
>   the existing armhf port any time soon.
>
> For additional information about the Debian plans, see the
> article on LWN[9] that summarizes the discussion started by
> Steve McIntyre [10].
>
>       Arnd
>
> [1] https://wiki.debian.org/HelmutGrohne/rebootstrap
> [2] https://github.com/lmajewski/y2038_glibc/tree/y2038_edge
> [3] https://salsa.debian.org/glibc-team/glibc/-/tree/glibc-2.31
> [4] https://github.com/lmajewski/y2038_glibc/commit/2f72ea2b6f6ee
> [5] https://sourceware.org/pipermail/libc-alpha/2020-February/111375.html
> [6] https://pastebin.com/fJYV2stF
> [7] https://bugs.llvm.org/show_bug.cgi?id=45138
> [8] https://wiki.adelielinux.org/wiki/Project:Time64
> [9] https://lwn.net/Articles/812767/
> [10] https://lwn.net/ml/debian-devel/20200204131410.GF3043@tack.einval.com/

Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de