
Re: Trying Debian/armhf rebootstrap with time64



Hi Arnd,

Thank you for the summary.

> As discussed before, I tried using the rebootstrap tool [1] to see
> what problems come up once the entire distro gets rebuilt.  Based on
> Lukasz' recommendation, I tried the 'y2038_edge' branch with his
> experimental glibc patches [2], using commit c2de7ee9461 dated
> 2020-02-17.
> 
> Here is a rough summary of what I tried, what worked, and what
> problems I ran into:
> 
> * Building a Debian package from this was fairly straightforward,
>   using the 2.31 branch in the package git repository [3] after
>   replacing the debian/patches/git-updates.diff file with one generated
>   from [2] and disabling the hurd patches because of conflicts.
> 
> * After installing the modified x86 glibc package, I ran into a
>   runtime bug in [4], which needs to pass AT_FDCWD instead of 0 to avoid
>   ENOTDIR errors.

This issue is already fixed in glibc master by Florian:
https://sourceware.org/git/?p=glibc.git;a=commit;h=c10826a3277aa7fc0040c0fa18e60cafbab26edf
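
For readers unfamiliar with this failure mode, here is a minimal sketch. It assumes the call in [4] is an *at() function such as fstatat() (the commit itself is not reproduced here), and the helper name stat_cwd is hypothetical. The dirfd argument decides what a relative path is resolved against:

```c
#define _POSIX_C_SOURCE 200809L  /* fstatat(), AT_FDCWD */
#include <fcntl.h>               /* AT_FDCWD */
#include <sys/stat.h>            /* fstatat(), struct stat */

/* Hypothetical helper: stat a path relative to the current working
   directory.  AT_FDCWD is the special dirfd meaning "the cwd"; passing
   0 instead would resolve a relative path against file descriptor 0
   (normally stdin), which is usually not a directory, so the call
   would fail with ENOTDIR. */
int stat_cwd(const char *path, struct stat *st)
{
    return fstatat(AT_FDCWD, path, st, 0);
}
```

With AT_FDCWD, stat_cwd(".", &st) behaves like stat("."); with 0 as dirfd the same lookup depends on whatever descriptor 0 happens to be.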

> 
> * Bootstrapping a regular time32 Debian armhf with this libc took me
>   a few days to get right, but that was mostly for getting familiar
>   with rebootstrap and running into known issues unrelated to time64
>   or the glibc changes.
> 
> * Actually building a time64 version of glibc turned out to be
>   harder, including some issues discussed on the libc mailing list[5]:
> 
>   - Always setting -D_TIME_BITS=64 in the global compiler flags for
>     the distro breaks both the native 64-bit (x86_64) build and the
>     32-bit build, as glibc itself expects to be built without this.

Yes, correct - one needs to disable this flag for glibc.
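
To illustrate the split, a minimal sketch of the application side only: a translation unit that opts into 64-bit time_t via the feature macros, which glibc's own sources must not use. It assumes a glibc that implements _TIME_BITS (upstream glibc 2.34 and later, which also requires _FILE_OFFSET_BITS=64 alongside it; the y2038 branch discussed here predates that). The function name time_t_bits is hypothetical.

```c
/* Application side only: glibc itself must be compiled without these.
   Equivalent to building with -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64. */
#define _FILE_OFFSET_BITS 64
#define _TIME_BITS 64

#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <time.h>    /* time_t */

/* Width of time_t in bits as seen by this translation unit. */
size_t time_t_bits(void)
{
    return sizeof(time_t) * CHAR_BIT;
}
```

On a 32-bit target with such a glibc this reports 64; on 64-bit targets time_t is already 64 bits and the macro changes nothing.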

> 
>   - Removing the time32 symbols from the glibc shared object did not
>     work, as they are still used (a lot) internally and by the
>     testsuite.

Replacing the internal calls with 64-bit capable ones is a work in
progress. For example, one would need to replace __clock_settime with
__clock_settime64.
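
A minimal sketch of the shape this conversion takes, using hypothetical names (timespec32, timespec64, example_clock_gettime32/64, fake_now_sec) rather than glibc's internal ones: internal code calls the time64 variant directly, and the legacy time32 symbol becomes a thin wrapper. For a setter such as __clock_settime the wrapper only has to widen its argument; the getter direction shown here also has to detect results that no longer fit.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical time32 and time64 timespec layouts. */
struct timespec32 { int32_t tv_sec; int32_t tv_nsec; };
struct timespec64 { int64_t tv_sec; int64_t tv_nsec; };

/* Stand-in clock source (about February 2020).  A real implementation
   would issue the time64 system call; this one is settable so the
   overflow path can be exercised. */
static int64_t fake_now_sec = 1582000000;

/* time64 implementation: internal callers should use this one. */
int example_clock_gettime64(struct timespec64 *tp)
{
    tp->tv_sec = fake_now_sec;
    tp->tv_nsec = 0;
    return 0;
}

/* Legacy time32 symbol: a thin wrapper that narrows the result and
   fails with EOVERFLOW once the time no longer fits in 32 bits. */
int example_clock_gettime32(struct timespec32 *tp)
{
    struct timespec64 t64;
    int ret = example_clock_gettime64(&t64);

    if (ret != 0)
        return ret;
    if (t64.tv_sec < INT32_MIN || t64.tv_sec > INT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    tp->tv_sec = (int32_t)t64.tv_sec;
    tp->tv_nsec = (int32_t)t64.tv_nsec;
    return 0;
}
```

Keeping a single time64 implementation means the time32 entry points can eventually be dropped for time64-only builds without touching the internal callers.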

> 
>   - I tried converting all the internal symbols to use the time64
>     variants with the correct types (e.g. __clock_gettime64() instead
>     of __clock_gettime()), but then ran into a lot of APIs that take
>     timespec/timeval/... arguments and pass them down into internal
>     functions. These seem to all be bugs that require adding a time64
>     version of the external ABI.
> 
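
To make "adding a time64 version of the external ABI" concrete for one such function, here is a minimal sketch with hypothetical names (example_wait32, example_wait64, EXAMPLE_TIME64). The old symbol keeps its time32 argument for existing binaries, a new symbol takes the time64 layout, and the header picks one for newly compiled code. glibc's headers normally do this renaming with __REDIRECT (an asm-label alias); a plain macro stands in for it here to keep the sketch portable.

```c
#include <stdint.h>

/* Hypothetical time32 and time64 layouts. */
struct timespec32 { int32_t tv_sec; int32_t tv_nsec; };
struct timespec64 { int64_t tv_sec; int64_t tv_nsec; };

/* Internal implementation, written only against time64.  It just
   records the deadline so the sketch can be checked. */
static int64_t last_deadline_sec;

static int wait_impl64(const struct timespec64 *abstime)
{
    last_deadline_sec = abstime->tv_sec;
    return 0;
}

/* New exported symbol for callers built with 64-bit time_t. */
int example_wait64(const struct timespec64 *abstime)
{
    return wait_impl64(abstime);
}

/* Existing exported symbol, kept for time32 binaries: widen the
   argument, then share the same implementation. */
int example_wait32(const struct timespec32 *abstime)
{
    struct timespec64 t64 = { abstime->tv_sec, abstime->tv_nsec };
    return wait_impl64(&t64);
}

/* Header side: select the entry point and argument type at compile
   time.  EXAMPLE_TIME64 stands in for _TIME_BITS == 64. */
#define EXAMPLE_TIME64 1
#if EXAMPLE_TIME64
typedef struct timespec64 example_timespec;
# define example_wait example_wait64
#else
typedef struct timespec32 example_timespec;
# define example_wait example_wait32
#endif
```

Without the second symbol, a time64 caller would hand a 16-byte structure to a function that reads it as 8 bytes, which is exactly the class of ABI mismatch described above.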
>   - After I abandoned that approach, I continued with a simple
>     patch to features.h that sets _TIME_BITS/_FILE_OFFSET_BITS based
>     on '#if !defined _LIBC && __TIMESIZE == 32', which ignores the bugs I
>     found earlier but got me a lot further.
> 
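
A hypothetical reconstruction of that kind of features.h change (the actual patch is not included in this mail). It assumes __TIMESIZE is already defined at this point in the header, e.g. via <bits/timesize.h>; _LIBC is the macro defined while glibc itself is being built, so glibc's own sources are left alone:

```c
/* Hypothetical features.h fragment, not the actual patch: make time64
   and 64-bit file offsets the default for applications on targets
   whose native time_t is 32 bits, without affecting glibc's own build
   (_LIBC).  Assumes __TIMESIZE is already defined here. */
#if !defined _LIBC && __TIMESIZE == 32
# ifndef _TIME_BITS
#  define _TIME_BITS 64
# endif
# ifndef _FILE_OFFSET_BITS
#  define _FILE_OFFSET_BITS 64
# endif
#endif
```

The #ifndef guards keep any value an application set explicitly. Since this only changes defaults, it does nothing about the internal ABI bugs mentioned above.
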
>   - Building the i386 glibc with that patch, I ran into over 150
>     testsuite failures [6]. This looked like there was a fundamental
>     mistake on my side, but after I looked into a few of the failures,
>     most seemed to be either glibc or testsuite bugs that have to be
>     addressed individually. I considered giving up at this point,
>     but as Lukasz has said that he had successfully built a working
>     system using Yocto, I kept going anyway and marked these all as
>     expected failures in the debian package.
> 
> * There are a couple of noteworthy issues in glibc-y2038 I'd like to
>   point out in particular, though I'm sure these are not the only
>   important ones:
> 
>   - The clock_nanosleep() prototype needed a '__THROW' annotation
>     to complete the build.

Ok. This might have been overlooked.

> 
>   - The nptl and sunrpc portions have numerous interfaces with
>     'timeval' or 'timespec' arguments that each cause an ABI break.

The nptl and sunrpc code has not been converted yet.

> 
>   - stat()/fstat()/lstat(), nanosleep(), wait3()/wait4(), ppoll_chk()
>     are some of the other interfaces that take a time_t based
>     argument and need to grow a time64 version to avoid an ABI
>     mismatch.

stat() and friends will use statx internally, which supports 64-bit
time from the outset. Unfortunately, this has not been converted yet.

As statx was added in Linux 4.11, all of this will be fixed once the
minimum supported kernel version is raised to 4.11 (from 3.2, as it is
now).
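
As an illustration of why statx sidesteps the problem: its timestamps use struct statx_timestamp, whose tv_sec is a signed 64-bit field on every architecture, regardless of the width of time_t. The statx() system call appeared in Linux 4.11 and glibc has provided a wrapper since 2.28; the helper name get_mtime64 is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>     /* AT_FDCWD */
#include <stdint.h>
#include <sys/stat.h>  /* statx(), struct statx (glibc 2.28+) */

/* Hypothetical helper: fetch the modification time of path as a
   64-bit seconds value, independent of the width of time_t. */
int get_mtime64(const char *path, int64_t *sec, uint32_t *nsec)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_MTIME, &stx) != 0)
        return -1;                /* errno set by statx() */
    if (!(stx.stx_mask & STATX_MTIME)) {
        errno = ENODATA;          /* filesystem did not report mtime */
        return -1;
    }
    *sec = stx.stx_mtime.tv_sec;  /* 64-bit on all architectures */
    *nsec = stx.stx_mtime.tv_nsec;
    return 0;
}
```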

> 
>   - The timeval prototype appears to be broken, as it's missing
>     padding on architectures without native alignment of __time64
>     (e.g. i386) and on all big-endian architectures.
> 

Do you mean the one "exported" to the system, or the one internal to
glibc (from ./include/time.h)?
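
Whichever header is meant, the padding concern can be shown with a timespec-like pair. POSIX requires the sub-second field to be a long, so on a 32-bit target it stays 32 bits while the kernel's time64 structure uses two 64-bit fields. A minimal sketch with hypothetical struct names:

```c
#include <stdint.h>
#include <string.h>

/* Kernel-style time64 layout: both fields are 64 bits wide. */
struct kernel_ts64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Hypothetical user-space time64 layout for a 32-bit target.  Explicit
   padding keeps the size at 16 bytes even where int64_t is only 4-byte
   aligned (i386), and on big-endian targets the padding goes first so
   that tv_nsec overlays the low-order half of the kernel's field. */
struct user_ts64 {
    int64_t tv_sec;
#if defined __BYTE_ORDER__ && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    int32_t pad;
    int32_t tv_nsec;
#else
    int32_t tv_nsec;
    int32_t pad;
#endif
};

_Static_assert(sizeof(struct user_ts64) == sizeof(struct kernel_ts64),
               "user and kernel time64 layouts must have the same size");

/* Model of the kernel copying its structure out to user space. */
void copy_from_kernel(struct user_ts64 *u, const struct kernel_ts64 *k)
{
    memcpy(u, k, sizeof *u);
}
```

Without the pad member, i386 would see a 12-byte structure, and a big-endian target would read the always-zero high half of the kernel's tv_nsec.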

>   - some testcases hang in futex_wait() or clock_nanosleep()
>     because of incorrect timeout arguments, presumably from type
>     mismatches.

Ok.
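
One way such a mismatch can turn into a hang, as a minimal sketch with a hypothetical type: if a caller passes a time32 timespec and the callee reads it with the time64 layout, the caller's tv_nsec lands in the upper or lower half of the callee's 64-bit tv_sec (and the callee's tv_nsec comes from whatever memory follows).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical time32 layout: two 32-bit fields, 8 bytes in total. */
struct timespec32 { int32_t tv_sec; int32_t tv_nsec; };

_Static_assert(sizeof(struct timespec32) == sizeof(int64_t),
               "a time32 timespec is as large as a time64 tv_sec alone");

/* The tv_sec value a time64 callee would see if it read a time32
   timespec using the time64 layout, where the first 8 bytes hold a
   64-bit tv_sec. */
int64_t misread_tv_sec(const struct timespec32 *ts)
{
    int64_t sec;
    memcpy(&sec, ts, sizeof sec);
    return sec;
}
```

For a 1.5 second timeout ({ 1, 500000000 }), a little-endian callee sees about 2.1e18 seconds, so a futex wait or clock_nanosleep() effectively never returns.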

> 
> * There is an open question regarding the name of the Debian
>   architecture. For my experiments, I kept using the 'armhf' name
>   unmodified, though there seems to be a general feeling that using a
>   different name would be required to address the broad
>   incompatibilities between time32 and time64 versions of all the
>   libraries in the distro. Gradually changing them won't work because
>   of the timeline and the number of affected libraries. However, the
>   new name of the distro also implies having a distinct target triplet,
>   which must then be known by glibc along with everything else using
>   config.guess/config.sub. I expect this topic to require a lot more
>   discussion.
> 
> * Continuing with the rebootstrap build despite the known glibc issues
>   and the open question on the architecture name went surprisingly
>   well, only two out of the 152 source packages I built had
>   compile-time problems:
> 

Nice to hear that.

>   - building the final gcc failed in libsanitizer, which has
>     compile-time checks to ensure some libc data structures have the
>     expected layout. It noticed that 'struct timeb' and 'struct
>     dirent' are different based on _TIME_BITS and _FILE_OFFSET_BITS. I
>     disabled the checks to be able to continue. To do this properly, the
>     library has to learn about the new data structures as well. I opened
>     a bug report against the library [7].
> 
>   - libpreludecpp12 failed to build because of checks for changes
>     in the exported functions, which are different with time64.
>     I disabled the checks. Once we have agreed on a new debian
>     architecture name, the symbols can be made arch specific.
> 
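
The kind of check that tripped in libsanitizer can be sketched with C11 _Static_assert: a library keeps its own mirror of a libc structure and verifies at compile time that the layouts agree. This is not libsanitizer's code; it uses struct timespec instead of struct timeb or struct dirent, and the mirror assumes time_t is a long. That holds on common 64-bit Linux targets and on time32 32-bit ones, but not on a 32-bit target built with _TIME_BITS=64, where the build stops in the same way.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <time.h>    /* struct timespec */

/* Hypothetical mirror of struct timespec, written with the assumption
   that time_t is a long. */
struct mirror_timespec {
    long tv_sec;
    long tv_nsec;
};

/* Compile-time layout checks: these fail to compile when libc's
   struct timespec no longer matches the mirror. */
_Static_assert(sizeof(struct mirror_timespec) == sizeof(struct timespec),
               "struct timespec size changed");
_Static_assert(offsetof(struct mirror_timespec, tv_nsec) ==
               offsetof(struct timespec, tv_nsec),
               "struct timespec layout changed");

size_t mirror_timespec_size(void)
{
    return sizeof(struct mirror_timespec);
}
```

Fixing this properly means teaching the mirror about the time64 layout, selected by the same macros libc uses.
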
> * After everything was built, I tried installing the packages into
>   a chroot with qemu-debootstrap, which failed because I had
>   configured the glibc to assume it's running on a new kernel
>   while the qemu-user binary I had lacks the new syscalls.
>   I believe this is fixed in upstream qemu, but did not try that.
> 
> * Trying to install again, I used a clean debian-arm64 installation
>   running in qemu-system-aarch64, and attempted installing the
>   armhf packages using a regular debootstrap, running the 32-bit
>   binaries in compat mode of a recent arm64 kernel. This partially
>   worked and I could chroot into the system and use a shell, but
>   ultimately the debootstrap did not complete because of errors.
>   I saw that 'tar' had failed because of the stat() glibc ABI mismatch
>   breaking its private gnulib fdutimens() implementation, and this is
>   where I gave up.
> 
> I have spent more time on this now than I had planned, and don't
> expect to do further work on it anytime soon, but I hope my summary
> is useful to others that are going to need this later.  I can
> obviously share my patches and build artifacts if anyone needs them.

Could you upload them to some server (kernel.org or GitHub)?

> There are two additional approaches that would likely get a Debian
> bootstrap further, but that I have not tried as they were previously
> dismissed:
> 
> * Adding a time64 armhf as a separate (incompatible) target in glibc
>   that defines __TIMESIZE==64 and a 64-bit __time_t would avoid
>   most of the remaining ABI issues and put armhf-time64 in the same
>   category as riscv32 and arc, but this idea has so far been rejected
>   by the glibc maintainers.

As far as I know, riscv32 and arc will use the generic syscall
interface. 32-bit ARM does not support it, so the code from those two
ports will not be used.

>   Depending on how hard this turns out to be,
>   it could be used to get to the point of self-hosting though, and
>   help find time64 related bugs in the rest of the distro.
> 
> * Doing the bootstrap using a musleabihf target instead of gnueabihf
>   would avoid the current issues internal to glibc-y2038, but instead
>   lead to new problems with packages that do not currently work with
>   musl. Adelie Linux has shown that it's already possible to build
>   a useful distro using musl and time64[8], and this would
>   sidestep the question of the target triplet. While it would also
>   help find and fix additional bugs in packages, and make an
>   interesting unofficial Debian target, I don't see it replacing
>   the existing armhf port any time soon.
> 
> For additional information about the Debian plans, see the
> article on LWN[9] that summarizes the discussion started by
> Steve McIntyre [10].
> 
>       Arnd
> 
> [1] https://wiki.debian.org/HelmutGrohne/rebootstrap
> [2] https://github.com/lmajewski/y2038_glibc/tree/y2038_edge
> [3] https://salsa.debian.org/glibc-team/glibc/-/tree/glibc-2.31
> [4] https://github.com/lmajewski/y2038_glibc/commit/2f72ea2b6f6ee
> [5] https://sourceware.org/pipermail/libc-alpha/2020-February/111375.html
> [6] https://pastebin.com/fJYV2stF
> [7] https://bugs.llvm.org/show_bug.cgi?id=45138
> [8] https://wiki.adelielinux.org/wiki/Project:Time64
> [9] https://lwn.net/Articles/812767/
> [10] https://lwn.net/ml/debian-devel/20200204131410.GF3043@tack.einval.com/




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de
