Re: Perl symbol problem - release critical (Re: Bug#489132)

On Thu, Jul 03, 2008 at 08:11:05PM +0100, Ian Jackson wrote:
> Raphael Hertzog writes ("Bug#489132: lenny release notes, upgrade dpkg first"):
> > To work-around a problem that can happen in the perl 5.10 upgrade (see
> > #479711), the perl scripts contained in dpkg (update-alternatives,
> > dpkg-divert) have been modified... but for the work-around to be used, the
> > new dpkg must obviously be installed first, before the dist-upgrade.
> I don't think this is the right solution.  To be honest I'm just
> astonished at this situation, which is terrible.  It is the
> consequence of a mistake in the Debian Perl policy - a mistake which
> has caused trouble on every previous upgrade, too.

Revisiting this; #489132 and #488300 are still open and I'm (hopefully)
less confused about the issue now than in my earlier reply to #489132
and others.

Summary: I think making perl-base Pre-Depend on dpkg (>= 1.14.20) is enough
to fix this for lenny.

> Possible solutions that I see for lenny:

> 2. Find out which modules are used in this way by Essential packages.
>    Arrange somehow for those modules to fail at `require' when loaded
>    with Perl 5.8 from etch.  This might involve rebuilding only
>    those modules.

The only perl scripts provided by Essential packages are


All but chkdupexe are in the dpkg package. No external modules are used
by scriptreplay, chkdupexe, and mksplit. The only module outside
perl-base that is used by the others is Locale::gettext. 

All but cleanup-info set PERL_DL_NONLAZY in their Lenny versions, which
makes the "eval 'use Locale::gettext'" call fail due to missing symbols
when liblocale-gettext-perl and perl-base are out of sync.

The Lenny version of liblocale-gettext-perl Pre-Depends on 
perl-base (>= 5.10.0-9). This makes it impossible for the Etch version
of /usr/bin/perl to see the Lenny version of Locale::gettext.

The other way around is still possible: unpacking perl-base on an Etch
system (after upgrading libc6 etc.) makes Perl 5.10 see the 5.8 version of
Locale::gettext. This breaks the Etch version of the dpkg utilities. 

The breakage could be prevented by making perl-base Pre-Depend on dpkg
(>= 1.14.20). I think this would be enough to solve the issue for lenny
and fix #488300 (and possibly #489132, but that one includes some concerns
about the need to upgrade apt manually first.)
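
For anyone wanting to check a partially upgraded system by hand, the
condition that such a Pre-Depends would enforce can be sketched as
follows. The function name is hypothetical; dpkg --compare-versions and
dpkg-query are the existing tools.

```shell
# Hypothetical helper: succeed if the given dpkg version is at least
# 1.14.20, the version named in the proposed Pre-Depends.
dpkg_has_workaround() {
    dpkg --compare-versions "$1" ge 1.14.20
}

# Usage (assumes a Debian system):
#   installed=$(dpkg-query -W -f='${Version}' dpkg)
#   dpkg_has_workaround "$installed" && echo "dpkg is new enough"
```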

If /usr/sbin/cleanup-info is considered part of the Essential
functionality of dpkg, it also needs to set PERL_DL_NONLAZY.
Judging by /usr/share/doc/dpkg/README.feature-removal-schedule,
that is probably not the case.

> * Suppressing lazy symbol resolution may work in this case, but it is
>   not correct.  ABI changes may result in random crashes due to
>   different structure sizes and do not necessarily involve missing
>   symbols - so the problem may go undetected.  If we think that we
>   want to fix it in etch->lenny by suppressing lazy symbol resolution,
>   we need to:
>     (a) check what the actual ABI differences are and that either
>         there aren't any others besides missing symbols, or that
>         every module will definitely fail to load

I think it's clear that Locale::gettext fails to load both ways
when PERL_DL_NONLAZY is set:

- when compiled for 5.10.0 it needs Perl_Istack_sp_ptr from perl/libperl,
  which is not present in 5.8.8
- when compiled for 5.8.8  it needs Perl_Tstack_sp_ptr instead,
  which is not present in 5.10.0
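
A quick way to see which of the two symbols a given binary or XS object
refers to is to filter its dynamic symbol table. This is only a sketch:
the function name is made up, it relies on nm from binutils, and the
path in the usage line is an assumption about where the file lives.

```shell
# Hypothetical filter: reads `nm -D` output on stdin and reports which of
# the two stack pointer accessors named above appears in it.
classify_stack_sp_symbol() {
    awk '
        { sym = $NF; sub(/@.*/, "", sym) }   # drop any symbol version suffix
        sym == "Perl_Istack_sp_ptr" { print "5.10"; found = 1; exit }
        sym == "Perl_Tstack_sp_ptr" { print "5.8";  found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Usage (path is an assumption):
#   nm -D /usr/bin/perl | classify_stack_sp_symbol
```

The filter does not distinguish defined from undefined symbols, so it
works both on the interpreter (which provides the symbol) and on a
module's shared object (which needs it).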

>     (b) regard this as a workaround and do something sensible next
>         time.

Post-lenny, I see two options that don't involve changing the module path:

- mandate that ABI changes in the Perl XS module interface
  will always be accompanied by a symbol rename caught by
  PERL_DL_NONLAZY, and artificially do that for Debian if needed in the
  future. This practically means "just carry on and hope we don't have
  to deviate from perl upstream".

- integrate Locale::gettext in perl-base (#479681) and mandate that
  Essential:yes programs may not load non-Essential XS modules even
  opportunistically (inside an eval block) because PERL_DL_NONLAZY
  can't be trusted.  This seems to be the safer option of the two.

Niko Tyni   ntyni@debian.org

