
Re: Packaging of static libraries



On 2016-04-13 15:29, Ian Jackson wrote:
> Adam Borowski writes ("Re: Packaging of static libraries"):
>> On Tue, Apr 12, 2016 at 02:52:33PM +0100, Ian Jackson wrote:
>>> I'm afraid that LTO is probably too dangerous to be used as a
>>> substitute for static linking.  See my comments in the recent LTO
>>> thread here, where I referred to the problem of undefined behaviour,
>>> and pointed at John Regehr's blog.
>>
>> LTO is no different from just concatenating all source files and making
>> functions static.  If your code blows up after this, it is your fault, not
>> LTO's.  LTO just allows interprocedural optimizations to work between
>> functions that were originally in different source files.

> This narrative of `fault' has two very serious problems.
>
> Firstly, it is hopelessly impractical.  As I have already observed
> here:
>
>     Recently we have seen spectacular advances in compiler optimisation.
>     Spectacular in that large swathes of existing previously-working code
>     have been discovered, by diligent compilers, to be contrary to the
>     published C standard, and `optimised' into non-working machine code.
>
>     In fact, it turns out that there is practically no existing C code
>     which is correct according to said standards (including C compilers
>     themselves).

There is practically no existing code in any language which is correct, even if you exclude problems with standards. I'm not sure we can draw many useful conclusions from such general statements.

To get something more specific, the authors of [1] claim that their tool STACK detected UB in 40% of the wheezy packages with C/C++ code (see also [2]).

[1] https://pdos.csail.mit.edu/papers/stack:sosp13.pdf
[2] https://css.csail.mit.edu/stack/
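
To make [1] concrete: the typical pattern STACK flags as "unstable code" is a sanity check that can only ever fire via UB, so the optimizer is entitled to drop it. A hypothetical sketch (not taken from any particular package):

#include <stddef.h>

/* Pointer arithmetic that overflows is UB, so the compiler may assume
 * that `buf + len' cannot wrap around and delete the first check. */
int access_ok(const char *buf, size_t len, const char *buf_end)
{
    if (buf + len < buf)          /* relies on UB; may be optimized away */
        return 0;
    return buf + len <= buf_end;
}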

> Real existing code does not conform to the rules now being enforced by
> compilers.  Indeed often it can be very hard to write new code which
> does conform to the rules, even if you know what the rules are and
> take great care.

I have the impression that many complaints about problems with UB stem from attempts to write tricky code. Sometimes tricky (or outright non-conforming) code is required, e.g. to work around the limits of a legacy API. But in many cases it's just clever code trying to get a bit more speed or to save a bit of memory -- clever enough to get into the area where some advanced rules apply, but not clever enough to obey those rules.

Arguing for safety over speed is somewhat strange then. Why write the tricky code in the first place?
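
A typical specimen of such cleverness (my own illustration, not code from any referenced project) is peeking at a float's bits through a pointer cast to save a copy; the conforming memcpy version typically compiles to the same code anyway:

#include <stdint.h>
#include <string.h>

/* Clever but non-conforming: accesses a float through a uint32_t
 * lvalue, which violates the aliasing rules (C11 6.5p7). */
uint32_t float_bits_tricky(float f)
{
    return *(uint32_t *)&f;
}

/* Boring but conforming: memcpy expresses the same reinterpretation,
 * and modern compilers usually lower it to a single register move. */
uint32_t float_bits_plain(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}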

> Two examples showing how C has been turned into a puzzle language:
>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_event.c;h=02b39e6da8c65c033c99a22db4784de8d7aeeb7a;hb=HEAD#l458
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_internal.h;h=005fe538c6b5529447185797cc23d898c219e897;hb=HEAD#l294

Why not separate the free list from active watch_slots? Why not have an array of flags indicating which slot is which?

If those approaches are deemed unattractive, explicitly stating an assumption of flat memory by casting to uintptr_t before the comparison doesn't seem very laborious.
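
Something along these lines (a hypothetical sketch, not the actual libxl code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Relational comparison of pointers into different objects is UB
 * (C11 6.5.8p5).  Casting to uintptr_t spells out the flat-memory
 * assumption and keeps the comparison itself well-defined. */
bool points_into(const void *p, const void *base, size_t nbytes)
{
    uintptr_t up = (uintptr_t)p;
    uintptr_t ub = (uintptr_t)base;
    return up >= ub && up < ub + nbytes;
}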

> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03340.html
> http://lists.xenproject.org/archives/html/xen-devel/2015-11/threads.html#00112

Yeah, there are a bunch of misconceptions there.

1. Type-punning via unions is a time-honored tradition described in all versions of the C standard (see the sketch after this list). The referenced email even links to DR 283, so it's not clear to me where the confusion comes from.

2. The compiler is not free to assume that padding will not be read: it could be read as chars (even if you ignore type-punning). You mentioned this yourself in other emails. Not that it gives you much.

3. When writing to or reading from dst->p0 you have to consider not only the type of p0 but also the type of dst. This is a very practical concern. For example, see https://twitter.com/johnregehr/status/706868554222723073 .

4. uint8_t is not guaranteed to be one of the character types and, hence, is not free to alias everything. See, e.g., https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 . Not an immediate concern but something to keep in mind if you strive for strict standard conformance.
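
To make item 1 concrete, a minimal sketch (my own example, not code from the referenced thread):

#include <stdint.h>
#include <stdio.h>

/* Type punning through a union: reading a member other than the one
 * last stored reinterprets the stored bytes (C11 6.5.2.3; cf. DR 283). */
union pun {
    float    f;
    uint32_t u;
};

int main(void)
{
    union pun p;
    p.f = 1.0f;
    printf("0x%08x\n", (unsigned)p.u);   /* 0x3f800000 on IEEE-754 targets */
    return 0;
}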

I'm not familiar with Xen but why overlay data for 32- and 64-bit cases instead of having different structs for them? Why use macros instead of functions?

> The second problem is that it is based on the idea that the C
> specification is by definition right and proper.

Whether the C standard is right and proper or not, it's the only (somewhat) widely accepted middle ground for now.

> There are two ways to evaluate the C specification's rightness and
> properness.
>
> The first is to ask what the nominal remit of the C standards
> bodies is.  Well, it is and was to standardise existing practice.
> Existing practice was to use C as a kind of portable assembler; the
> programmer was traditionally entitled to do the kind of things which
> are nowadays forbidden.  So the C committee has failed at its
> task. [1]

The task of the committee was to balance several principles. Why many (especially in the Free Software world) consider being a high-level assembler a much more important principle than the others is not clear to me.

> The second is to ask what is most useful.  And there again the C
> committee have clearly failed.

Apparently others disagree.

> We in Debian are in a good position to defend our users from the
> fallout from this problem.  We could change our default compiler
> options to favour safety, and provide more traditional semantics.

Debian (and other distros) have somewhat unusual stakes in the UB debate because of their porting needs. A lone developer can choose to support only one platform and is then free to complain that C doesn't provide the full freedom of assembler for this platform. But Debian often takes such programs and builds them for many other architectures.

As an example, consider shifts by a value greater than or equal to the width of the left operand. They are UB in C and work differently on different CPUs. Would it benefit Debian to declare them implementation-defined in C? Probably not. Another example is unaligned accesses.
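
A minimal sketch of the shift case (assuming a 32-bit unsigned int):

#include <stdio.h>

/* Shifting by a count >= the width of the left operand is UB
 * (C11 6.5.7p3).  If the compiler just emits the native shift,
 * 32-bit x86 masks the count mod 32 (giving 1 here) while ARM uses
 * the low byte of the count register (giving 0) -- and the optimizer
 * is additionally allowed to assume this never happens at all. */
unsigned shift_left(unsigned x, unsigned n)
{
    return x << n;
}

int main(void)
{
    printf("%u\n", shift_left(1u, 32));   /* 1? 0? anything: UB */
    return 0;
}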

It looks like Debian (and the Free Software community in general) should strongly favor the portability of standard C over its ability to serve as a high-level assembler.

> We would have influence upstream (for example to further advance the
> set of available safety options) if we cared to use it.  But sadly it
> seems that the notion that our most basic and widely-used programming
> language should be one that's fit for programming in is not yet fully
> accepted.
>
> At the very least we should fiercely resist any further broadening of
> the scope of the C UB problem.

Then the first thing to do is to stop upgrading gcc. That doesn't seem like a very practical approach.

The next thing is to add options like -fwrapv (or -fno-strict-overflow), -fno-delete-null-pointer-checks and -fno-strict-aliasing, but is there a chance of consensus about it? Doubtful, but who knows...
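
For reference, -fno-delete-null-pointer-checks exists because of code like this (a hypothetical sketch, not from any particular package):

#include <stddef.h>

struct dev { int id; };

/* The dereference lets the optimizer infer that d cannot be NULL, so
 * the later check becomes dead code and may be removed.  The option
 * tells GCC not to draw that inference. */
int dev_id(struct dev *d)
{
    int id = d->id;           /* UB if d is NULL */
    if (d == NULL)            /* may be deleted by the optimizer */
        return -1;
    return id;
}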

Perhaps less controversial is fixing UB (and other bugs) in the existing code. Several years ago this was hopeless, but recently some tools have emerged that make it possible to tackle the problem. First of all, the sanitizers -- ASan, UBSan, MSan, TSan, ... While running everything in valgrind is not very convenient, building everything with ASan seems quite feasible. Recent activity related to Debian:

http://balintreczey.hu/blog/progress-report-on-hardened1-linux-amd64-a-potential-debian-port-with-pie-asan-ubsan-and-more/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812782
https://github.com/Mohit7/Debian-ASan

The last project contains a list of several hundred packages that fail to build or run with ASan. Unlike UBSan findings, which may or may not lead to a bug in an executable now or in the future, ASan findings point at problems that are quite real right now.
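
For illustration, what ASan flags is typically a plain memory error rather than a standards subtlety. A minimal sketch:

#include <stdlib.h>

/* An off-by-one heap write.  Built with `gcc -g -fsanitize=address',
 * the program aborts at run time with a heap-buffer-overflow report
 * pointing at the offending line. */
int main(void)
{
    char *buf = malloc(16);
    if (!buf)
        return 1;
    buf[16] = 0;              /* writes one byte past the allocation */
    free(buf);
    return 0;
}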

After the problems found with ASan and UBSan are dealt with, other tools could be used to find further problems:

- STACK (mentioned above);

- tis-interpreter -- https://github.com/TrustInSoft/tis-interpreter -- a recently released "interpreter for finding subtle bugs in programs written in standard C";

- libcrunch -- https://github.com/stephenrkell/libcrunch -- a tool "for fast dynamic type checking".

The tools are there; is there the will to fix things?

Perhaps some mixed approach is possible. E.g., disable some optimizations by default and re-enable them when tests with ASan etc. pass. Or vice versa -- disable some optimizations when tests fail to pass with ASan enabled.

--
Alexander Cherepanov

