
Re: Packaging of static libraries



On 2016-04-13 15:29, Ian Jackson wrote:
> Adam Borowski writes ("Re: Packaging of static libraries"):
>> On Tue, Apr 12, 2016 at 02:52:33PM +0100, Ian Jackson wrote:
>>> I'm afraid that LTO is probably too dangerous to be used as a
>>> substitute for static linking.  See my comments in the recent LTO
>>> thread here, where I referred to the problem of undefined behaviour,
>>> and pointed at John Regehr's blog.
>>
>> LTO is no different from just concatenating all source files and making
>> functions static.  If your code blows up after this, it is your fault, not
>> LTO's.  LTO just allows interprocedural optimizations to work between
>> functions that were originally in different source files.

> This narrative of `fault' has two very serious problems.
>
> Firstly, it is hopelessly impractical.  As I have already observed
> here:
>
>     Recently we have seen spectacular advances in compiler optimisation.
>     Spectacular in that large swathes of existing previously-working code
>     have been discovered, by diligent compilers, to be contrary to the
>     published C standard, and `optimised' into non-working machine code.
>
>     In fact, it turns out that there is practically no existing C code
>     which is correct according to said standards (including C compilers
>     themselves).

There is practically no existing code in any language which is correct, even if you exclude problems with standards. I'm not sure we can draw many useful conclusions from such general statements.

To get something more specific, the authors of [1] claim that their tool STACK detected UB in 40% of the wheezy packages with C/C++ code (see also [2]).

[1] https://pdos.csail.mit.edu/papers/stack:sosp13.pdf
[2] https://css.csail.mit.edu/stack/
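
To make [1] concrete: the typical pattern STACK flags as "unstable code" is a sanity check that can only ever fire via UB, so the optimizer is entitled to drop it. A hypothetical sketch (not taken from any particular package):

#include <stddef.h>

/* Pointer arithmetic that overflows is UB, so the compiler may assume
 * that `buf + len' cannot wrap around and delete the first check. */
int access_ok(const char *buf, size_t len, const char *buf_end)
{
    if (buf + len < buf)          /* relies on UB; may be optimized away */
        return 0;
    return buf + len <= buf_end;
}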

> Real existing code does not conform to the rules now being enforced by
> compilers.  Indeed often it can be very hard to write new code which
> does conform to the rules, even if you know what the rules are and
> take great care.

I have the impression that many complaints about problems with UB stem from attempts to write tricky code. Sometimes tricky (or outright non-conforming) code is required, e.g. to work around the limits of a legacy API. But in many cases it's just clever code trying to get a bit more speed or to save a bit of memory -- clever enough to get into the area where some advanced rules apply, but not clever enough to obey those rules.

Arguing for safety over speed is somewhat strange then. Why write the tricky code in the first place?
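
A typical specimen of such cleverness (my own illustration, not code from any referenced project) is peeking at a float's bits through a pointer cast to save a copy; the conforming memcpy version typically compiles to the same code anyway:

#include <stdint.h>
#include <string.h>

/* Clever but non-conforming: accesses a float through a uint32_t
 * lvalue, which violates the aliasing rules (C11 6.5p7). */
uint32_t float_bits_tricky(float f)
{
    return *(uint32_t *)&f;
}

/* Boring but conforming: memcpy expresses the same reinterpretation,
 * and modern compilers usually lower it to a single register move. */
uint32_t float_bits_plain(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}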

> Two examples showing how C has been turned into a puzzle language:
>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_event.c;h=02b39e6da8c65c033c99a22db4784de8d7aeeb7a;hb=HEAD#l458
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_internal.h;h=005fe538c6b5529447185797cc23d898c219e897;hb=HEAD#l294

Why not separate the free list from active watch_slots? Why not have an array of flags indicating which slot is which?

If those approaches are deemed unattractive, explicitly stating an assumption of flat memory by casting to uintptr_t before the comparison doesn't seem very laborious.
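
Something along these lines (a hypothetical sketch, not the actual libxl code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Relational comparison of pointers into different objects is UB
 * (C11 6.5.8p5).  Casting to uintptr_t spells out the flat-memory
 * assumption and keeps the comparison itself well-defined. */
bool points_into(const void *p, const void *base, size_t nbytes)
{
    uintptr_t up = (uintptr_t)p;
    uintptr_t ub = (uintptr_t)base;
    return up >= ub && up < ub + nbytes;
}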

> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03340.html
> http://lists.xenproject.org/archives/html/xen-devel/2015-11/threads.html#00112

Yeah, there are a bunch of misconceptions there.

1. Type-punning via unions is a time-honored tradition described in all versions of the C standard (see the sketch after this list). The referenced email even links to DR 283, so it's not clear to me where the confusion comes from.

2. The compiler is not free to assume that padding will not be read: it could be read as chars (even if you ignore type-punning). You mentioned this yourself in other emails. Not that it gives you much.

3. When writing to or reading from dst->p0 you have to consider not only the type of p0 but also the type of dst. This is a very practical concern. For example, see https://twitter.com/johnregehr/status/706868554222723073 .

4. uint8_t is not guaranteed to be one of the character types and, hence, is not free to alias everything. See, e.g., https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 . Not an immediate concern but something to keep in mind if you strive for strict standard conformance.
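
To make item 1 concrete, a minimal sketch (my own example, not code from the referenced thread):

#include <stdint.h>
#include <stdio.h>

/* Type punning through a union: reading a member other than the one
 * last stored reinterprets the stored bytes (C11 6.5.2.3; cf. DR 283). */
union pun {
    float    f;
    uint32_t u;
};

int main(void)
{
    union pun p;
    p.f = 1.0f;
    printf("0x%08x\n", (unsigned)p.u);   /* 0x3f800000 on IEEE-754 targets */
    return 0;
}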

I'm not familiar with Xen but why overlay data for 32- and 64-bit cases instead of having different structs for them? Why use macros instead of functions?

> The second problem is that it is based on the idea that the C
> specification is by definition right and proper.

Whether the C standard is right and proper or not, it's the only (somewhat) widely accepted middle ground for now.

> There are two ways to evaluate the C specification's rightness and
> properness.
>
> The first is to ask what the nominal remit of the C standards
> bodies is.  Well, it is and was to standardise existing practice.
> Existing practice was to use C as a kind of portable assembler; the
> programmer was traditionally entitled to do the kind of things which
> are nowadays forbidden.  So the C committee has failed at its
> task. [1]

The task of the committee was to balance several principles. Why many (especially in the Free Software world) consider being a high-level assembler a much more important principle than the others is not clear to me.

> The second is to ask what is most useful.  And there again the C
> committee have clearly failed.

Apparently others disagree.

> We in Debian are in a good position to defend our users from the
> fallout from this problem.  We could change our default compiler
> options to favour safety, and provide more traditional semantics.

Debian (and other distros) have somewhat unusual stakes in the UB debate because of their porting needs. A lone developer can choose to support only one platform and is then free to complain that C doesn't provide the full freedom of assembler for this platform. But Debian often takes such programs and builds them for many other architectures.

As an example, consider shifts by a value greater than or equal to the width of the left operand. They are UB in C and work differently on different CPUs. Would it benefit Debian to declare them implementation-defined in C? Probably not. Another example is unaligned accesses.
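
A minimal sketch of the shift case (assuming a 32-bit unsigned int):

#include <stdio.h>

/* Shifting by a count >= the width of the left operand is UB
 * (C11 6.5.7p3).  If the compiler just emits the native shift,
 * 32-bit x86 masks the count mod 32 (giving 1 here) while ARM uses
 * the low byte of the count register (giving 0) -- and the optimizer
 * is additionally allowed to assume this never happens at all. */
unsigned shift_left(unsigned x, unsigned n)
{
    return x << n;
}

int main(void)
{
    printf("%u\n", shift_left(1u, 32));   /* 1? 0? anything: UB */
    return 0;
}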

It looks like Debian (and the Free Software community in general) should strongly favor the portability of standard C over its ability to serve as a high-level assembler.

> We would have influence upstream (for example to further advance the
> set of available safety options) if we cared to use it.  But sadly it
> seems that the notion that our most basic and widely-used programming
> language should be one that's fit for programming in is not yet fully
> accepted.
>
> At the very least we should fiercely resist any further broadening of
> the scope of the C UB problem.

Then the first thing to do is to stop upgrading gcc. That doesn't seem like a very practical approach.

The next thing is to add options like -fwrapv (or -fno-strict-overflow), -fno-delete-null-pointer-checks and -fno-strict-aliasing, but is there a chance of consensus about it? Doubtful, but who knows...
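
For reference, -fno-delete-null-pointer-checks exists because of code like this (a hypothetical sketch, not from any particular package):

#include <stddef.h>

struct dev { int id; };

/* The dereference lets the optimizer infer that d cannot be NULL, so
 * the later check becomes dead code and may be removed.  The option
 * tells GCC not to draw that inference. */
int dev_id(struct dev *d)
{
    int id = d->id;           /* UB if d is NULL */
    if (d == NULL)            /* may be deleted by the optimizer */
        return -1;
    return id;
}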

Perhaps less controversial is fixing UB (and other bugs) in the existing code. Several years ago this was hopeless, but recently some tools have emerged that make it possible to tackle the problem. First of all, the sanitizers -- ASan, UBSan, MSan, TSan, ... While running everything in valgrind is not very convenient, building everything with ASan seems quite feasible. Recent activity related to Debian:

http://balintreczey.hu/blog/progress-report-on-hardened1-linux-amd64-a-potential-debian-port-with-pie-asan-ubsan-and-more/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812782
https://github.com/Mohit7/Debian-ASan

The last project contains a list of several hundred packages that fail to build or run with ASan. Unlike UBSan findings, which may or may not lead to a bug in an executable now or in the future, ASan findings point at problems that are quite real right now.
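
For illustration, what ASan flags is typically a plain memory error rather than a standards subtlety. A minimal sketch:

#include <stdlib.h>

/* An off-by-one heap write.  Built with `gcc -g -fsanitize=address',
 * the program aborts at run time with a heap-buffer-overflow report
 * pointing at the offending line. */
int main(void)
{
    char *buf = malloc(16);
    if (!buf)
        return 1;
    buf[16] = 0;              /* writes one byte past the allocation */
    free(buf);
    return 0;
}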

After the problems found with ASan and UBSan are dealt with, other tools could be used to find further problems:

- STACK (mentioned above);

- tis-interpreter -- https://github.com/TrustInSoft/tis-interpreter -- a recently released "interpreter for finding subtle bugs in programs written in standard C";

- libcrunch -- https://github.com/stephenrkell/libcrunch -- a tool "for fast dynamic type checking".

The tools are there; is there the will to fix things?

Perhaps some mixed approach is possible. E.g., disable some optimizations by default and re-enable them when tests with ASan etc. pass. Or vice versa -- disable some optimizations when tests fail to pass with ASan enabled.

--
Alexander Cherepanov

