OT: Type safety (was: Language War (Re: "C" Manual))



Lo, on Monday, December 31, Erik Steffl did write:

> Richard Cobbe wrote:
> > 
> > Lo, on Sunday, December 30, Dimitri Maziuk did write:
> > 
> > > * William T Wilson (fluffy@snurgle.org) spake thusly:
> > > ...
> > > > So... why *should* the programmer concern himself with individual
> > > > bytes of memory? (Assuming he is writing an ordinary application and
> > > > not a hardware driver or something similar).
> > >
> > > Because if he does not, his application will segfault and dump core.
> > 
> > No.  This level of concern is necessary only for non-type-safe
> > languages.  It is provably impossible for a program written in a
> > type-safe language to segfault (assuming that the language's run-time
> > system is implemented correctly, of course).
> 
>    ???
> 
> it's the resource allocation that's important, not types. garbage
> collectors are generally more robust as far as segfaulting (and
> similar errors) go.

No, type-safety is important.  Type-safety makes several guarantees, but
the most important for our purposes is the following:

    If an expression E has (static) type T, then the result of
    evaluating E is *always* one of two things:

        * the evaluation throws an exception which escapes beyond the
          boundaries of E (or otherwise signals an error condition)

        * E's value is a valid object of type T and not some random
          collection of bits.

In particular, this means that if p is of type pointer to T, then
dereferencing p will *always* either signal an error (as with a null
reference) or give you a valid T.

Languages generally do this by restricting the set of pointer values
(and such restricted pointers are usually called `references' [1]).  For
instance, in LISP and Scheme, there is exactly one way [2] to create a
reference to a list: call the function CONS, and the return value is
your reference.  The language guarantees that, if CONS returns (as
opposed to throwing an exception), its result is a reference to a valid
list.

This is in sharp distinction to C and C++.  The C++/STL equivalent to
CONS is, more or less, list<T>'s constructor, and that's certainly a way
to get a reference:

    list<int> *l = new list<int>;   // safe: *l is a valid list<int>

Unfortunately, it's not the only way to get such a reference:

    int i;      // initialize i somehow---or, better yet, don't!

    l = reinterpret_cast<list<int> *>(i);

Is *l a valid list<int>?  You can tell only if you have very detailed
knowledge of where things are laid out in memory.  Chances are, though,
that l is bogus and dereferencing it will segfault.
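
To make the point concrete, here is a minimal sketch (assuming C++11 or
later).  The names forge_list_pointer and demo_forged_pointer are
hypothetical helpers invented for this illustration, and the address
0x10 is an arbitrary value.  It only shows that the compiler gives the
forged pointer exactly the same static type as the real one; it
deliberately never dereferences it.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <type_traits>

// Hypothetical helper (not from any library): turns arbitrary bits into
// something the compiler will treat as a pointer to list<int>.
inline std::list<int> *forge_list_pointer(std::uintptr_t bits) {
    return reinterpret_cast<std::list<int> *>(bits);
}

// The static type of the forged pointer is indistinguishable from that
// of a pointer obtained with new or with the address-of operator.
static_assert(std::is_same<decltype(forge_list_pointer(0)),
                           std::list<int> *>::value,
              "the compiler accepts the forged pointer as a list<int> *");

// Small demonstration; call it from main().
inline void demo_forged_pointer() {
    std::list<int> real{1, 2, 3};
    std::list<int> *good = &real;                       // points at a real list
    std::list<int> *forged = forge_list_pointer(0x10);  // arbitrary bits

    assert(good->size() == 3);  // fine: *good is a valid list<int>
    assert(forged != good);     // same static type, different (bogus) address
    // forged->size() would compile just as happily, but its behaviour is
    // undefined, so this sketch never evaluates it.
}
```

Nothing in the type system distinguishes good from forged; only the
programmer's knowledge of memory layout does.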

> (of course, just because the program doesn't segfault it doesn't mean
> it's working correctly).

Quite true.  The point I was making, though, is that a type-safe
language does not require you to keep track of the bit-and-byte layout
in order to ensure program correctness.

> the other important factor is how much runtime check language does
> (e.g. checking for array boundaries etc.)

Yes.  This is part of type safety.  Consider the following, where T is
some class type with a non-trivial default constructor:

    T a[42];

    f(a[56]);

This compiles, but it's not correct.  Since a is declared as an array of
42 elements of type T, the expression a[56] has static type T.  However,
the result of this expression is *not* an object created by one of T's
constructors.  If you do this in C or C++, the behavior is (conveniently)
undefined.  In a type-safe language like Java, the run-time system
instead throws a well-defined exception.

Now, the typical Java program doesn't catch
ArrayIndexOutOfBoundsException, so evaluating this expression would
cause the program to terminate with an uncaught exception.  `So what?'
you ask.  `If both the C++ and the Java programs crash, what difference
does it make?'

Big difference---the C++ program may not crash.  It may, instead,
misinterpret the random collection of bits 56*sizeof(T) bytes past the
start of a as a T and carry on with the computation.  This will very
likely produce incorrect output---with absolutely NO indication that the
output is incorrect.  I'd rather have the program tell me that it's
broken, thanks.
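
For comparison, C++ does have an opt-in checked access for standard
containers: std::vector::at throws std::out_of_range on a bad index.
The sketch below swaps the built-in array from the example above for a
std::vector<int>, which is my substitution and not part of the original
example.  checked_read and demo_checked_read are hypothetical helpers
made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads a[i] through the bounds-checked at() and
// reports what happened as a string.
inline std::string checked_read(const std::vector<int> &a, std::size_t i) {
    try {
        return "value " + std::to_string(a.at(i));  // at() checks the index
    } catch (const std::out_of_range &) {
        return "out of range";                      // the error is signalled
    }
}

// Small demonstration; call it from main().
inline void demo_checked_read() {
    std::vector<int> a(42, 7);  // 42 elements, each equal to 7
    assert(checked_read(a, 41) == "value 7");       // last valid index
    assert(checked_read(a, 56) == "out of range");  // the index used above
    // a[56] would compile too, but operator[] performs no check and the
    // behaviour is undefined, so this sketch never evaluates it.
}
```

The checked form is what a type-safe language gives you for every array
access; in C++ you have to ask for it each time.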

> and as far as runtime system goes - only interpreted languages have
> runtime systems.

Not true.  There exist compilers for Lisp and Scheme that produce native
machine code, but the generated program requires a great deal of
run-time support, like the GC.  (This may simply be included directly in
the executables produced, or it may be distributed as a shared library.)

Likewise, C++ has a run-time system to handle things like RTTI and
exception handling.  Even C has a very small run-time system to handle a
few simple things: program startup and termination, and all the stack
manipulations that take place during function calls.

(I include in the runtime system all that code which must execute at run
time to implement language features but that isn't part of the
language's standard library.)
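
As a rough illustration of the C++ case, the sketch below relies on both
RTTI and exception handling.  Neither dynamic_cast nor throw/catch can
be resolved entirely at compile time here, so the compiled program needs
run-time support for them.  The class names Shape, Circle, and Square
and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

struct Shape { virtual ~Shape() {} };  // polymorphic base, so RTTI applies
struct Circle : Shape {};
struct Square : Shape {};

// Uses RTTI: the dynamic type of s is only known when the program runs.
inline bool is_circle(const Shape &s) {
    return dynamic_cast<const Circle *>(&s) != nullptr;
}

// Uses RTTI and exception handling: a failed reference cast throws
// std::bad_cast, and unwinding to the handler is done by run-time code.
inline bool cast_throws_bad_cast(Shape &s) {
    try {
        (void)dynamic_cast<Circle &>(s);
        return false;
    } catch (const std::bad_cast &) {
        return true;
    }
}

// Small demonstration; call it from main().
inline void demo_rtti() {
    Circle c;
    Square q;
    assert(is_circle(c));
    assert(!is_circle(q));
    assert(cast_throws_bad_cast(q));
    assert(!cast_throws_bad_cast(c));
}
```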

=====

1  There's an unfortunate collision of terms here; C++'s `references'
   are generally more restrictive than references in, say, Scheme or
   Java, and they don't provide type-safety.  Don't let the names
   confuse you.

2  I'm ignoring other functions like LIST, because they're easily
   understood in terms of CONS.

Richard


