
Re: Heartbleed (was ... Re: My fellow (Debian) Linux users ...)



(Reader beware. Length breeds length.)

On Thu, Apr 17, 2014 at 10:57 PM, somebody wrote:
On 4/17/2014 5:40 AM, Curt wrote:
On 2014-04-17, ken <gebser@mousecar.com> wrote:

Steve brings up a very good point, one often overlooked in our zeal for
getting so much FOSS for absolutely no cost.  Since we're all given the
source code, we're all in part responsible for it and for improving it.

I don't think the point is very good for the reasons outlined below (by
others).

I wonder how you could say that. 

I include myself when I say this, but we are freeloading. If we want our infrastructure to work, we have to start contributing more to the critical parts.

As for the degree to which the financial world relies on freebies from freedom lovers: if the guys trying to make money on frictionless market exchanges had to write their own software, maybe they'd find it a little easier to face reality about "frictionless".
 
Having said that, it seems to me that the following just reinforces the argument that we all need to take more part in this stuff.

http://www.datamation.com/open-source/does-heartbleed-disprove-open-source-is-safer-1.html

  Robin Seggelmann, the OpenSSL developer who claims responsibility for
  Heartbleed, says that both he and a reviewer missed the bug. He concludes that
  more reviewers are needed to avoid a repetition of the incident -- that there
  were not enough eyes in this case.

More eyes means us. We may not be able to read code, but we can sure file bug reports, and we can run more of our own services on our own servers where we can work out the setups ourselves and watch the results and file bug reports.

And we can learn to code when the developers are too swamped with bug reports to handle them all. 
 
  Another conclusion that might be drawn from Seggelmann's account is that
  depending on developers to review their own work is not a good idea.

Which is saying that more of us need to get our eyes on the code.
 
  Unless considerable time passes between the writing of the code and the
  review, the developers are probably too close to the code to be likely
  to observe the flaws in it.

  However, the weakness of Seggelmann's perspective is that the argument is
  circular: if Heartbleed was undiscovered, then there must not have been enough
  eyes on the code. The proof is in the discovery or the failure to discover,
  which is not exactly a useful argument.

Not a useful argument? So?

Discovering a circularity in an argument in no way disproves the hypothesis. It only shows that more work is necessary to understand the problem. And the cycle itself is often useful when looking for a way to establish a real foundation for the argument.
 
  A more useful analysis has been offered by Theo de Raadt, the founder of
  OpenBSD and OpenSSH...

http://article.gmane.org/gmane.os.openbsd.misc/211963

(I'll quote most of de Raadt's usenet article--hope nobody minds).

In this case it appears that some context is necessary, as well.
 
  So years ago we added exploit mitigations counter measures to libc
  malloc and mmap, so that a variety of bugs can be exposed.

Theo is talking about OpenBSD's libraries, and about libraries from other OSes that have taken similar approaches.
 
One thing I recall OpenBSD doing along these lines was putting free()ed memory into a pool that is sanitized (overwritten with zeros) before being returned to the allocatable pool.

(I think some people have tried overwriting with random data, but that's for preventing memory burn-in, and it takes enough extra time to be unwarranted for most allocation uses. It only matters when you have reason to worry about the spooks grabbing your hardware and taking it down to the physics lab.)

Another was randomizing the allocation addresses returned, along with finer granularity in the allocations.
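
The sanitizing part looks something like this (a minimal sketch with invented names, not OpenBSD's actual allocator):

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical zeroing wrapper; names invented for illustration.
       Remember each allocation's size so free can sanitize the memory
       before it goes back to the pool. */
    struct hdr { size_t len; };

    void *xmalloc(size_t len)
    {
        struct hdr *h = malloc(sizeof *h + len);
        if (h == NULL)
            return NULL;
        h->len = len;
        return h + 1;
    }

    void xfree(void *p)
    {
        if (p == NULL)
            return;
        struct hdr *h = (struct hdr *)p - 1;
        /* real code would use something like explicit_bzero() here,
           so the compiler can't optimize the store away */
        memset(p, 0, h->len);
        free(h);
    }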

 Such
  memory accesses will cause an immediate crash, or even a core dump,
  then the bug can be analyzed, and fixed forever.

I think Theo has learned not to dig in too deeply when talking with the press, which leaves more technically inclined types to either wonder what he's talking about or dig in and find out. (In my case, I was lurking on the misc list at OpenBSD during part of the time he is referring to. My comments come from memory; if you are interested, you should probably dig into their archives.)

"Fixed forever" is a bit of an exaggeration, of course. But he means that the collected set of strategies that openBSD uses will often mean that array accesses out of bounds will hit memory that is not flagged as accessible when a program is running on openBSD. And openBSD is pretty aggressive about having the MMU through exceptions on such accesses.

There are granularity issues, depending on page sizes, but OpenBSD is pretty aggressive there, too.

Which means that, if OpenSSL were using the system malloc() when compiled on OpenBSD, it would have thrown exceptions and left more than a few core dumps, and the OpenBSD developers would have been all over that pretty quickly after the change entered the OpenSSL repositories.
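
The principle, as a toy sketch (just the guard-page idea, not OpenBSD's actual malloc):

    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Place a 16-byte buffer at the very end of a mapped page and
       leave the following page inaccessible, so the first read past
       the end faults immediately instead of leaking data. */
    int main(void)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        char *base = mmap(NULL, 2 * pagesz, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANON, -1, 0);
        if (base == MAP_FAILED)
            return 1;
        mprotect(base + pagesz, pagesz, PROT_NONE);   /* guard page */

        char *buf = base + pagesz - 16;
        memset(buf, 'A', 16);   /* in bounds: fine */
        char c = buf[16];       /* one byte past the end: SIGSEGV */
        (void)c;
        return 0;
    }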

One might wonder whether they would have fed the fix back upstream, given that there was already a bit of acrimony between them and the OpenSSL developers at the time, but I'm pretty sure they'd have sent a heads-up over. They don't like keeping complicated patch sets around any more than the rest of us do.
 
  Some other debugging toolkits get them too.  To a large extent these
  come with almost no performance cost.

"Almost" is a relative term, but because of the work done by the openBSD team and others with similar goals, some of these strategies have been picked up in the more mainstream OSses. (Including debian.) Which is among the reasons the grosser bounds-check bugs are getting fixed.
 
  But around that time OpenSSL adds a wrapper around malloc & free so
  that the library will cache memory on its own, and not free it to the
  protective malloc.
 
  You can find the comment in their sources ...

  #ifndef OPENSSL_NO_BUF_FREELISTS
  /* On some platforms, malloc() performance is bad enough that you can't just

Theo is not perfectly socially acceptable at times, so you need to read between the lines a bit in the rant that follows. (And I wonder how much was lost between when the reporter was listening and when the report was published. We all know how editing sometimes makes a mess of technical material.)
 
  OH, because SOME platforms have slow performance, it means even if you
  build protective technology into malloc() and free(), it will be
  ineffective.  On ALL PLATFORMS, because that option is the default,
  and Ted's tests show you can't turn it off because they haven't tested
  without it in ages.

Ted is one of the developers over at OpenBSD. What Theo is saying here is that the OpenSSL team let the conditional compile directives get so stale that, even though the OpenBSD team would prefer to compile OpenSSL against the OS malloc(), which is fast enough to satisfy their objectives, they can't.

(I know about stale preprocessor directives. I have more than a few of those in code that I've published.)
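
For anyone who hasn't seen the pattern, the cache Theo is describing works roughly like this (my own simplified sketch of the freelist code guarded by OPENSSL_NO_BUF_FREELISTS, not the actual OpenSSL source):

    #include <stdlib.h>

    /* "Freed" buffers are pushed onto a list instead of going back to
       free(), so the system allocator's protections never see them,
       and their old contents survive into the next "allocation". */
    struct chunk { struct chunk *next; };
    static struct chunk *freelist;
    static const size_t chunklen = 16384;

    void *buf_get(void)
    {
        if (freelist != NULL) {
            struct chunk *c = freelist;
            freelist = c->next;
            return c;           /* stale contents returned as-is */
        }
        return malloc(chunklen);
    }

    void buf_put(void *p)
    {
        struct chunk *c = p;    /* never reaches free() or munmap() */
        c->next = freelist;
        freelist = c;
    }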
 
  So then a bug shows up which leaks the content of memory mishandled by
  that layer.  If the memory had been properly returned via free, it
  would likely have been handed to munmap, and triggered a daemon crash
  instead of leaking your keys.

  OpenSSL is not developed by a responsible team.

Theo is overstating things here, perhaps. Or he means that the OpenSSL team needs to recognize that they need more help, start getting people to help, and start reworking the code to make it more accessible to people not on their team.

My projects would be subject to the same criticism, except that the total number of downloads doesn't seem to warrant building teams for them. I think what Theo is criticizing here is that the OpenSSL project entered the mainstream more than fifteen years ago, and the developers really should have been recruiting help a long time ago. (Something he has been saying for more than a decade, about and to the OpenSSL project, if I understand correctly.)
 
(Sorry, a bit long here).

This is a totally irresponsible post, showing the op knows very little about programming.

"Totally irresponsible post" is also a bit of an exaggeration, and it's hard to see who was meant by "op" but surely it couldn't have been Theo.
 
It doesn't matter if malloc() wrappers were replaced or not.

Depending on what one means by "wrappers", and by "replaced", of course.
 
 The application gets memory from the OS in one or more pages (the exact number is dependent on several parameters).  malloc() then subdivides this allocation for application use.

Both the number and the size are important here, of course. As is what happens to the memory between calls to free() and calls to malloc(), as I mentioned above.
 
Now this is key - it really doesn't matter whether the memory has been subdivided or not - if the application has a pointer to the memory, the application can access the memory

It can definitely try, and if the MMU page allocation is not granular enough, it may actually hit real memory. Many OS runtime libraries have deliberately made allocations larger than necessary so that careless handling of pointers won't kill beginning C programmers' desire to use the language.

I was involved in a C class recently, where the teacher suggested that one didn't need to actually allocate an array for a string of about 6 bytes. The teacher knew his stuff, and within the context of what he was teaching he was almost right.

But only within that context. (I'm not sure whether current versions of MSVisualStudio, or whatever they call it now, still play that game, but I know a lot of embedded systems compilers deliberately leave some undedicated real RAM accessible at the bottom of the address space.) I feel bad about my reaction; I should have let it go, talked to him after the class, and let him correct it in a later session. I hope he doesn't hate me for it.
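
The pattern in question looks like this, for reference (a deliberately broken fragment, of course):

    #include <string.h>

    void bad(void)
    {
        char *s;                /* a pointer, but no storage allocated */
        strcpy(s, "hello");     /* undefined behavior: writes through an
                                   uninitialized pointer; may appear to
                                   work on forgiving platforms, crashes
                                   or corrupts memory on others */
    }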
 
(this has been a source of MANY bugs in C and C++).  This access is direct access to the memory, and does not call malloc().

But there is only physical memory there to be accessed if the allocation pages are large enough.
 
As an example: the application calls malloc() with a request for 4 bytes of memory.  malloc(), seeing there is currently no free space available for the application, sends a request to the OS for 256K of memory (so it has extra for the next request).  malloc() then returns a pointer to (very near) the start of that memory, containing room for 4 bytes of application data.

And OpenBSD, if the processor's MMU is good enough, will actually give much less than 256K. I should go check, I suppose, before I give actual numbers, but my memory is that OpenBSD will give an actual MMU page of 4K if the processor's MMU supports that.

Which is more than 4 bytes, and might indeed contain cryptographically important data if it hasn't been sanitized, but is much less likely to do so than 256K.

But that really isn't relevant, because the OpenBSD regression tests generally include tests for this kind of thing. And the use of preprocessor directives in OpenSSL prevented their tests from catching this. I don't know whether sanitizing reclaimed memory made it irrelevant on OpenBSD or not. I haven't had time to go check over the last several months, and I haven't really cared (for reasons I've stated elsewhere).

OpenBSD contains code to catch and mitigate this kind of error, and the OpenSSL team's small size and misplaced focus on speed prevented that mitigation code from working.
 
But the application now has a pointer and can directly access any memory in that 256K block.  And since no library code is involved, nothing will catch a problem.  Only if the program tries to access memory beyond the 256K block will there be a problem; the CPU will detect the application is trying to access an invalid address and notify the OS (which will typically attempt to terminate the application).

And, of course, 256K blocks are a lot more forgiving (in the wrong sense of forgiving) than even 32K blocks.
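
In other words, something like this (a sketch of the hazard, not anything to imitate):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *p = malloc(4);    /* ask for 4 bytes */
        if (p == NULL)
            return 1;
        /* Reading far past the 4 bytes we own will often "succeed",
           because the allocator carved them out of a much larger block
           it got from the OS. No fault, no warning, just someone
           else's data. */
        printf("%d\n", p[1000]);
        free(p);
        return 0;
    }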

So the whole premise on which his "not responsible team" claim rests is complete crap.  He is the irresponsible one here.

Well, Theo may be overly critical, but I don't think so. OpenSSL has been too much in the mainstream to be responsibly maintained by such a small team for so long. They don't need more full-time programmers, but they need to accept more help from people who can help them, and they need to refrain from focusing on speed for this kind of thing.

It would have been "nicer", and perhaps more conducive to cooperation, if Theo had chosen a softer way to express this, but "irresponsible" is not incorrect.

We who read this have to consider our own responsibility as well. If the OpenSSL team is irresponsible, what should we do? Waste time arguing about responsibility? Go over there and offer our help in numbers that might constitute a social DoS? Start competing projects and proclaim to the world how we're not going to make THAT mistake?

Competing projects started (relatively) quietly would be useful. 

Downloading the source and playing with it a bit, to see if we could actually help before we go over and offer, would also be useful. (Anyone who is interested in this, please refrain from offering until you can find your way around the code without asking too many questions.)

Trying to understand our own personal level of responsibility here is also useful. (Admittedly, arguing is sometimes useful when we are trying to do that, if we argue with an open mind.)

Other than that, learning to code and learning to help with localization and documentation work are highly recommended. Many other projects need our help, too.

As a side note - I've been programming for about 47 years now (including several years at IBM) and managed many projects, both small and large. One thing I've found - people aren't perfect.  There are ALWAYS bugs in the code.  Good eyes and good QC measures (including code test suites) will catch a lot of bugs.

Absolutely. 
 
 But it doesn't matter how many eyes you have on the code, or how many test suites you run the code through.

I think that you mean that it doesn't matter, in the sense that we must always assume there are some bugs?

Because, of course, it does matter in the sense of reducing the number of bugs, and the probability of serious bugs being used in real-world attacks.
 
 ANY non-trivial program is likely to have bugs (the last figure I heard was around 1 *serious* bug for every 1K LOC in released code).  Could this bug have been caught?  Definitely.  Should this bug have been caught? Maybe - it's fairly subtle, and probably only someone who has suffered from buffer overruns in the past would catch it.

It should not be considered subtle. It is easy to fail to think about, because in the social context slow code seems embarrassing, and large, persistent buffers are one way to speed code up.

But integer indexes and pointers are not so different that this kind of bug should be considered subtle. 
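
For reference, the shape of the bug is roughly this (invented names, not the actual OpenSSL source):

    #include <stdint.h>
    #include <string.h>

    /* The reply copies as many bytes as the request *claims* to hold,
       never checking against the number of bytes actually received. */
    void handle_heartbeat(const uint8_t *req, size_t reqlen,
                          uint8_t *reply)
    {
        uint16_t payload = (uint16_t)((req[0] << 8) | req[1]);

        /* BUG: missing the check  if ((size_t)payload + 2 > reqlen)
           so the memcpy reads past the request buffer into whatever
           is adjacent in memory, sometimes including private keys. */
        memcpy(reply, req + 2, payload);
    }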
 
It's unfortunate that this happened, and definitely not good.  But it does not indicate any developers were irresponsible.

Well, the OpenSSL team was not as irresponsible here as, say, Microsoft was for pushing MSIE and OutlookExpress into the MSWindows95 product package (to say nothing of ActiveX). Or as irresponsible as Intel has been for failing to build CPUs with proper memory management, putting the marketing value of speed (and the desire to own the market) much too far ahead of safety. And so on.

And those of us who have been unwilling to risk being fired for not buying whatever the assumed de facto standard happens to be, we have been irresponsible too.

I think Theo has the right to point fingers here. He and his team have been working hard to make the industry a safer place.

We, on the other hand, should probably "shut up and code", so to speak.

--
Joel Rees

Be careful where you see conspiracy.
Look first in your own heart.
