Re: ah, here it is
[Transcriber's Notes:
This is a transcript of a technetcast real audio broadcast made
by Thomas Bushnell on 2000-08-07. You can get it from
http://www.technetcast.com/tnc_play_stream.html?stream_id=381
Philip Lourier speaks without indentation;
Thomas Bushnell's speech is indented.
Some non-speech is annotated with [].
I also wish Thomas Bushnell good luck in California.
Thanks,
Paul Emsley.]
Welcome to CodeBytes on Dr. Dobb's TechNetCast, I'm Philip Lourier. [snip]
Today's program is about the GNU Hurd: not a large group
of foul-smelling, thick-skinned animals, but the GNU project's Free
Software replacement for the Unix kernel. The project was
started over 10 years ago and was one of the last pieces missing
in building an entirely GPL-based Unix system.
From the start, the Hurd was designed as a set of servers to work on
top of a micro-kernel, in this case, Carnegie Mellon's Mach
micro-kernel.
We will talk today about this overall design and overall
architecture of Hurd and we'll also get an update on the status
of the project.
And since it has started, of course, we've seen Linux and 386BSD
come on the scene.
To talk about all this we are joined by Thomas Bushnell, the
architect and designer of the Hurd. Thomas was with the Free
Software Foundation for many years; he's currently at MIT,
actually for a few more weeks, I believe, but he's still the main
man behind the project - Thomas welcome to the program.
Hello.
You're at MIT right now? How long were you at the Free Software
Foundation for?
About 8 years.
You actually started the project, is that correct?
I started the Hurd project.
The idea was to build a totally Free Software Unix kernel from the
start?
That's right.
What were some of the technical objectives that you set for the
project at its inception?
Once we had decided to use Mach, that made it possible to do
several things that couldn't be done on traditional sorts of
Unix kernels. Some of the most important were to let ordinary
users that don't have any privilege on a system create
file-system implementations and perhaps network implementations,
and things like that, in a way that wouldn't interfere with other
users.
Let's start even before that. Why Mach and why a micro-kernel?
We were looking for a free kernel at the time. There were
technical reasons why it would be good to use Mach and there
were reasons at the time for choosing it that don't
necessarily match the reasons we would give today.
At the time it was one of the only free kernels available of any
kind.
It wasn't GPLed though, was it?
No, it was distributed (it still is) under a license very much
like the BSD license and the X Consortium License.
And that license was good enough for you?
That's right. It's definitely a Free Software license.
How about the technical aspects of Mach? What made it well adapted to
what you were trying to do?
I liked it because it did have a full set of features as a
micro kernel. It had a sophisticated message passing system and
a sophisticated virtual memory system. It didn't have any major
gaps in that area, and other micro-kernel projects tended to be
simple one-off concepts and not complete micro-kernels in the
sense that I [needed].
Was there any talk at the time of using a monolithic kernel?
There was. Our thought at the time was to write a replacement
for the BSD kernel but, ironically, it seemed at the time that
it would be faster to go with the existing Mach kernel and write
the Hurd on top of it.
Can you very quickly give us an overview of what is a micro kernel as
opposed to a monolithic kernel?
A monolithic kernel is what people are familiar with in a
Linux kernel or a BSD kernel. It's one big program that's
loaded at boot time and it does everything from manage processes
to manage file systems and disk allocations, to network
protocols.
A micro-kernel instead sees its job as providing the means by
which processes communicate with each other, but not providing
the kinds of abstractions like file systems and network
protocols, or even, necessarily, virtual memory policies.
Now, what's interesting is at the time, 10 years ago, micro-kernels
were really the rage and actually there was a famous conversation
between Linus Torvalds and the creator of Minix, Andy Tanenbaum, about
the advantages of each and I think that there was a thread on
comp.os.minix entitled "Monolithic kernels are obsolete" or "Linux
is obsolete". At the time it seemed that micro-kernels would really
take over. Is that correct?
I remember that thread and it's very amusing in retrospect.
Minix was never a serious micro-kernel in the sense that I use
the word. It was designed only to run the Minix servers, it
wasn't general. It was as if you had a Unix kernel that could
only run one kind of shell. Part of the flexibility of Unix is
that you can have many different command interpreters and you
can choose the one you like. And in a real micro-kernel like
Mach (or some of the others still being developed) you can do
that. Minix never had that kind of functionality.
From the user's perspective, Minix was a monolithic kernel, it's
just written a little differently.
What is the status of Mach right now? That project has been over for
several years, right?
Rick Rashid at CMU joined the Dark Side and has moved to the
Pacific Northwest.
He's at Microsoft Research.
That's Right.
The Mach Project at CMU closed down. For many years it was
carried on by the University of Utah and their research has
moved in different directions. We still do maintain a GNU Mach
version. We are now the principal maintainers of Mach. There
are 2 versions of Mach on which you can run the Hurd: GNU Mach
and another one that Roland McGrath has put together that uses
the device drivers from the Utah OSKit.
Do you have to go back to Mach and make updates to support new
features in the Hurd?
Not very often. Most of the updates that we make to Mach are to
support newer device drivers.
What kind of services does the micro-kernel provide to your layer?
It provides the notion of a task and multiple threads within a
task, which together work like a Unix process. It provides a
very simple hardware abstraction, the ability to read or write
to sectors of a disk or to send and receive Internet packets, or
ethernet packets, that kind of thing.
How shall we call these different parts? Mach is the micro-kernel.
The Hurd is what?
The Hurd could be called a multi-server. That's the terminology
that the Mach people used to use. They distinguished a single
server, which would be something like taking a Unix kernel and
turning it into something that could run on top of Mach and a
multi-server which would be something like the Hurd, where the
functions of a Unix kernel have been split up into many different
processes.
And how does the Hurd expose its functionality in turn, to the upper
layers, the ones that sit on top of it?
It exposes a message-passing interface. You can send messages,
using Mach system calls - they are a kind of interprocess
communication - to the file systems. And then the GNU C Library
has Hurd features that allow a programmer to use ordinary POSIX
calls. And the GNU C Library would implement those: instead of
making a special system call (the way it does in Linux), it
would send a message to a Hurd server.
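The mechanism just described can be pictured with a small toy model (illustrative Python, not actual Hurd or glibc code; the "dir_lookup"/"io_read" message names merely echo the style of the real Hurd RPCs, and everything else is invented for the example):

```python
# Toy model: a POSIX-looking call is implemented by sending a message
# to a user-space server, not by trapping into the kernel.
# Purely illustrative; not Hurd/glibc code.
import queue
import threading

class ToyFileServer:
    """Stands in for a Hurd file-system server listening on a port."""
    def __init__(self):
        self.port = queue.Queue()   # stands in for a Mach message port
        self.files = {"/etc/motd": b"Welcome to the toy Hurd\n"}
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            op, path, reply = self.port.get()   # receive one RPC message
            if op == "lookup":
                reply.put(path in self.files)
            elif op == "read":
                reply.put(self.files.get(path, b""))

def toy_open_and_read(server, path):
    """Stands in for glibc: an ordinary-looking call, really two messages."""
    reply = queue.Queue()
    server.port.put(("lookup", path, reply))    # dir_lookup-style message
    if not reply.get():
        raise FileNotFoundError(path)
    server.port.put(("read", path, reply))      # io_read-style message
    return reply.get()

server = ToyFileServer()
print(toy_open_and_read(server, "/etc/motd"))   # → b'Welcome to the toy Hurd\n'
```

The point is only structural: the caller sees a normal function call, while everything behind it is message passing to an ordinary process.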
So conceivably a lot of applications could be easily ported?
Yes. It's a POSIX compatible system.
What kind of features does this kind of architecture provide at the
application level that makes it really desirable?
Users can, for example, write file-system
implementations. They can write programs that implement
directory structures, that another program could then cd into
and ls and do whatever - without needing to know that this is
something special. An example of this is a transparent FTP
server. The ftp program is not the normal ftp program - it's a
special Hurd one that makes it look like a file-system. But you
run it as yourself. You don't need to be root or anything like
that to run it or to mount it. Then you can "cd" into this
directory and you are literally at the remote ftp archive and
all your normal file commands work - you don't need a special
ftp client and a user doesn't need to learn a new language to
know how to use FTP.
So it seamlessly provides network services?
You can turn network services into file system accesses - and users
know how to use file systems. They know how to "ls", they know
how to read files and that kind of thing.
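The transparent FTP idea can be sketched with another toy model (illustrative Python; "settrans" is the name of the real Hurd command for attaching a translator to a path, but all the classes and data here are invented for the example):

```python
# Toy sketch (not actual Hurd code) of a per-user "translator": an
# ordinary unprivileged object attached at a path, which synthesizes
# directory entries on demand, the way a transparent ftpfs would.
class ToyFtpTranslator:
    """Pretends to be a remote FTP site; listings are synthesized."""
    def __init__(self, fake_site):
        self.fake_site = fake_site  # {path: names} stands in for the network

    def readdir(self, path):
        return sorted(self.fake_site.get(path, []))

class ToyNamespace:
    """A file namespace where translators can be attached by any user."""
    def __init__(self):
        self.translators = {}

    def settrans(self, mount_point, translator):
        # In the Hurd, attaching a translator needs no root privilege.
        self.translators[mount_point] = translator

    def ls(self, path):
        for mount, t in self.translators.items():
            if path.startswith(mount):
                return t.readdir(path[len(mount):] or "/")
        return []

ns = ToyNamespace()
ns.settrans("/ftp/example.org", ToyFtpTranslator({"/": ["README", "pub"]}))
print(ns.ls("/ftp/example.org"))   # ordinary "ls" works on the synthetic tree
```

Once attached, the remote site is reached through the same directory operations as any local tree, which is exactly the property described above.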
Does it provide all the services that a regular Unix kernel would?
Yes.
So it goes above and beyond the services that Unix kernel would
provide?
In Linux, for example, there are many different file system
implementations. There's an EXT2 file system and there's a FAT
file system and a Mac HFS file system and so forth.
But if your kernel doesn't have built into it or loaded as a
module the file system that you want to access as a user, then
you have to be root to load the module or reboot to a new
kernel.
In the Hurd you could run an unprivileged program that would
provide you access to a different kind of file system type and it
would just work for you.
Could you swap those modules in and out?
Yes. They run as ordinary processes.
What are some of the other modules that can be swapped in and out?
In addition to file systems, there is the network. Right now the
network is one single server that all network processes use, so
it doesn't have the same kind of flexibility that we have for
file systems. But we have an architecture that we would like
to make work that would make that possible.
Other servers that exist in the Hurd that don't do file
systems themselves are, for example, the server that keeps track
of authentication, which is responsible for knowing what UID each
process has.
So all this stuff that system programmers are used to seeing built
into the kernel is exposed differently - it's modular in this kind of
architecture?
That's right.
Even for things that are still running as the superuser, like
the authentication server that knows what user-id every process
has, it's not part of the kernel, and a developer can use an
ordinary debugger on the process and doesn't need to worry about
the special considerations of kernel debugging.
So, there's an ease of development and maintainability that also
happens.
Part of the advantage of stuffing everything in a monolithic kernel is
performance because there is no message passing. Is performance an issue
with the Hurd's architecture?
Absolutely.
You mentioned how Andy Tanenbaum had predicted the demise of
monolithic kernels and that of course hasn't happened. One of
the reasons was the perceived sense that the micro-kernel
architectures in use were considerably slower. At the time, I
believe, part of the reason for that is that people had simply
taken a monolithic Unix kernel and split it up into 2 pieces
which were just as monolithic as before. One was like Mach, or
a micro-kernel and the other was a single server that ran on top
of it. So they weren't getting any additional features and all
they saw was that the system was a little bit slower and they
said that it was a waste of time.
The Hurd, by contrast, is one of the only real multi-servers that
has ever been run on top of Mach. CMU[?] had one but it didn't
have enough features. And we find that the performance, with
one exception, is not very bad. Modern hardware, in fact, makes
it completely unnoticeable.
What's the exception?
Currently, the time to do a fork is longer than it should be.
We have a pretty good idea why, and a change to Mach that will
make a fork call happen quicker.
How about the I/O Performance?
I/O performance to disks is not bad at all. Network I/O performance
is a little bad right now but that's because we are using the
Linux network stack which is itself a little inefficient,
especially in the way that we use it. Making that work well is
simply an unfinished task that we have.
Where does the TCP/IP stack figure in that model? I guess it's in
user-space, it's not a kernel feature.
That's right.
The TCP/IP stack in our system is a special server called the
pfinet server (PF is Protocol Family) and it's responsible for
managing all of the IP protocol sockets.
And that sits on top of what?
That's an ordinary user process that runs as root and has access
to the ethernet device.
And the ethernet device is what's running in the micro-kernel?
That's inside the micro-kernel at the moment. There have been
ideas about taking even the device drivers and putting them in
user-space in Mach. There has been some work on that at CMU with
disk, but we aren't using that right now.
What about the graphics subsystem? How does that work?
There is really nothing special about that. Unix kernels don't
do very much with graphics at all. They typically just give an
X server direct access to the video hardware. People that
have run X on the Hurd have done just that - there's not very
much interesting kernel work with graphics at the moment.
X is not available for the Hurd at the moment. Why is that?
There is no significant technical difficulty, and I am not exactly
up-to-date on the current status of it. People have run X on it
without serious problems. It's just not our top priority.
Like I said, there aren't really any interesting kernel issues,
it's just a matter of getting Mach to export access to the VGA
hardware to a process. That works fine and people have done it -
I just don't know if the current Debian GNU/Hurd packages
provide it automatically.
One of the advantages of such a system is that it is modular. But
documentation is important here, so that people can write
services; the interfaces and the protocols matter. What
is the state of the documentation? Are there standards? Known
protocols? Is it something that you had to develop?
The protocols for communication between the various processes
that make up the Hurd are protocols that we designed. They are
probably the most important part of the design - they structure
the whole system.
How do they work? I guess that there is no memory sharing, for example
(in order to create a protected system).
Mach provides a message-passing system directly. It is one of the
fundamental features that the kernel provides. So, it provides
a direct notion of an interprocess communication port and the
ability to send messages on it securely. So that's what we use.
We have interface definition files to specify what kinds of RPCs
are built using that system.
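The interface-definition approach can be sketched roughly like this (a Python toy in the spirit of MIG-generated stubs, not MIG's actual syntax or output; the RPC names only echo the Hurd's naming style):

```python
# Toy sketch of interface-definition files: RPCs are declared once,
# and client-side stubs that marshal a message are generated from the
# declaration. Not real MIG; purely illustrative.
INTERFACE = {
    "dir_lookup": ("path",),   # declared RPC name and argument list
    "io_read": ("path",),
}

def make_stub(port, rpc, argnames):
    """Generate a callable that marshals its arguments into a message."""
    def stub(*args):
        message = {"rpc": rpc, **dict(zip(argnames, args))}
        port.append(message)    # "send" the marshalled message
        return message
    return stub

port = []                       # stands in for a Mach IPC port
stubs = {name: make_stub(port, name, args) for name, args in INTERFACE.items()}

stubs["dir_lookup"]("/etc/motd")
print(port[0])                  # the marshalled message on the "port"
```

The caller just invokes a generated function; the declaration is the single place where the wire format of each RPC is defined.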
So people familiar with RPCs would be familiar with this?
They would be familiar with the concept. It's at a lower level
than they usually see it but the ideas are exactly the same.
I guess one fair question is "Why is it taking so long?"
[Laughs] It's a big project and it's bigger than we expected.
I've never been one who can predict how long a project will take.
Have some of the objectives changed since it started? So many things
have happened since.
I'm not sure that any of the core objectives have changed, but
the importance of multi-user computers has decreased a lot in
the last decade. We used to have a lot of computers that would
have many users at once. Now they are less and less common. But
it's still a real factor.
Distributed computing had become more important and that is
something that we've always wanted to do and have designed some
of the architecture to make possible, but we don't have any
support for it yet.
I looked at some of the design papers and indeed there is mention
of distributed features. Can you talk a bit about this?
The Mach interprocess communication system was carefully designed
to be network transparent. So with a little bit of work, and a
special program called a network message server, Mach programs
can communicate over a network with each other exactly the same
way as they communicate on the local machine. They wouldn't
know that they are talking to a server on another computer.
So because the fundamental message passing structure is network
transparent, the possibility has opened up to making the Hurd
network transparent too. Almost all of of the protocols right
now would work like that, a few of the have some problems that
we will be solving once we make the next release and then we'll
be able to start the sub-structural work on having a distributed
Hurd system.
[Breaks for trailers of upcoming shows including an XML discussion
with the authors of Jabber. They play a sound-clip, seemingly of cows
and cow-bells, I imagine that there is supposed to be some sort of GNU
connotation]
Let's play that again, I like that.
[They play it again]
So the project has been 10 years in the making now.
Well, about 8 years.
Do you have a chart where you check off components that are ready and
done with? Where are you in your progression?
We have a TODO list. It's not so much a chart of components to
check off, as many things are small problems with components
that are essentially completed, as well as components that we
have yet to add.
There aren't any missing components to have a fully-featured Unix
system any more.
So the core set of features is there?
Yes. And works completely and we have over 400 Debian packages
that are compiled for the Hurd at the current time.
Let's talk a bit about the relationship between the Hurd and Debian.
Debian is the only distribution at this time that includes the Hurd.
That's right. Debian has a distribution architecture to
provide for what Debian calls "ports" - Debian systems using
other kernels or other processors, that works a lot more
flexibly than the other Unix distributions. So it has been
particularly easy.
If a user gets the Debian distribution with the Hurd, what does
he get? A completely functional system? Is it the same as
GNU/Linux?
More or less. The special features that the Hurd has that Linux
doesn't are available, and there are some features of Linux that
we haven't yet fully implemented. They are mostly minor
things or things that are technically minor but practically
important, like the X-server.
How about all the normal GNU tools and all that good stuff? That's there
and working?
Yes. The Debian distribution has about 4500 packages in the
upcoming potato release (2.2). Of those 1600 work on the Hurd
and 1100 of them are shell script packages of things that don't
require any recompilation. 415 as of my last count were
compiled for the Hurd - and those are the most important of the
maybe 3000 binary packages that one has to consider.
So all the important things are there. Occasionally, there are
bugs in packages that make assumptions that they are running on
Linux and they shouldn't assume that. And there is a certain
amount of work to port packages as a practical matter.
How about the development side? What is it like developing for
the Hurd? Is it similar to Linux?
Well, if you are writing an ordinary sort of user program, it's
very much the same thing.
You have libraries that expose the added functionality of the Hurd?
That's right and you can use as much or as little as you want.
The Hurd functionality in the GNU C library is designed so that
users don't have to turn their whole program into a
Hurd-thinking program; they can use one or 2 Hurd functions that
are convenient for them and make the rest just work anywhere.
How about the community? Do you get a lot of input from developers?
Who's working on this stuff? I know at one point there were only 2
people working on the Hurd core project. Is that correct?
There were probably 3 (that were working for the FSF).
That's tiny compared to the community of developers that were working
on the Linux kernel, for example - or even 386BSD.
Not at the time.
And it's also important to note that the BSD development teams
are working on the whole system and not just the kernel. I
don't know about the size of it. Emacs has got only one or 2
people that work on it effectively all the time, so I'm not sure
the number of developers is the most important thing.
The current size of our mailing list of core developers is about
half a dozen.
How about people that download the package?
I don't have counts of how many downloads there have been. Debian
could probably provide that information, but I don't know what
it would be.
There are a lot more users than core developers.
Are you actively seeking core developers? The Linux community is very
open and there is a lot of energy going on. Is that something you
would like to emulate?
Certainly. One of our mistakes early on was that we didn't
realize the importance of doing this. Right now the current
Hurd source is available via anonymous cvs and developers who
want to participate in the project are welcome to do so. So
it's a matter of people who have talent and interest wanting to
contribute.
So how would they go about that?
We have a public task-list which is available from the Web site
and they should look at that and see if something catches their
fancy.
People can help a lot just by running the system and compiling
as many programs as they can and seeing if there are any
problems - a very important task that just about anybody can do.
Is there a feedback mechanism for bug reports?
Yes. We have a help-list and a bug-list, the way most GNU
packages do.
What are some of the features that need to be worked on?
Well, I described the transparent FTP server. We'd like
something that does a transparent HTTP server, transparent tar,
so that you could "cd" into a tar archive. There are things like
that that we'd like to have. Also more flexible network support
and perhaps a faster network protocol stack. Like I said, what
we have now was just the fastest thing to put together.
One of the most important features is what we call "path search
directories": a directory that is a logical union of a whole bunch
of other directories from a specified set. We hope to make that
how system directories like /bin work.
So basically you would apply commands to a whole part of a directory
tree?
No, it's not that so much as you would run a program that would
set it up, and it would set up a directory, say /bin, and that
directory wouldn't exist in its own right, instead it would be a
logical union of /usr/bin, /X/bin and whatever else. And then
all of those files in those other directories would appear in
that directory and similarly you could create files in it and
they would appear in one of the other underlying directories.
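The union directory just described behaves roughly like this toy model (illustrative Python, not Hurd code; the policy of sending new files to the first underlying directory is an assumption made for the example):

```python
# Toy sketch (not actual Hurd code) of a union directory: /bin appears
# to contain the union of several underlying directories, and a file
# created "in /bin" really lands in one of them.
class ToyUnionDir:
    def __init__(self, *layers):
        self.layers = list(layers)  # dicts standing in for /usr/bin, /X/bin, ...

    def ls(self):
        names = set()
        for layer in self.layers:
            names.update(layer)     # merged view of all underlying dirs
        return sorted(names)

    def create(self, name, contents):
        # Assumed policy: new files go to the first underlying directory.
        self.layers[0][name] = contents

usr_bin = {"ls": "...", "cp": "..."}
x_bin = {"xterm": "..."}
bin_dir = ToyUnionDir(usr_bin, x_bin)

print(bin_dir.ls())            # union of both layers
bin_dir.create("hello", "#!")
print("hello" in usr_bin)      # True: it landed in an underlying directory
```

The union directory itself stores nothing; listing merges the layers, and creation is delegated to one of them.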
A symbolic directory of sorts?
Right.
What are some other features that need to be worked on?
We want some of the hardware support that Linux has. There are
categories of hardware that we don't support at all. We do the
basic core kind of things....
Good point... let's talk a bit about hardware support. What's the
state of affairs there?
Well, we crib all our hardware support from Mach, of course, and
Mach (the GNU Mach distribution) can take Linux device
drivers. So any Linux disk driver or any Linux ethernet driver
we can use right away. I suppose that generic SCSI devices
(like CD writers or scanners) should work too.
But I guess stuff like USB, PCMCIA, stuff like that probably doesn't
work?
Right. There is no PCMCIA support yet. There is no USB
support because there isn't really any for Linux yet either,
for the 2.2 kernels.
Things like audio, video frame grabbers and sound cards and
things like that: there isn't currently a Mach architecture for
them. It wouldn't necessarily be very hard to create it,
but it's one of the tasks that we have to do.
It's not hard, but that's a lot of little things.
Each one is probably fairly small. Mach has character device
support, so it may be that it will be fairly trivial.
What is the interface between Mach and the Hurd? Can you swap out
Mach for example and use at some point some other micro-kernel?
The Hurd itself tends to be fairly Mach-specific, but it doesn't
actually use many of the strange corners of the Mach
protocol. So any micro-kernel which had message-passing and
a similar virtual memory structure would be usable. We would
have to port to it, but it wouldn't be a terribly great amount
of work.
So there are dependencies there, but they can be overridden?
Right. Most of the dependencies on Mach are entirely trivial and
would be a matter of replacing one function and calling a slightly
different one. There are maybe half a dozen files in the Hurd and
the GNU C Library that would require more significant work.
One of the criticisms of Mach is that it is so large. Is that an issue
at all?
We don't really care about that. Once that was a very important
issue, but these days, I just don't think that it's very
important.
Once it's installed, it's there...
That's right.
Are there newsgroups and sites?
Most of that stuff happens on the 2 mailing lists:
bug-hurd@gnu.org, which is for bug reports and developers, and
help-hurd@gnu.org, which is for people who are
having problems, or just want to ask questions.
Is there a lot of activity there?
Yes.
How about web sites?
There are 2 web sites that you have linked to from the Dr. Dobb's
page, there's the Hurd web page specifically
[www.gnu.org/software/hurd.html] about the Hurd
itself and another which is the Debian GNU/Hurd page which is
about the Debian Hurd port; people who are interested in
running the system should look at the latter.
Do you think that we will see the Hurd on different distributions? I
guess Debian is very special. You wouldn't be able to just take the
Hurd and port it to be included with Red Hat Linux for example. That's
really Linux centered.
I'm not sure. It depends on which packages are in Red Hat.
Most of it would probably just work. You would need to compile
the Red Hat packages for the Hurd and then most of them would
probably work. There is a certain amount of effort that would
have to happen to make Red Hat use any different kernel. I'm not
familiar enough with Red Hat's source architecture model to say
how easy or how hard that would be.
This project has been around for 8 years. Do you feel any urgency in
getting adopted and used by a larger group of people?
Yes and No.
On the one hand, if I felt a great deal of urgency, I would
have beaten my brains in by now [laughs]. On the other hand, we
always want more users, and in fact the Debian GNU/Hurd effort
was started because that was sort of the next step in doing
that.
Really, I expect that once we have something that we deem stable
enough to be the next stable Debian release, I would expect to
see a lot more users.
It's more important for you to get it right architecturally,
than get it out.
That's always been my psychology and that can sometimes be a
problem. The current beta test Debian GNU/Hurd distribution is
there to provide for the other psychology.
Once it is stable and complete, do you think that it will be
compelling enough for people to use - people that probably already
have Linux installed or FreeBSD on their system. What will make
people migrate to the Hurd? [ho-ho]
I don't know. Part of that would require me to know more about
people psychology than I do, and I'm pretty bad at that. People
also tend to be fairly conservative.
We will always get people who are interested in something
different, the group that wants to run something different from
what their friends use.
Are there going to be some features that are so compelling that it
really will make sense to migrate over a period of time?
I certainly hope so. I think that some of the transparent file
systems and union file systems will make things look a lot
better for users.
Looking down the road, what time frame are you looking at? What are
some of the milestones ahead?
I'm bad at time frames. I'm not sure if there are any intrinsic
milestones to look for. I can tell you some of the things that
people tracking progress should look for: follow the Debian Hurd
effort; we plan to have a 0.3 release of the Hurd source code
soon. The main motivation for that is so that we can begin work
on some changes to the system that are needed and that may
destabilize the code, so we have to create a stable branch.
Most people who try using the system will find that it works,
but it is also a system that is still in development and has
some bugs.
Are you intimately involved with the Debian effort?
I have been peripherally involved with Debian for a long time and
I've been getting more strongly associated over the past couple
of years.
They are working on a distribution and you are still working on the
technical development.
Right. They are different things. But I am also interested
in Debian as a distribution and the Hurd is not the only thing
in my life that I pay attention to.
What's going on with you? How much of your time is still spent working
on the Hurd?
It hasn't been as much in the past couple of years (since I have
been working for MIT). I will be going to graduate school in
the fall in Philosophy and that will actually free up more of my
computer-programming time so that I can work on the Hurd much
more intensively.
So how many people are actually working full-time on the core?
I don't know anybody who's doing it full-time.
But I don't know anybody who's working on the Linux kernel
full-time either.
There are a lot of kernel hackers that are working on the Linux kernel.
There are, but I don't know any of them that are doing it 40
hours/week. I'm not sure that that is a very useful measure in
the Free Software world.
A measure might be the number of people participating in the effort,
though.
Right. And we probably have about half a dozen.
It seems to me that there are not a lot of people working on the
project. You show a lot of work to be done.
Sure. I understand what you are looking for, but I'm not sure
that counting man-hours gets you much information.
This is an important physical project...
Yes.
But it's a very small project at the same time. Sendmail has dozens
of developers working just on their mail server.
Right. At the same time, I don't think that's a useful measure.
Emacs and GDB, which achieved total dominance and were better
than anything else, were written by 2 or 3 developers at the
maximum. Different developers work in different ways - and
different projects do.
The GNOME core team was 2 or 3 people - and took over.
And our goal is not even to take over, so I don't think that
it's something that can be easily counted that way. At the same
time, we are always in need of more people who can hack
effectively.
And they can check out your web site for more information.
Absolutely.
Did we forget anything here? Maybe in the technical aspects of the
Hurd. Is there anything else that you would like to point out?
Nothing that I won't think of until tomorrow [laughs].
How about as far as the project is concerned?
Like I said, a lot of work is still happening and people that
want to help out, even people with no programming expertise can
help a lot just by using the system or by compiling programs
they find and seeing what works and what doesn't.
Thomas, thanks a lot for joining us today.
Thank-you.
And good luck in California.
OK!
Thomas Bushnell is the technical lead for the Hurd project.
[Fade in closing music].
--
Paul.Emsley@chem.gla.ac.uk
http://www.chem.gla.ac.uk/~paule