
RFP: LXR -- Linux Cross-Reference, web-based C source cross referencer

Package: wnpp
Severity: wishlist

(The project's web site seems to be currently down - see the Google cache.)


The Linux Cross-Reference project is the testbed application of a
general hypertext cross-referencing tool. (Or the other way around.)

The main goal of the project is to create a versatile
cross-referencing tool for relatively large code repositories. The
project is based on stock web technology, so the codeview client may
be chosen from the full range of available web browsers. On the server
side, the prototype implementation is based on an Apache web server,
but any Unix-based web server with cgi-script capability should do
nicely. (The prototype implementation is running on a dual Pentium Pro
Linux box.)
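The server side described above is just a CGI program behind an ordinary web server. Below is a minimal sketch of the idea, in Python rather than LXR's Perl; the SOURCE_ROOT path and the "file" query parameter are made up for illustration.

```python
# Minimal CGI-style sketch: serve one source file as HTML, standing in
# for the kind of cgi-script LXR's server side relies on.  SOURCE_ROOT
# and the "file" parameter are assumptions, not LXR's real interface.
import html
import os
import urllib.parse

SOURCE_ROOT = "/usr/src/linux"   # hypothetical source tree location

def render_page(query_string):
    """Return an HTTP response body for a ?file=... request."""
    params = urllib.parse.parse_qs(query_string)
    rel = params.get("file", ["README"])[0]
    path = os.path.normpath(os.path.join(SOURCE_ROOT, rel))
    if not path.startswith(SOURCE_ROOT):      # refuse path escapes
        return "Content-Type: text/plain\r\n\r\nforbidden"
    try:
        with open(path) as f:
            body = "<pre>%s</pre>" % html.escape(f.read())
    except OSError:
        body = "file not found"
    return "Content-Type: text/html\r\n\r\n" + body
```

Because the script only reads files and emits HTML, any web server with CGI support can host it, which is the point made above.
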

The main feature of the indexer is of course the ability to jump
easily to the declaration of any global identifier. Indeed, even all
references to global identifiers are indexed. Quick access to function
declarations, data (type) definitions and preprocessor macros makes
code browsing just that tad more convenient. An at-a-glance overview
of, e.g., which code areas will be affected by changing a function or
type definition should also come in useful during development and
maintenance.

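The declaration-and-reference index this describes can be pictured as a map from each identifier to the places it occurs. A minimal sketch follows; the field names and sample entries are illustrative, not LXR's actual schema.

```python
# Sketch of the core index: each global identifier maps to the place it
# is declared plus every place it is referenced, so the hypertext view
# can jump in either direction.  Names here are purely illustrative.
from collections import defaultdict

class Index:
    def __init__(self):
        self.decl = {}                    # identifier -> (file, line)
        self.refs = defaultdict(list)     # identifier -> [(file, line), ...]

    def add_decl(self, ident, filename, line):
        self.decl[ident] = (filename, line)

    def add_ref(self, ident, filename, line):
        self.refs[ident].append((filename, line))

    def lookup(self, ident):
        """Everything the hypertext view needs for one identifier."""
        return {"declared": self.decl.get(ident),
                "referenced": self.refs.get(ident, [])}

idx = Index()
idx.add_decl("schedule", "kernel/sched.c", 112)   # made-up location
idx.add_ref("schedule", "fs/select.c", 84)        # made-up location
```
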
Other bits of hypertextual sugar, such as e-mail and include file
links, are provided as well, but are, on the whole, well, sugar. Some
minimal visual markup is also done. (Style sheets are being considered
as a way to do this in the future.)


The index generator is written in Perl and relies heavily on Perl's
regular expression facilities. The algorithm used is very brute force
and extremely sloppy. The rationale behind the sloppiness is that too
little information renders the database useless, while too much
information simply means the users have to think and navigate at the
same time.
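The brute-force, regex-only approach can be sketched in a few lines. The real indexer is Perl and its patterns are far more elaborate; this Python version only illustrates the deliberate sloppiness.

```python
# Deliberately sloppy, regex-only scan for things that look like
# top-level C function definitions: a name at the left margin followed
# by an argument list.  No real parsing is done, so it over-collects,
# which, as argued above, beats missing identifiers altogether.
import re

FUNC_DEF = re.compile(r"^(?:\w+\s+)*(\w+)\s*\([^;]*$")  # crude, on purpose

def scan_definitions(source):
    """Return (name, line) pairs that look like function definitions."""
    found = []
    for lineno, line in enumerate(source.splitlines(), 1):
        m = FUNC_DEF.match(line)
        if m and m.group(1) not in ("if", "while", "for", "switch", "return"):
            found.append((m.group(1), lineno))
    return found

sample = """\
int do_fork(unsigned long clone_flags)
{
        return 0;
}
"""
```

Running the scan over the sample yields the single definition `do_fork` at line 1; control-flow keywords are filtered out by the crude stop list rather than by grammar.
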

The Linux source code, with which the project has initially been
linked, presents the indexer with some very tough obstacles.
Specifically, the heavy use of preprocessor macros makes the parsing a
virtual nightmare. We want to index the information in the
preprocessor directives as well as the actual C code, so we have to
parse both at once, which leads to no end of trouble. (Strict parsing
is right out.) Still, we're pretty satisfied with what the indexer
manages to get out of it.
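Indexing the preprocessor directives and the C code in one pass amounts to adding a #define pattern to the same sloppy scan. Again, this is a Python illustration of the idea, not LXR's actual patterns.

```python
# One sloppy pass over mixed cpp + C text: collect names introduced by
# #define alongside names that look like C function definitions.
# Strict parsing of either layer is deliberately avoided.
import re

DEFINE = re.compile(r"^\s*#\s*define\s+(\w+)")
C_DEF  = re.compile(r"^(?:\w+\s+)*(\w+)\s*\([^;]*$")

def scan_mixed(source):
    macros, funcs = [], []
    for lineno, line in enumerate(source.splitlines(), 1):
        m = DEFINE.match(line)
        if m:
            macros.append((m.group(1), lineno))
            continue
        m = C_DEF.match(line)
        if m and m.group(1) not in ("if", "while", "for", "switch"):
            funcs.append((m.group(1), lineno))
    return macros, funcs

sample = """\
#define PAGE_SIZE 4096
#ifdef __SMP__
int smp_init(void)
{
}
#endif
"""
```

Note that conditional compilation (#ifdef/#endif) is simply passed over: the scan records what is defined, not which branch the compiler would take, which is exactly the kind of trouble the paragraph above alludes to.
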

There's also the question of actually broken code. We want to
reasonably index all code portions, even if some of it is not entirely
syntactically valid. This is another reason for the sloppiness.

There are obviously disadvantages to this approach. No scope checking
is done, and the most annoying effect of this is mistaking local
identifiers for references to global ones with the same name. This
particular problem (and others) can only be solved by doing (almost)
full parsing. The feasibility of combining this with the fuzzy way
indexing is currently done is being looked into.
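The false positive described here is easy to reproduce: a reference scan that only knows the set of global names will happily tag a shadowing local. A sketch with hypothetical names:

```python
# Demonstrates the scope problem described above: "count" is a global,
# but the occurrences inside f() involve a local that merely shares the
# name.  Without scope analysis the scan cannot tell them apart.
import re

GLOBALS = {"count"}   # pretend the declaration pass found this global

def scan_refs(source):
    refs = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for word in re.findall(r"\w+", line):
            if word in GLOBALS:
                refs.append((word, lineno))
    return refs

sample = """\
int count;            /* the global */

void f(void)
{
        int count;    /* unrelated local, same name */
        count = 0;    /* wrongly indexed as a global reference */
}
"""
```

The scan reports three "references" to the global, two of which actually concern the local; only (almost) full parsing could filter them out.
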

An identifier is a macro, typedef, struct, enum, union, function,
function prototype or variable. For the Linux source code, between
50000 and 60000 identifiers are collected. The individual files of the
source code are formatted on the fly and presented with clickable
identifiers.

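On-the-fly formatting boils down to HTML-escaping each line and wrapping indexed identifiers in hyperlinks. A Python sketch; the ident?i=... URL shape is an assumption made for illustration, not necessarily LXR's.

```python
# Sketch of on-the-fly presentation: escape the source text, then turn
# every identifier the index knows about into a link to its
# cross-reference page.  The URL format is made up for illustration.
import html
import re

KNOWN = {"printk", "current"}     # identifiers the index knows about

def hyperlink_line(line):
    escaped = html.escape(line)
    def link(m):
        word = m.group(0)
        if word in KNOWN:
            return '<a href="ident?i=%s">%s</a>' % (word, word)
        return word
    return re.sub(r"\w+", link, escaped)
```

Escaping before linking keeps the inserted anchor tags intact while any HTML already present in the source text is rendered harmless.
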
It is possible to search among the identifiers and the entire kernel
source text. The freetext search is implemented using Glimpse, so all
the capabilities of Glimpse are available. The regular expression
search capabilities are especially useful.
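From the user's point of view, the freetext search behaves like a regular-expression grep over the whole tree. A crude, unindexed stand-in follows; it only mimics the interface, not the indexed speed Glimpse provides.

```python
# A grep-like stand-in for the freetext search: walk a directory tree
# and report every line matching a regular expression.  Glimpse itself
# is index-backed and far faster; this only imitates the behaviour.
import os
import re

def freetext_search(root, pattern):
    rx = re.compile(pattern)
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as f:
                    for lineno, line in enumerate(f, 1):
                        if rx.search(line):
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue
    return hits
```
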

unix, linux, debian, networks, security, | Stay the patient course
kernel, TCP/IP, C, perl, free software,  | Of little worth is your ire
mail, www, sw devel, unix admin, hacks.  | The network is down
