Theodore Ts'o wrote:
> The original code was contributed to me by Andreas Dilger. It
> uses a hand-coded XML parser which is quite small (2.7k of text in a
> shared library). In contrast, libxml2.so is 610k text, 43k data, and
> 3.5k BSS, which just boggles the mind.
There is probably a lot of stuff in XML that Andreas' code doesn't
support. I have also written my own subset-of-XML parser for a specific
application, and it is also quite small. If all you want to do is handle
simple, self-contained documents that consist of tags with attributes
and PCDATA, for example:
<tag1 att1='foo' att2='bar'>
<tag2 att3='baz'/>
<tag3 att4='quux'>pcdata</tag3>
</tag1>
then it's not much work at all to tokenize that and generate either
SAX-style callback events for each element, or a DOM-like tree of
element objects. It's when you get into all the other things that XML
defines, like a wide variety of character encodings, external entities,
DTD parsing and validation, and so on, that your code base starts
getting pretty big.
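To give a rough idea of how little is involved, here is a minimal sketch of
a SAX-style tokenizer for that kind of document. It is written in Python for
brevity and is neither Andreas' code nor my own parser; the names NAME,
QUOTED, TAG_RE, ATTR_RE, DOC and parse_events are made up for illustration.
It assumes no comments, CDATA sections, entity references, processing
instructions or DOCTYPE, and it silently drops whitespace-only text between
tags.

```python
import re

# Hypothetical helper patterns for the simple subset described above.
NAME = r"[A-Za-z_][\w.-]*"
QUOTED = r"""(?:'[^']*'|"[^"]*")"""

# One start tag, end tag, or empty-element tag, with optional attributes.
TAG_RE = re.compile(rf"<(/?)({NAME})((?:\s+{NAME}\s*=\s*{QUOTED})*)\s*(/?)>")
# One name='value' or name="value" pair inside a tag.
ATTR_RE = re.compile(rf"""({NAME})\s*=\s*(?:'([^']*)'|"([^"]*)")""")

DOC = """<tag1 att1='foo' att2='bar'>
<tag2 att3='baz'/>
<tag3 att4='quux'>pcdata</tag3>
</tag1>"""


def parse_events(text):
    """Yield ('start', name, attrs), ('text', data) and ('end', name) tuples."""
    pos = 0
    stack = []  # names of the elements that are currently open
    for m in TAG_RE.finditer(text):
        data = text[pos:m.start()]
        if "<" in data:
            raise ValueError(f"unsupported markup near offset {pos}")
        if data.strip():  # ignore whitespace-only runs between tags
            yield ("text", data)
        pos = m.end()
        closing, name, attr_src, empty = m.groups()
        if closing:
            # An end tag must match the innermost open element.
            if attr_src or empty or not stack or stack.pop() != name:
                raise ValueError(f"unexpected end tag </{name}>")
            yield ("end", name)
        else:
            # findall gives '' for the quote style that did not match.
            attrs = {k: s or d for k, s, d in ATTR_RE.findall(attr_src)}
            yield ("start", name, attrs)
            if empty:
                yield ("end", name)  # <tag/> is a start plus an end
            else:
                stack.append(name)
    if text[pos:].strip() or stack:
        raise ValueError("trailing text or unclosed element")


if __name__ == "__main__":
    for event in parse_events(DOC):
        print(event)
```

Building a DOM-like tree instead is mostly a matter of keeping element
objects on the stack rather than bare names, and attaching each new element
to whatever is on top.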
I admit, though, 610k of text still seems pretty huge.
Craig