Package: wnpp
Severity: wishlist
* Package name : htmlcxx
Version : 0.83
Upstream Author : Davi Reis <davi.reis@gmail.com>
* URL : http://htmlcxx.sourceforge.net/
* License : LGPL
Programming Lang: C++
Description : htmlcxx is a simple non-validating HTML parser library for C++
htmlcxx is a simple non-validating CSS1 and HTML parser for C++. Although
there are several other HTML parsers available, htmlcxx has some
characteristics that make it unique:
* STL-like navigation of the DOM tree, using the excellent tree.hh library
by Kasper Peeters
* It is possible to reproduce exactly, character by character, the
original document from the parse tree
* Bundled CSS parser
* Optional parsing of attributes
* C++ code that looks like C++ (not so true anymore)
* Offsets of tags/elements in the original document are stored in the
nodes of the DOM tree
The parsing policies of htmlcxx were designed to mimic the behavior of
Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees
similar to those created by Firefox. Unlike Firefox, however, htmlcxx does
not insert non-existent elements into your HTML. Therefore, serializing
the DOM tree gives exactly the same bytes contained in the original HTML
document.
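
The byte-exact round trip described above follows from storing offsets in
the nodes. The sketch below illustrates that idea only; it is not htmlcxx
code and does not use the htmlcxx API. The names Span, tokenize and
serialize are hypothetical, and the tokenizer is deliberately naive (it
only splits on '<' and '>').

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type for this sketch; not htmlcxx's node class.
struct Span {
    std::size_t offset;  // byte position of the token in the source
    std::size_t length;  // number of bytes the token covers
    bool isTag;          // true for "<...>" markup, false for text
};

// Split the input into tag and text spans without altering any bytes.
std::vector<Span> tokenize(const std::string& html) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t end;
        bool isTag = false;
        if (html[pos] == '<') {
            std::size_t close = html.find('>', pos);
            if (close == std::string::npos) {
                end = html.size();  // unterminated '<': keep the rest as text
            } else {
                end = close + 1;
                isTag = true;
            }
        } else {
            end = html.find('<', pos);
            if (end == std::string::npos) end = html.size();
        }
        spans.push_back({pos, end - pos, isTag});
        pos = end;
    }
    return spans;
}

// Rebuild the document by copying each span from the original buffer.
std::string serialize(const std::string& html, const std::vector<Span>& spans) {
    std::string out;
    std::size_t expected = 0;
    for (const Span& s : spans) {
        assert(s.offset == expected);  // spans must be contiguous
        out.append(html, s.offset, s.length);
        expected = s.offset + s.length;
    }
    return out;
}
```

Because nothing is inserted or normalized, an input with an unclosed tag
such as "<p>Hi <b>there</p>" serializes back to the same bytes; the missing
"</b>" is not added.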
--
http://syx.googlecode.com - Smalltalk YX
http://lethalman.blogspot.com - Thoughts about computer technologies
http://www.debian.org - The Universal Operating System