Package: wnpp
Severity: wishlist

* Package name    : htmlcxx
  Version         : 0.83
  Upstream Author : Davi Reis <davi.reis@gmail.com>
* URL             : http://htmlcxx.sourceforge.net/
* License         : LGPL
  Programming Lang: C++
  Description     : htmlcxx is a simple non-validating HTML parser library for C++

htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
Although there are several other HTML parsers available, htmlcxx has some
characteristics that make it unique:
 * STL-like navigation of the DOM tree, using Kasper Peeters' excellent
   tree.hh library
 * It is possible to reproduce the original document exactly, character by
   character, from the parse tree
 * Bundled CSS parser
 * Optional parsing of attributes
 * C++ code that looks like C++ (not so true anymore)
 * Offsets of tags/elements in the original document are stored in the
   nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic the behavior of
Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees
similar to those created by Firefox. However, unlike Firefox, htmlcxx does
not insert anything into your HTML that is not already there. Therefore,
serializing the DOM tree yields exactly the same bytes as the original HTML
document.

-- 
http://syx.googlecode.com - Smalltalk YX
http://lethalman.blogspot.com - Thoughts about computer technologies
http://www.debian.org - The Universal Operating System