[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#870077: ITP: html5-parser -- necessary dependency for calibre



Package: wnpp
Severity: wishlist
Owner: Norbert Preining <preining@debian.org>

* Package name    : html5-parser
  Version         : 0.4.3
  Upstream Author : Kovid Goyal <kovid@kovidgoyal.net>
* URL             : https://github.com/kovidgoyal/html5-parser
* License         : Apache
  Programming Lang: Python
  Description     : necessary dependency for calibre

A fast implementation of the HTML 5 parsing spec for Python. Parsing is
done in C using a variant of the gumbo parser. The gumbo parse tree is
then transformed into an lxml tree, also in C, yielding parse times that
can be a thirtieth of the html5lib parse times. That is a speedup of 30x.
This differs, for instance, from the gumbo python bindings, where the
initial parsing is done in C but the transformation into the final
tree is done in python.

Will be maintained in collab-maint.


Reply to: