
Bug#781380: ITP: libhtml-gumbo-perl -- HTML5 parser based on gumbo C library



Package: wnpp
Severity: wishlist
Owner: Onur Aslan <onur@onur.im>

* Package name    : libhtml-gumbo-perl
  Version         : 0.13
  Upstream Author : Alex Vandiver <alex@chmrr.net>
* URL             : https://metacpan.org/pod/HTML::Gumbo
* License         : Artistic or GPL-1+
  Programming Lang: Perl
  Description     : HTML5 parser based on gumbo C library


Gumbo is an implementation of the HTML5 parsing algorithm, written as
a pure C99 library with no outside dependencies.

Goals and features of the C library:

 * Fully conformant with the HTML5 spec.
 * Robust and resilient to bad input.
 * Simple API that can be easily wrapped by other languages. (This module
   is one such wrapper.)
 * Support for source locations and pointers back to the original text.
   (Not exposed by this implementation at the moment.)
 * Relatively lightweight, with no outside dependencies.
 * Passes all html5lib-0.95 tests.
 * Tested on over 2.5 billion pages from Google's index.


This is my favorite HTML parser after HTML::Parser. The gumbo-parser package
is currently in the Debian NEW queue[1]. I will start packaging this module
once gumbo-parser is accepted.

[1]: https://ftp-master.debian.org/new/gumbo-parser_0.9.2+dfsg-1.html
