
Bug#806369: RFP: html5tidy -- “tidy” HTML 5 in the wild to well-formed XML or HTML



Package: wnpp
Severity: wishlist

* Package name    : html5tidy
  Version         : git master
  Upstream Author : Michael Murtaugh & The active archives contributors
* URL             : https://github.com/aleray/html5tidy.git
* License         : GPLv3+
  Programming Lang: Python 2
  Description     : “tidy” HTML 5 in the wild to well-formed XML or HTML

Since tidy fails hard on many HTML 5 documents (e.g. by producing no
output at all), this package can be used to transform in-the-wild
HTML 5 documents into well-formed input that xmlstarlet can actually
act on, e.g. for data extraction with XPath and XSLT via
“xmlstarlet sel”.
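
To illustrate the idea of turning tag soup into something an XML tool
can query, here is a rough sketch using only the Python standard
library. It is not html5tidy's code and does not use its interface:
the names SoupToTree and tidy are made up for this example, and the
sketch simply nests unclosed elements instead of applying the real
HTML 5 parsing rules that a proper HTML 5 parser follows.

```python
# Sketch only: a stdlib stand-in for "tag soup in, well-formed XML out".
# This is NOT html5tidy's implementation; all names here are hypothetical.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Build an ElementTree from lenient HTML, closing tags left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # mirrors the builder's stack of open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty value.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            current = self.open_tags.pop()
            self.builder.end(current)
            if current == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # text outside the root element is dropped
            self.builder.data(data)

    def to_tree(self):
        self.close()  # flush any input still buffered by HTMLParser
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def tidy(html_text):
    """Return the root Element of a well-formed tree for html_text."""
    parser = SoupToTree()
    parser.feed(html_text)
    return parser.to_tree()


# Unclosed <p>, unquoted attributes, void elements, missing </html>.
soup = ("<html><body><p>First<p>Second <a href=/x>link</a>"
        "<br><img src=a.png></body>")
root = tidy(soup)
xml_text = ET.tostring(root, encoding="unicode")  # well-formed XML string
hrefs = [a.get("href") for a in root.iter("a")]   # simple data extraction
```

The resulting xml_text can be parsed by any XML tool, which is the
property needed before XPath or XSLT can be applied to the document.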

