
Bug#639715: ITP: libjsoup-java -- Java HTML parser that makes sense of real-world HTML soup



Package: wnpp
Severity: wishlist
Owner: Torsten Werner <twerner@debian.org>

* Package name    : libjsoup-java
  Version         : 1.6.1
  Upstream Author : Jonathan Hedley
* URL             : http://jsoup.org/
* License         : MIT
  Programming Lang: Java
  Description     : Java HTML parser that makes sense of real-world HTML soup

 jsoup is a Java library for working with real-world HTML. It provides a
 convenient API for extracting and manipulating data, using the best of DOM,
 CSS, and jQuery-like methods.
 .
 jsoup implements the WHATWG HTML specification (http://whatwg.org/html), and
 parses HTML to the same DOM as modern browsers do.
 .
   * parse HTML from a URL, file, or string
   * find and extract data, using DOM traversal or CSS selectors
   * manipulate the HTML elements, attributes, and text
   * clean user-submitted content against a safe white-list, to prevent XSS
   * output tidy HTML
 .
 jsoup is designed to deal with all varieties of HTML found in the wild: from
 pristine and validating to invalid tag soup, jsoup will create a sensible
 parse tree.
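
For readers who have not used the library, here is a minimal sketch of the
first two features listed in the description (parsing HTML from a string and
extracting data with CSS selectors). It assumes the jsoup jar is on the
classpath. The class name JsoupSketch and the helper extractLinks are
hypothetical names made up for this illustration; Jsoup.parse, select, and
attr are jsoup API calls.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class, not part of jsoup or of the proposed package.
public class JsoupSketch {

    // Parses an HTML string and returns the href value of every link in it.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<String>();
        // "a[href]" is a CSS selector: anchor elements that carry an href.
        for (Element a : doc.select("a[href]")) {
            links.add(a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Deliberately sloppy input: neither <p> nor <body> is closed.
        String html = "<p>Unclosed paragraph <a href='http://jsoup.org/'>jsoup</a>";
        System.out.println(extractLinks(html));
    }
}
```

The input above is intentionally malformed, to show that the parser still
builds a tree that can be queried.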

jsoup is a build dependency of wagon release 1.0.

Torsten


