
Re: Bug#203498: ITP: decss -- utility for stripping CSS tags from an HTML page.

On Thursday 31 July 2003 11:27, Sam Hocevar wrote:
>    And HTML makes it even harder since very few pages are valid, but
> that DeCSS utility uses only regexes anyway.

Technically, using RegExps for CSS will not only become maintenance hell, but 
would also limit the usability of such a script for e.g. network 
If at all, the way to go would be to use a decent HTML parser library (khtml
and gecko come to mind; even Python's htmlparser is not mature enough yet),
which gives you not only the (internal and external) stylesheets but all
components of the DOM, then use scripting facilities to modify that object
and dump the modified result to e.g. stdout.
'HTML' and 'lightweight' will hardly fit together.
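
To make the parse, modify, dump idea concrete, here is a minimal sketch. It
assumes the html.parser module from today's Python standard library rather
than khtml or gecko, and that module is event-based, so there is no real DOM
here: the sketch re-emits the page as it is parsed and leaves out <style>
blocks, <link rel="stylesheet"> elements and style attributes. The names
StyleStripper and strip_css are made up for this illustration and do not come
from the decss package under discussion.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit parsed HTML, leaving out anything that carries CSS."""

    def __init__(self):
        # Keep character references as written so they can be echoed verbatim.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_style = False

    @staticmethod
    def _is_stylesheet_link(tag, attrs):
        # True for <link rel="stylesheet" ...>, i.e. an external stylesheet.
        if tag != "link":
            return False
        rel = dict(attrs).get("rel") or ""
        return "stylesheet" in rel.lower().split()

    def _emit_tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if name == "style":
                continue  # drop inline CSS
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                # The parser unescapes attribute values, so escape them again.
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (" ".join(parts), closing))

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        elif not self._is_stylesheet_link(tag, attrs):
            self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if tag != "style" and not self._is_stylesheet_link(tag, attrs):
            self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_style:  # skip the body of an internal stylesheet
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def strip_css(html_text):
    """Return html_text with stylesheets and style attributes removed."""
    parser = StyleStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Used as a filter it would be something like
sys.stdout.write(strip_css(sys.stdin.read())). Attribute quoting and tag case
are normalised on the way out, and badly broken markup may come out different
from how it went in, so treat this as an illustration rather than a
replacement for a full browser-grade parser.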


Play for fun, win for freedom.
Hurd^H^H^H^HLinux-Info-Tag Dresden 2003: http://www.linux-dresden.de
