Re: remove an HTML tag and all its children from commandline
On Sun Jan 31, 2010 at 10:54:46 +0800, Zhang Weiwu wrote:
> I want to remove all advertisements in my 100 html files. They are
> pretty neatly classed, like the following:
>
> <div class="advertisement">
> ...
> </div>
You might enjoy my "html-tool" command which would do the
job for you via:
html-tool --cut-class=advertisement --file input.html
You can get it via:
wget http://mybin.repository.steve.org.uk/raw-file/tip/html-tool
Or via the repository at:
http://mybin.repository.steve.org.uk/
See here for some brief discussion:
http://blog.steve.org.uk/oh__this_should_be_stunning_.html
Internally it uses the Perl module HTML::TreeBuilder::XPath to select
the matching elements, but the details probably don't matter.
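
If you would rather not install anything, the same idea (drop every
element carrying a given class, together with its children) can be
sketched with nothing but the Python standard library. This is not
html-tool's code and does not use XPath; the names ClassCutter and
cut_class are made up for illustration, and it assumes the markup is
reasonably well nested, as in the example quoted above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot open a skipped region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassCutter(HTMLParser):
    """Re-emit HTML unchanged, except for elements carrying a given class."""

    def __init__(self, cut_class):
        super().__init__(convert_charrefs=False)
        self.cut_class = cut_class
        self.out = []
        self.skip_tag = None   # tag name of the element currently being dropped
        self.depth = 0         # nesting depth of same-named tags inside it

    def _has_class(self, attrs):
        classes = dict(attrs).get("class") or ""
        return self.cut_class in classes.split()

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth += 1
        elif self._has_class(attrs):
            if tag not in VOID_TAGS:
                self.skip_tag, self.depth = tag, 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no children to skip.
        if not self.skip_tag and not self._has_class(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_tag:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_tag:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_tag:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.skip_tag:
            self.out.append(f"<!{decl}>")


def cut_class(html, cls):
    """Return html with every element of class cls (and its children) removed."""
    parser = ClassCutter(cls)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling cut_class(page_text, "advertisement") on the contents of each
file and writing the result back out would cover the 100-file case.
Keep backups: unlike a real tree builder, this sketch does nothing to
repair unbalanced tags.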
Steve