
Re: remove an HTML tag and all its children from commandline



On Sun Jan 31, 2010 at 10:54:46 +0800, Zhang Weiwu wrote:

> I want to remove all advertisements in my 100 html files. They are
> pretty neatly classed, like the following:
>
> <div class="advertisement">
> ...
> </div>

  You might enjoy my "html-tool" command, which would do the
 job for you via:

    html-tool --cut-class=advertisement --file input.html

  You can get it via:

    wget http://mybin.repository.steve.org.uk/raw-file/tip/html-tool

  Or via the repository at:

    http://mybin.repository.steve.org.uk/

  See here for some brief discussion:

    http://blog.steve.org.uk/oh__this_should_be_stunning_.html

  Internally it uses the Perl XPath module HTML::TreeBuilder::XPath,
 but the details probably don't matter.
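
  For anyone who would rather see the idea than install the tool,
 here is a minimal sketch of "drop an element and all its children".
 It is not the html-tool code: it swaps the Perl XPath module for
 Python's standard html.parser, and the names ClassCutter and
 cut_class are made up for illustration. It assumes reasonably
 well-formed markup, where the removed element has a closing tag.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag; a match on one of these is
# dropped on its own instead of starting a skipped subtree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassCutter(HTMLParser):
    """Re-emit the HTML it is fed, minus elements carrying css_class."""

    def __init__(self, css_class):
        # Keep entities as written so they can be passed through untouched.
        super().__init__(convert_charrefs=False)
        self.css_class = css_class
        self.out = []
        self.skip_tag = None   # tag name of the element being removed
        self.depth = 0         # open tags of that name inside the removal

    def _matches(self, attrs):
        return self.css_class in (dict(attrs).get("class") or "").split()

    def _emit(self, text):
        if self.skip_tag is None:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.depth += 1          # nested <div> inside the removed <div>
        elif self._matches(attrs):
            if tag not in VOID_TAGS:
                self.skip_tag, self.depth = tag, 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self.skip_tag is None and not self._matches(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag is None:
            self.out.append(f"</{tag}>")   # tag names come back lower-cased
        elif tag == self.skip_tag:
            self.depth -= 1
            if self.depth == 0:
                self.skip_tag = None       # removed element is now closed

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def cut_class(html, css_class):
    """Return html with every element of class css_class removed."""
    parser = ClassCutter(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

  Calling cut_class(text, "advertisement") on the contents of each
 file and writing the result back out would cover the 100-file case.
 Counting only tags with the same name as the removed element keeps
 the sketch tolerant of unclosed inner tags such as a bare <p>.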

Steve
--

