
Re: remove an HTML tag and all its children from commandline



On Sun, 31 Jan 2010 10:54:46 +0800, Zhang Weiwu wrote:

> I want to remove all advertisements in my 100 html files. They are
> pretty neatly classed, like the following:
> 
> <div class="advertisement">
> ...
> </div>
> 
> However I could not simply do this:
> s/<div class="advertisement">.*</div>//
> 
> Because the .* is too greedy.

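For not-so-simple tasks, you need not-so-simple tools. A regex cannot 
reliably pair an opening <div> with its own closing </div> once divs are 
nested, so a non-greedy match would not save you either. Depending on how 
much time you'd like to invest in such not-so-simple tools, take a look at 
libwww, sgrep or the XPath language. 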
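
To make the idea concrete, here is a minimal sketch of the parser-based
approach. It is not one of the tools named above: it swaps in Python's
standard html.parser module, and the names AdStripper and strip_ads are
made up for illustration. It copies the input through and drops every
<div class="advertisement"> together with all of its children, counting
nested divs so the right closing tag ends the skipped region. It assumes
the divs inside an advertisement are properly closed, and it rewrites
closing tags in lowercase.

```python
from html.parser import HTMLParser


class AdStripper(HTMLParser):
    """Copy HTML through, skipping <div class="advertisement"> subtrees."""

    def __init__(self, target_class="advertisement"):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.out = []
        self.skip_depth = 0  # > 0 while inside an advertisement div

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth += 1  # nested div inside the ad
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and self.target_class in classes:
            self.skip_depth = 1  # start skipping at the ad's own div
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the div depth.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1  # reaches 0 at the ad's closing tag
            return
        self.out.append(f"</{tag}>")

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_pi(self, data):
        self._emit(f"<?{data}>")


def strip_ads(html, target_class="advertisement"):
    """Return html with every div of the given class removed."""
    parser = AdStripper(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For the 100 files, one would read each file, pass its text through
strip_ads() and write the result back, after trying it on a copy first.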

HTH

-- 
Tong (remove underscore(s) to reply)
  http://xpt.sourceforge.net/techdocs/
  http://xpt.sourceforge.net/tools/

