On Sun, 31 Jan 2010 20:05:46 +0800, Zhang Weiwu wrote:

> $ tidy -q -asxml -utf8 page_07_zh.html | xpath -e '//div[@class="advertisement"]'

Exactly. Glad that you found both tidy & libxml-xpath-perl, and solved the problem yourself.

-- 
Tong (remove underscore(s) to reply)
http://xpt.sourceforge.net/techdocs/
http://xpt.sourceforge.net/tools/