One of the first things any data analyst learns while working with tons
of documents (HTML, PDF texts, ...) is that there are always edge cases
which are not fully syntactically reducible: you must eyeball them,
because you can't safely deal with them in code.
Once you have all the URLs of the edge cases, the safest method (using
neither browser JavaScript-based "addons" nor plain JavaScript) is
simply to open each page one at a time and clear the browser cache,
ideally each time. You would go like:
_URLs="<text file with lines of URLs>"
# this restarts firefox every time
xargs -n1 ./firefox -new-tab < "${_URLs}"
but this way you start a new instance for every URL. I need to:
1) open a number of tabs at once;
2) once I close the last one, ff should clear the cache, or shut down
and restart.
It shouldn't be so hard, but I haven't figured it out yet.
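One possible direction, as a rough sketch rather than a tested solution: start one firefox instance per batch of URLs on a throwaway profile directory, and delete that directory when the instance exits, so each batch begins with an empty cache. This assumes that firefox opens several URLs given on the command line as tabs, that it accepts -no-remote and -profile <dir>, and that the command blocks until the window is closed. BROWSER, BATCH_SIZE, open_batch and open_in_batches are names made up for this example. A fresh profile also means no cookies, logins or addons.

```shell
#!/bin/sh
# Sketch only: BROWSER, BATCH_SIZE and both function names are hypothetical.
BROWSER="${BROWSER:-./firefox}"   # browser binary, as in the one-liner above
BATCH_SIZE="${BATCH_SIZE:-5}"     # how many tabs to open per instance

# Open all given URLs in one browser instance on a throwaway profile.
open_batch() {
    profile_dir="$(mktemp -d)" || return 1
    # Assumption: this call returns only after the browser has been closed.
    "$BROWSER" -no-remote -profile "$profile_dir" "$@"
    # Deleting the profile discards the cache along with everything else in it.
    rm -rf "$profile_dir"
}

# Read URLs (one per line) from the file "$1" and open them batch by batch.
open_in_batches() {
    url_file="$1"
    set --                        # reuse the positional parameters as the batch
    # Read the list on fd 3 so the browser cannot consume it via stdin.
    while IFS= read -r url <&3 || [ -n "$url" ]; do
        [ -n "$url" ] || continue # skip blank lines
        set -- "$@" "$url"
        if [ "$#" -ge "$BATCH_SIZE" ]; then
            open_batch "$@"
            set --
        fi
    done 3< "$url_file"
    # Open whatever is left over as a final, smaller batch.
    [ "$#" -eq 0 ] || open_batch "$@"
}

# Run only when a URL file is passed on the command line.
if [ "$#" -gt 0 ]; then
    open_in_batches "$1"
fi
```

Saved under some name such as batch_tabs.sh (also made up), it would be run as: BATCH_SIZE=10 sh batch_tabs.sh urls.txt. Closing the browser window after looking at a batch is what moves the script on to the next one.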
lbrtchx