
Re: your mail



Chris Lawrence <quango@ix.netcom.com> wrote:
> A list of all URLs in a particular web page can be
> fairly-easily generated; see e.g. my findnew Python script
> (http://www.linux-m68k.org/py/findnew.py) which does this very thing
> as part of its processing.

wget deals with sets of web pages.

At the moment, it mostly deals with remote web pages (unless you're
using -nc, in which case local copies take precedence).

The problem I was trying to solve involves mirroring an archive
where only a small number of pages actually change.  -nc isn't
quite right because pages with links to the new pages get changed.
-m isn't quite right because it re-scans the entire archive.
-R and -A aren't quite right either; the rule probably is something
like: download a page if it hasn't been seen before, or if it's an
updated copy of something which has been downloaded recently [or if
it was explicitly listed at startup].
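
For what it's worth, here's a rough Python sketch of that rule (purely
hypothetical, not something wget does today; the names should_fetch,
seen and explicit are made up, and it uses the modern urllib.request
interface): keep a record of URLs already downloaded, and only fetch a
page if it's new, explicitly listed, or newer than the local copy
according to the server's Last-Modified header.

    import os
    import urllib.request
    from email.utils import parsedate_to_datetime

    def should_fetch(url, local_path, seen, explicit):
        # Explicitly listed at startup: always download.
        if url in explicit:
            return True
        # Never seen before: download.
        if url not in seen:
            return True
        # Otherwise, only download if the remote copy is newer than ours.
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req) as resp:
                last_mod = resp.headers.get("Last-Modified")
        except OSError:
            return False
        if not last_mod or not os.path.exists(local_path):
            return True
        remote = parsedate_to_datetime(last_mod).timestamp()
        return remote > os.path.getmtime(local_path)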

[I'd also like to be able to specify acceptance or rejection
based on MIME type.  As in: don't bother downloading anything
that's not text/something.  But that's another issue entirely.]
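
A similar sketch for that MIME filtering (again hypothetical, and done
with Python's urllib rather than anything built into wget): issue a HEAD
request first and skip anything whose Content-Type isn't text/*.

    import urllib.request

    def is_text(url):
        # HEAD request so we don't pull the body just to check its type.
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req) as resp:
                ctype = resp.headers.get("Content-Type", "")
        except OSError:
            return False
        return ctype.lower().startswith("text/")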

-- 
Raul



