Re: your mail
- To: Hrvoje Niksic <hniksic@srce.hr>, Joost Kooij <kooij@mpn.cp.philips.com>, debian-devel@lists.debian.org, Nicolás Lichtmaier <nick@feedback.net.ar>, James Troup <J.J.Troup@scm.brad.ac.uk>
- Subject: Re: your mail
- From: Raul Miller <rdm@test.legislate.com>
- Date: Sun, 17 May 1998 00:23:24 -0400
- Message-id: <19980517002324.W3613@test.legislate.com>
- Mail-followup-to: Hrvoje Niksic <hniksic@srce.hr>, Joost Kooij <kooij@mpn.cp.philips.com>, debian-devel@lists.debian.org, Nicolás Lichtmaier <nick@feedback.net.ar>, James Troup <J.J.Troup@scm.brad.ac.uk>
- In-reply-to: <19980516222612.A19244@ix.netcom.com>; from Chris Lawrence on Sat, May 16, 1998 at 10:26:12PM -0500
- References: <19980516222612.A19244@ix.netcom.com>
Chris Lawrence <quango@ix.netcom.com> wrote:
> A list of all URLs in a particular web page can be
> fairly-easily generated; see e.g. my findnew Python script
> (http://www.linux-m68k.org/py/findnew.py) which does this very thing
> as part of its processing.
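Extracting every URL from a page is a small job. A minimal modern-Python sketch of the idea (this is not Chris's actual findnew.py, which predates today's standard library):

```python
# Sketch of extracting all link targets from an HTML page,
# in the spirit of what findnew does as part of its processing.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href/src attribute values from a page."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)

def extract_urls(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.urls

page = '<a href="a.html">A</a> <img src="b.png">'
print(extract_urls(page))  # -> ['a.html', 'b.png']
```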
wget deals with sets of web pages.
At the moment, it's mostly remote web pages (unless you're using
-nc, in which case local copies take precedence).
The problem I was trying to solve involves mirroring an archive
where only a small number of pages actually change. -nc isn't
quite right because pages with links to the new pages get changed.
-m isn't quite right because it re-scans the entire archive.
-R and -A aren't quite right either. The rule probably is
something like: download a page if it hasn't been seen before,
or if it's an updated copy of something which has been downloaded
recently [or if it was explicitly listed at startup].
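That rule could be sketched as a small predicate (hypothetical names; the function and its parameters are illustration only, not anything wget provides):

```python
# Hypothetical decision rule from the description above: fetch a page
# if it was explicitly listed at startup, if it has never been seen,
# or if the remote copy is newer than the local one.
def should_download(url, seen, local_mtime, remote_mtime, startup_urls):
    if url in startup_urls:
        return True          # explicitly requested at startup
    if url not in seen:
        return True          # never downloaded before
    if local_mtime is not None and remote_mtime is not None:
        return remote_mtime > local_mtime  # updated copy
    return False             # seen before, no evidence of change
```

Unlike -m, this never re-fetches a page whose timestamp is unchanged, and unlike -nc, it still picks up updated pages.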
[I'd also like to be able to specify acceptance or rejection
based on mime type. As in: don't bother downloading anything
that's not Text/something. But that's another issue entirely.]
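A MIME-based filter might look like the following (wget has no such option here; this just illustrates the idea of accepting only text/* from a Content-Type header):

```python
# Hypothetical MIME-type accept filter: keep only resources whose
# Content-Type starts with a given prefix, e.g. "text/".
def accept_by_mime(content_type, prefix="text/"):
    """Accept when the media type (parameters stripped) matches the prefix."""
    if content_type is None:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith(prefix)

print(accept_by_mime("text/html; charset=iso-8859-1"))  # -> True
print(accept_by_mime("image/png"))                      # -> False
```

In practice this would require a HEAD request (or reading the response headers) before deciding whether to fetch the body.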
--
Raul
--
To UNSUBSCRIBE, email to debian-devel-request@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org