
Re: I'm not a huge fan of systemd

On 7/9/14, Tom Furie <tom@furie.org.uk> wrote:
> On Tue, Jul 08, 2014 at 07:16:43PM +0100, Brian wrote:
>> [1] I am allowed to hate curl because I do not use it and do not
>> understand what it does.
> Sounds more like fear. To hate something you must at least be familiar
> with what it is and does, and preferably know how and why.

I used wget by default for recursive downloads until I ran into
problems I could not solve, caused by known bugs filed about a decade
ago. Wget cannot properly distinguish between certain types of URLs on
a page which you do _not_ want to recurse into, and as such recurses
unnecessarily on some sites, and possibly infinitely on others (I am
not sure about the latter).
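
To illustrate the behaviour I would like (reject a link _before_ it is
fetched, rather than download it and delete it afterwards), here is a
minimal Python sketch. It is not how wget or httrack work internally;
the names filter_links, should_follow and reject_patterns, and the
example.org URLs, are made up for illustration.

```python
import re
from urllib.parse import urljoin, urlparse, urldefrag


def should_follow(url, base_host, reject_patterns):
    """Decide whether a crawler should fetch url at all."""
    parsed = urlparse(url)
    # Only plain web links are candidates (skips mailto:, ftp:, etc.).
    if parsed.scheme not in ("http", "https"):
        return False
    # Stay on the host the crawl started from.
    if parsed.netloc != base_host:
        return False
    # Reject on the URL text itself, before any request is made.
    return not any(re.search(p, url) for p in reject_patterns)


def filter_links(page_url, hrefs, reject_patterns):
    """Return the absolute URLs from hrefs that are worth recursing into."""
    base_host = urlparse(page_url).netloc
    seen = set()
    keep = []
    for href in hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(page_url, href))[0]
        if absolute in seen:
            continue
        seen.add(absolute)
        if should_follow(absolute, base_host, reject_patterns):
            keep.append(absolute)
    return keep


if __name__ == "__main__":
    page = "http://example.org/docs/index.html"
    hrefs = ["a.html", "?sort=name", "http://other.example/x",
             "b.html#top", "a.html", "mailto:me@example.org"]
    # Hypothetical rule: never recurse into URLs with a query string.
    print(filter_links(page, hrefs, [r"\?"]))
```

With the sample links above, only the a.html and b.html pages under
http://example.org/docs/ survive; the query-string link, the off-site
link, the duplicate and the mailto: link are dropped without being
requested.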

[Apparently substantial internal/rewrite work is required to fix this.]
Correction **: This may have been fixed in 2013:
GNU Wget - Bugs: bug #20364, Regex support for -A -R (-I -X -D)

but this one may still pose problems (I am not sure, sorry):
GNU Wget - Bugs: bug #22670, Wget matches accept/reject lists...

AIUI, httrack does not have those problems.
pavuk has also been recommended along with curl.

I have used httrack only a little, and have never gotten as
comfortable with it as I am with wget, but I now use it exclusively
for all recursive site downloads, due to the above.

Here is just a sample of the still open bug reports for this
particular "since 2003" problem with wget:

GNU Wget - Bugs: bug #20808, -R should reject files _before_...

wget: Possibility to really reject files on recursive downloads
(23 October 2003)

GNU Wget - Bugs: bug #34855, Wget 1.12: first retrieves...

**: Might be time to do some bug janitoring. Don't wait for me if
you're inclined.
