Re: I'm not a huge fan of systemd
On 7/9/14, Tom Furie <firstname.lastname@example.org> wrote:
> On Tue, Jul 08, 2014 at 07:16:43PM +0100, Brian wrote:
>>  I am allowed to hate curl because I do not use it and do not
>> understand what it does.
> Sounds more like fear. To hate something you must at least be familiar
> with what it is and does, and preferably know how and why.
I used wget by default for recursive downloads until I could not solve
certain problems, due to known bugs filed about a decade ago. Wget
cannot properly distinguish between certain types of URLs on a page
which you do _not_ want to recurse into, and as such recurses
unnecessarily on some sites, and possibly infinitely on others (not
sure about the latter).
[Apparently there is substantial internal/rewrite work required to fix this.]
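To make the complaint concrete, here is a rough sketch of the behaviour I would want from a recursive downloader: decide whether a link is wanted _before_ requesting it, rather than fetching it first and discarding it afterwards. This is not how wget or httrack is implemented; the reject patterns and the function names (should_fetch, filter_links) are made up for illustration, and a real crawler would need much more (robots.txt, depth limits, and so on).

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical reject patterns; these are assumptions, adjust them per site.
REJECT_PATTERNS = [
    re.compile(r"[?&]action=edit"),   # e.g. wiki edit links
    re.compile(r"\.(?:zip|iso)$"),    # e.g. large archives
]


def should_fetch(url, seen, allowed_host):
    """Decide whether to fetch a URL, before any network request is made."""
    if url in seen:
        return False  # already queued or fetched; avoids looping forever
    if urlparse(url).netloc != allowed_host:
        return False  # do not wander off to other hosts
    return not any(p.search(url) for p in REJECT_PATTERNS)


def filter_links(base_url, hrefs, seen):
    """Resolve the links found on a page and keep only the wanted ones."""
    host = urlparse(base_url).netloc
    keep = []
    for href in hrefs:
        # Resolve relative links and drop the #fragment part.
        url = urljoin(base_url, href).split("#", 1)[0]
        if should_fetch(url, seen, host):
            seen.add(url)
            keep.append(url)
    return keep
```

The point of the design is only that the filter runs on the URL string, so a rejected link costs no request at all.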
Correction **: This may have been fixed in 2013:
GNU Wget - Bugs: bug #20364, Regex support for -A -R (-I -X -D)
but this one may still pose problems (not sure, sorry):
GNU Wget - Bugs: bug #22670, Wget matches accept/reject lists...
AIUI, httrack does not have those problems.
pavuk has also been recommended along with curl.
I have used httrack only a little, and have never gotten as
comfortable with it as I am with wget, but I use it exclusively for
all recursive site downloads, due to the above.
Here is just a sample of the still open bug reports for this
particular "since 2003" problem with wget:
GNU Wget - Bugs: bug #20808, -R should reject files _before_...
wget: Possibility to really reject files on recursive downloads
(23 October 2003)
GNU Wget - Bugs: bug #34855, Wget 1.12: first retrieves...
**: Might be time to do some bug janitoring. Don't wait for me if