
Re: Patch to tidy up wget use, and add return=4 when 404's are seen



Frans Pop wrote:
> On Wednesday 05 March 2008, Philip Hands wrote:
>> The reason I need this is in preseeding scripts, where one can then
>> write code that says:
> 
> Right, that seems like a valid use case.

Glad you liked it :-)
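
For the benefit of the list, what I mean is roughly this (file names
made up; it relies on the new return=4 for 404s):

  preseed_fetch /optional/local.cfg /tmp/local.cfg
  case $? in
      0) ;;                 # got it -- go on and use it
      4) ;;                 # 404: the optional file just isn't there
      *) exit 1 ;;          # any other failure is a real error
  esac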

>> If what you're really saying is that you don't understand it, that's OK,
> 
> Yes, that is what I'm saying: I'm unable to judge the impact/risks myself.
> 
>> I've obviously not been clear enough in the README -- Perhaps you could
>> tell which bit's not sufficiently explained and I'll try again.
> 
> No, that's not the problem. The problem is my lack of background here.
> The code is a bit over my head and just "looks" a bit fragile to me. It's on 
> a somewhat lower technical level than what we usually do in shell script 
> and that makes me a bit uncomfortable.
> 
> If others can ack your code, then it would be perfectly fine with me.
>
> Another possible solution could be to have an option to quickly fall back to 
> a simpler version if we see breakage. For example: do this change in two 
> stages: first stage would be to move existing fetch_url functions to 
> di-utils, second stage to implement wget404. That way we could revert the 
> second stage if needed with less risk of conflicts, without needing to 
> change/upload multiple components and without having to change 
> dependencies.
> The two stages would only have to be separate commits, there would not have 
> to be time between them.

Yeah, that should be no problem -- if I take out all the echoes and the
sed, I think you get exactly what's needed for the 404-free version of
wget404 (i.e. your first stage, without the 404 detection) -- I'll take
a look tomorrow.
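
For reference, the shape I have in mind for wget404 is roughly this (a
minimal sketch, not the actual patch; it assumes busybox wget's error
output mentions "404", which is worth double-checking):

  wget404 () {
      local out ret
      out=$(wget "$@" 2>&1)
      ret=$?
      echo "$out" >&2              # pass wget's messages through
      if [ $ret -ne 0 ] && echo "$out" | grep -q 404; then
          return 4                 # distinguish "not there" from "broken"
      fi
      return $ret
  }

Strip the capture and the grep and you're left with a plain wget call,
which would be the first-stage version.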

...
> What I'm a bit worried about here is seeing huge pauses from a user PoV on 
> wget failures because of all the retries acting as a multiplier of wget's 
> internal timeouts...

Yeah, very good point -- especially since it could end up trying as many
as five times as often, with each failed attempt sitting through wget's
internal timeout first (at, say, 30 seconds per attempt, that's already
a couple of minutes of apparent hang before the user sees anything).

> I'd really like to hear Joey's take on this. He's probably the best judge to 
> decide between keeping what was implemented or changing to what was 
> intended.

Agreed.

...
> The main problem here of course is that it is almost impossible to trace a 
> report of an installation problem to the use of wget -c. It would certainly 
> not be something that would pop into my mind if a user reported that his 
> installation failed because file X failed to download or was corrupted.

Yeah, good point -- and while testing this I did note that wget will
cheerfully slap the tail end of a different file onto a previous
download if you do something like:

  wget -O /tmp/foo url_for_short_file
...
  wget -c -O /tmp/foo url_for_longer_file

which is why I put the rm into preseed_fetch (as I have no idea whether
people might be relying on preseed_fetch's ability to overwrite the
destination file).
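
i.e. something along these lines (illustrative -- $src and $dst being
whatever preseed_fetch was handed):

  rm -f "$dst"          # never let a later -c continue onto stale data
  wget404 -O "$dst" "$src"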

> Again, I find it really hard to judge the risks and pros and cons here.

Likewise -- part of my motivation in discussing this patch was to tease
out what was actually intended in the first place, as it's not
completely clear given things like the not-a-loop loop.

> Maybe it's better to keep things simple. If the user really has a flaky 
> network connection, maybe it's even better to have him notice that early in 
> the installation rather than during debootstrap or tasksel. I have no idea 
> what e.g. debootstrap does when it comes to retrying.
> 
> [/me checks debootstrap's code]
> Hmmm. This is illuminating. debootstrap's functions library has this in the 
> just_get function (wgetprogress calls wget):
>     if wgetprogress -O "$dest" "$from"; then
>         return 0
>     elif [ -s "$dest" ]; then
>         local iters=0
>         while [ "$iters" -lt 3 ]; do
>             warning RETRYING "Retrying failed download of %s" "$from"
>             if wgetprogress -c -O "$dest" "$from"; then break; fi
>             iters="$(($iters + 1))"
>         done
>     else
>         rm -f "$dest"
>         return 1
>     fi
> 
> Guess it would make a lot of sense to emulate that. Note that it does not do 
> any full retries, but only a repeated continuation retry.
> 
> Care to take a closer look at that yourself too?

Will do.
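
Just to check my reading of it, emulating that in a shared helper would
presumably be something like the following (a sketch only; the helper
name is made up, and I've made the exhausted-retries case an explicit
failure, whereas the snippet above just falls out of the while loop):

  fetch_with_retry () {
      url="$1"; dest="$2"
      wget -O "$dest" "$url" && return 0
      if [ -s "$dest" ]; then
          # partial file: retry with -c up to 3 times, debootstrap-style
          iters=0
          while [ "$iters" -lt 3 ]; do
              wget -c -O "$dest" "$url" && return 0
              iters=$(($iters + 1))
          done
      fi
      rm -f "$dest"
      return 1
  }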

One thing I've not really got my head around yet is how wgetprogress
actually communicates with the progress bar stuff -- it did occur to me
that we could perhaps use wgetprogress in fetch-url and then use
fetch-url in debootstrap, but I've not managed to work out whether that
would then give us progress bars when downloading preseeding stuff, or
simply break in that situation.  (And I've definitely _not_ worked out
what would be needed in the way of the insane redirects. ;-)

> What I'm starting to think is that maybe we should have two retry loops: one 
> default continuation retry loop as in debootstrap, and an optional full 
> retry loop (if a -r option is passed) that does the full retry, probably 
> only twice.
>
> I'm thinking that net-retriever probably should /not/ use that -r option 
> (which would match current actual behavior), but maybe it would be useful 
> in other cases.
[and from later mail]
> Or maybe use a switch to toggle between "continuation" and "full" retries. 
> Continuation retries probably make sense for larger files (Packages comes 
> to mind) for which the integrity can be checked (md5sum).
> Given the caveats in the wget manpage for -c, it may be safer to not do 
> continuation retries at all, but just do a full retry in other cases (like 
> fetching preseed files).

> Does that make sense?

Definitely.

I should have time to knock up something along those lines tomorrow.  It
seems to make sense to be able both to switch continuation on (with the
default being off) and, optionally, to retry.

It strikes me that continuation is for the case where you know you have
a big file that you're going to checksum anyway, so are willing to risk
the -c caveats.
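
Something like this interface, perhaps (a sketch only -- the option
letters, default retry count, and wget invocations are all up for
grabs):

  # fetch_url [-c] [-r N] URL DEST
  #   -c    allow continuation retries, for big checksummable files
  #   -r N  do up to N full retries (default 0: fail immediately)
  fetch_url () {
      cont=""; retries=0
      while true; do
          case "$1" in
              -c) cont=1; shift ;;
              -r) retries="$2"; shift 2 ;;
              *)  break ;;
          esac
      done
      url="$1"; dest="$2"

      try=0
      while [ "$try" -le "$retries" ]; do
          rm -f "$dest"                 # each full retry starts clean
          wget -O "$dest" "$url" && return 0
          if [ -n "$cont" ] && [ -s "$dest" ]; then
              # partial file and continuation allowed: try topping it up
              wget -c -O "$dest" "$url" && return 0
          fi
          try=$(($try + 1))
      done
      return 1
  }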

For the retries, I'd have thought that this is related to
interactiveness, so we could perhaps check one of the existing ways of
indicating that you're trying to do a mostly non-interactive install,
and only do the retries in that case.  For the extreme case of a fully
automated install I'd much rather it took ages and retried several
times, so long as it worked, than see a network outage break the
install; whereas a normal newbie install should definitely whine about a
broken network as soon as it's noticed.  Any suggestions on which
flag(s) to check to decide the number of retries?
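
One candidate might be the debconf priority, on the (possibly wrong)
assumption that priority=critical is a reasonable proxy for an automated
install:

  . /usr/share/debconf/confmodule
  db_get debconf/priority
  case "$RET" in
      critical) retries=10 ;;   # automated: be patient, keep trying
      *)        retries=0  ;;   # interactive: fail fast so the user sees it
  esac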

Cheers, Phil.