
Re: Patch to tidy up wget use, and add return=4 when 404's are seen

Frans Pop wrote:
> I'm still not sure whether this qualifies as a simplification or not :-/

I'd say it's a simplification in that it takes the tangly wget code from
a couple of places and makes sure there only needs to be one copy -- it
would be nice to work out a way of also moving the wgetprogress code in
here so that a) it doesn't need to be elsewhere, and b) we get progress
bars on all wget downloads, but I was leaving that as a phase 2 project.

I also see the potential for other methods (like nfs for the kickseed
udeb) to be done via fetch-url, which would make the kickseed code
simpler, and provide nfs support wherever we currently have http
support, so the potential for simplification is significantly larger
than demonstrated by this initial patch.

and from the other mail:
> Also, where is the change that actually tests for a return value of 4?
> Or, in other words, how hard do we really need this?

The reason I need this is in preseeding scripts, where one can then
write code that says:

  grab the class file for class X, then if it exists also grab the
  local override file for class X -- if either download fails, halt
  with an error, but if the local file doesn't exist, don't worry

I'm sure there are cases where similar code might be written within d-i
as well, but didn't want to start looking for such cases until we had a
decent patch sorted out.
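To make that concrete, here is a rough sketch of what such a preseeding
script could look like.  fetch-url itself only exists inside d-i, so a
made-up stub stands in for it here; the URLs, the class names, and the
function names are all invented for illustration:

```shell
#!/bin/sh
# Illustrative stub standing in for d-i's fetch-url: URLs containing
# "missing" simulate a 404 (return 4), anything else "downloads" fine.
fetch_url_stub() {
	case "$1" in
		*missing*) return 4 ;;		# file not on the server
		*) echo data > "$2" ;;		# pretend the download worked
	esac
}

grab_class() {
	class="$1"
	# the class file itself is mandatory: any failure halts
	fetch_url_stub "http://server/classes/$class" "./$class.cfg" || return 1
	# the local override is optional: a 404 (return 4) is fine,
	# but any other failure (timeout, DNS, ...) still halts
	fetch_url_stub "http://server/local/$class" "./$class.local"
	case $? in
		0|4) return 0 ;;
		*) return 1 ;;
	esac
}
```

With wget's usual behaviour every non-zero status looks the same, so a
script can't tell "file not there" from "network broken"; the dedicated
return=4 is what makes the distinction in the case statement possible.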

> Cheers,
> On Saturday 01 March 2008, Philip Hands wrote:
>> +++ packages/debian-installer-utils/debian/rules	2008-02-29
>> 21:48:43.000000000 +0000 @@ -40,13 +40,14 @@
> [...]
>> +	dh_link -p di-utils /usr/lib/fetch-url/http /usr/lib/fetch-url/ftp
> Looks like without leading / is preferred.

Fair enough.

>> +++ packages/debian-installer-utils/fetch-url-methods/http	2008-03-01
>> 13:42:40.000000000 +0000 @@ -0,0 +1,38 @@
>> +protocol_fetch() {
>> +	local url="$1"
>> +	local file="$2"
>> +
>> +	wget404() {
>> +	# wrapper for wget that returns 4 for 404 errors while preserving
>> +	# see README.wget404 in the debian-installer-utils udeb source for more
>> info about this
> Second comment can be shortened to:
>    see also README.wget404 in debian-installer-utils source package


>> +		local RETVAL=$( {  
>> +			echo 1
>> +			wget "$@" 2>&1 >&3 && echo %OK%
>> +			echo %EOF%
>> +			} | ( sed -ne '1{h;d};/server returned error
>> 404/{p;s/.*/4/;h;d};/^%OK%$/{s/.*/0/;h;d};$!p;$x;$w /dev/fd/4' >&2 ) 4>&1
>> +		) 3>&1
>> +		echo "wget404 returns $RETVAL" >> /tmp/fetch-url.log
>> +		return $RETVAL
>> +	}
> /me somewhat wonders how robust this is and whether it is safe on all 
> architectures and also for other kernels...
> Especially the writing to /dev/fd/4 scares me.

Erm, why?  Simply because you don't see too many people using extra file
descriptors in shell?  I'm assuming that you're not saying that we have
architectures where only three file descriptors are available.

Is busybox really so flaky as to have different sed behaviour on
different archs?  That's a little scary if true.  Do you have some examples?

If what you're really saying is that you don't understand it, that's OK,
I've obviously not been clear enough in the README -- perhaps you could
tell me which bit isn't sufficiently explained and I'll try again.
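For anyone trying to follow the trick, here is a cut-down, runnable
version of the same idea (the name status_of is invented, and an
arbitrary command replaces wget): the default status and the %OK% marker
travel down the pipe, sed keeps the winner in its hold space, and on the
final %EOF% line writes it to fd 4, which is the only thing the command
substitution captures; the command's real stdout escapes via fd 3 and
its stderr is passed through by sed:

```shell
#!/bin/sh
# Cut-down sketch of the wget404 technique: capture a status computed
# inside a pipeline via spare file descriptors.
status_of() {
	exec 3>&1	# fd 3: the command's real stdout bypasses the capture
	RETVAL=$( {
		echo 1				# default status: failure
		"$@" 2>&1 >&3 && echo %OK%	# stderr goes into the pipe
		echo %EOF%			# sentinel so sed sees a last line
	} | ( sed -ne '1{h;d};/^%OK%$/{s/.*/0/;h;d};$!p;$x;$w /dev/fd/4' >&2 ) 4>&1 )
	exec 3>&-
	return $RETVAL
}
```

The real wget404 adds one more sed clause,
/server returned error 404/{p;s/.*/4/;h;d}, which is what turns a 404
into return=4 while still echoing the error line.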

>> +	# use the proxy for wgets (should speed things up)
>> +	if db_get mirror/$proto/proxy; then
>> +		export ${proto}_proxy="$RET"
>> +	fi
>> +
>> +	for i in 1 2 3; do
>> +		if [ -e "$file" ]; then
>> +                        # busybox wget can sometimes become confused
>> +                        # while resuming, so if it fails, restart
>> +                        if wget404 -q -c "$url" -O "$file"; then
>> +                                return 0
>> +                        fi
>> +		fi
>> +
>> +		wget404 -q "$url" -O "$file"
>> +		local RET=$?
> So basically you now first try a restart and if that fails a new full 
> download and that three times, while in the old code we only did the full 
> retry.

Well, sort-of -- in the old code, this was only used for preseed_fetch
which did as you say.  In the new code, preseed_fetch wipes out the
target file before we get here (I'm not sure if that's really needed,
but it preserves the old behaviour most closely) and then does the retries.

On the other hand, the code that's no longer needed in net-retriever
used to look like it was trying to do the above but actually implemented:

  If the file does not exist, try one wget and return with the result.

  If it does exist, try a wget -c.
  If that fails, try a plain wget and return with its result.

Perhaps that's what was intended, given that the continuation branch
seems to imply that there are loops at a higher level deciding to retry
the download via net-retriever (presumably anna is willing to retry).

So, perhaps we need a -c option for fetch-url to indicate that it should
use wget -c, and have that option specified when calling fetch-url from
net-retriever -- I certainly don't see it being particularly useful in
preseeding most of the time, although for people who start grabbing
things like custom kernels using preseed_fetch it might prove helpful to
have a continuation retry.

How wise it is to alternate normal and -c attempts I'm not sure, but
presumably that approach has already proved itself relatively harmless
in the net-retriever version of the code.

> I'm not sure if that is wise, given the many caveats listed in the wget 
> manpage for the -c option...

Me neither, but that's how it is in net-retriever, and I'd have thought
that trying to change that as little as possible makes sense, given that
it's pretty vital to the install.

>> +++ packages/net-retriever/net-retriever	2008-02-29 
>> @@ -16,36 +16,11 @@
>>  hostname="$RET"
>>  db_get mirror/$protocol/directory
>>  directory="$RET"
>> -db_get mirror/$protocol/proxy
>> -proxy="$RET"
>> -if [ -n "$proxy" ]; then
>> -	if [ "$protocol" = http ]; then
>> -		export http_proxy="$proxy"
>> -	elif [ "$protocol" = ftp ]; then
>> -		export ftp_proxy="$proxy"
>> -	fi
>> -fi
> Huh? Aren't we losing all support of the proxy here?

No -- fetch-url deals with proxies already -- see these lines in
fetch-url:

        # use the proxy for wgets (should speed things up)
        if db_get mirror/$proto/proxy; then
                export ${proto}_proxy="$RET"

>> +++ packages/preseed/preseed_fetch	2008-03-01 09:21:03.000000000 +0000
>> +# Am suspicious that there are assumptions that we can overwrite 
>> +# in preseeding, so lets discard old files for now
>> +[ -e $dst ] && rm $dst
> This can just be
> rm -f $dst

True -- I tend to do the test first to avoid an unnecessary fork, but
I'm guessing that rm is a builtin in busybox sh.  Looks like it is:

With busybox we get:
~ $ type [
[ is [
~ $ type rm
rm is rm

Whereas with bash it's:
phil@palm:~$ type [
[ is a shell builtin
phil@palm:~$ type rm
rm is /bin/rm

Cheers, Phil.