
Bug#959518: apt-transport-http: Repeatable 'Undetermined Error' during package download from snapshot.debian.org



Hi,

On Mon, 14 Sep 2020 18:50:44 +0200 Julian Andres Klode <jak@debian.org> wrote:
> On Mon, Sep 14, 2020 at 05:18:20PM +0100, James Addison wrote:
> > Package: snapshot.debian.org
> > Followup-For: Bug #959518
> > X-Debbugs-Cc: jay@jp-hosting.net
> > 
> > The issue appears reproducible at the moment with apt 1.8.2.1 compiled from source and the 'x.tar' configuration provided earlier.
> > 
> > # apt source directory, post-build
> > $ cmdline/apt -o Dir=$PWD/x -o Dir::Bin::Methods=$PWD/methods update && \
> >   cmdline/apt -o Dir=$PWD/x -o Dir::Bin::Methods=$PWD/methods install -y openjdk-11-jdk
> > 
> > ...
> > 
> > Get:261 http://snapshot.debian.org/archive/debian-security/20200502T085134Z buster/updates/main amd64 openjdk-11-jdk-headless amd64 11.0.7+10-3~deb10u1 [215 MB]
> > Err:261 http://snapshot.debian.org/archive/debian-security/20200502T085134Z buster/updates/main amd64 openjdk-11-jdk-headless amd64 11.0.7+10-3~deb10u1
> >   Undetermined Error [IP: 193.62.202.27 80]
> > 
> > This has occurred for a couple of different server IP addresses, including 185.17.185.185.
> 
> We only care about unstable for this bug. There is a whole bunch of
> changes in http code and they won't be backported to stable releases.
> 
> Also, the previous comment by Alex Thiessen indicated that this is not a
> bug in apt, but the server seems to close the connection, which means
> there is nothing actionable here.
> 
> If you can produce an issue with the version of apt in unstable,
> and it does not reproduce with wget or curl, please open a new bug report for
> it.

I'm very familiar with snapshot.d.o from the client perspective. Julian is
correct that it's the server closing the connection. But that doesn't mean it
isn't at least a wishlist bug or feature request against apt. Let me explain
a bit more.

For several projects (debrebuild, debbisect, buildprofile QA,
bootstrap.debian.net...) I regularly interact with snapshot.d.o. Doing this
with plain apt is doomed to fail miserably with errors like:

# E: Failed to fetch [...]  Error reading from server. Remote end closed connection
# E: Failed to fetch [...]  Hash Sum mismatch
# E: Failed to fetch [...]  Bad header line Bad header data
# Err:118 [...] Connection timed out

Yes, this is because of how snapshot.d.o throttles connections. For example,
without additional measures, the following will fail:

$ curl http://snapshot.debian.org/archive/debian/20200909T084102Z/pool/main/q/qtwebengine-opensource-src/qtwebengine-opensource-src_5.14.2+dfsg1.orig.tar.xz >/dev/null
curl: (18) transfer closed with 217347024 bytes remaining to read

When using curl, a couple of options help to work around this problem (a
combined invocation is sketched after the list):

    --limit-rate=800k # this has the biggest effect
    --retry 10 --retry-connrefused
    --resolve snapshot.debian.org:80:193.62.202.27
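
Putting these together, a full invocation would look roughly like this (the
pinned address is one of the two snapshot.d.o addresses listed in the code
comments below; any current address works):

    $ curl --limit-rate 800k --retry 10 --retry-connrefused \
        --resolve snapshot.debian.org:80:193.62.202.27 \
        http://snapshot.debian.org/archive/debian/20200909T084102Z/pool/main/q/qtwebengine-opensource-src/qtwebengine-opensource-src_5.14.2+dfsg1.orig.tar.xz >/dev/null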

But even those are not sufficient: snapshot.d.o will sometimes cut the
connection in a way that makes curl fail with "network unreachable", which is
not a transient error, so curl will not retry establishing the connection.

The only thing that reliably worked for me with snapshot.d.o is the
pycurl-based Python code at the end of this email. With that code, I can
download from snapshot.d.o reliably for a full day without ever hitting the
Exception in the last line.

But as things stand, it is impossible to use apt reliably together with
snapshot.d.o. I'm not sure how best to solve this. One way would be to
approach snapshot.d.o and ask them to relax their very heavy throttling
policies. Another would be to make apt more resilient against mirrors with
heavy throttling policies. I can think of these wishlist bugs against apt:

 - allow specifying a maximum bytes-per-second value for downloads (this has
   the largest effect if set low enough)

 - add an option that makes apt automatically retry when a transient error
   occurs

 - allow setting custom resolve addresses for domains, as done in my code
   below (a sketch of how these options could look follows this list)
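
To illustrate, here is a hypothetical invocation reusing the -o syntax from
the reproduction above. Acquire::http::Dl-Limit (in kilobytes per second) and
Acquire::Retries are existing apt.conf(5) options that go in this direction;
the resolve override is made up here purely to illustrate the third item:

    $ apt -o Acquire::http::Dl-Limit=800 \
          -o Acquire::Retries=10 \
          -o Acquire::http::Resolve=snapshot.debian.org:80:193.62.202.27 \
          update
    # Acquire::http::Resolve does not exist; it only sketches the third wish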

I'm not saying that we shouldn't look into making snapshot.d.o throttle less,
because as things stand it's impossible to use it together with apt. But there
are certainly also things that apt can do, and they would benefit not only
people working with snapshot.d.o but also anyone using a mirror or proxy with
heavy throttling.

Thanks!

cheers, josch





import time
import pycurl
from io import BytesIO


def download(url):
    f = BytesIO()
    maxretries = 10
    for retrynum in range(maxretries):
        try:
            c = pycurl.Curl()
            c.setopt(c.URL, url)
            # even 100 kB/s is too much sometimes
            c.setopt(c.MAX_RECV_SPEED_LARGE, 800 * 1024)  # bytes per second
            c.setopt(c.CONNECTTIMEOUT, 30)  # the default is 300
            # sometimes, curl stalls forever and even ctrl+c doesn't work
            start = time.time()

            def progress(*data):
                # a download must not last more than 5 minutes
                # with 100 kB/s this means files cannot be larger than 31MB
                if time.time() - start > 5 * 60:
                    print("transfer took too long")
                    # returning a non-zero value aborts the transfer
                    return 1

            c.setopt(pycurl.NOPROGRESS, 0)
            c.setopt(pycurl.XFERINFOFUNCTION, progress)
            # $ host snapshot.debian.org
            # snapshot.debian.org has address 185.17.185.185
            # snapshot.debian.org has address 193.62.202.27
            # c.setopt(c.RESOLVE, ["snapshot.debian.org:80:185.17.185.185"])
            if f.tell() != 0:
                # resume where the previous, aborted attempt left off
                c.setopt(pycurl.RESUME_FROM, f.tell())
            c.setopt(c.WRITEDATA, f)
            c.perform()
            # 200 for a fresh download, 206 for a resumed one
            assert c.getinfo(c.RESPONSE_CODE) in [200, 206], c.getinfo(c.RESPONSE_CODE)
            c.close()
            return f.getvalue()
        except pycurl.error as e:
            code, message = e.args
            # E_ABORTED_BY_CALLBACK is what the progress() timeout above raises
            if code in [
                pycurl.E_PARTIAL_FILE,
                pycurl.E_COULDNT_CONNECT,
                pycurl.E_ABORTED_BY_CALLBACK,
            ]:
                if retrynum == maxretries - 1:
                    break
                print("retrying...")
                time.sleep(2 ** retrynum)  # exponential backoff
                continue
            else:
                raise
    raise Exception("failed too often...")
