
Bug#756343: Fix gethostbyname() sending data on random file descriptors in wheezy, already done in jessie



Hi,

On Mon, Jul 28, 2014 at 04:43:37PM -0700, Marcus Ewert wrote:
> Package: libc6
> Version: 2.13-38+deb7u1
> Severity: normal
> 
> Hello,
> 
> On test systems running stress workloads we regularly encountered a bug in
> gethostbyname that is fixed in libc6 in jessie. For completeness I've
> included the entire repro/investigation process; however, we are fairly sure
> the bug is the same as Debian bug #722075. I'm writing to inquire whether
> this bugfix can be backported to wheezy (stable).
> 
> We encountered this bug on fractional-core VMs running workloads that stress
> disk, CPU, and networking. As part of that testing we make many concurrent
> HTTP requests in Python; the relevant code is similar to:
> 
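> > import threading
> > import urllib2
> >
> > def GetURL(**kwargs):
> >   # Each urlopen() resolves the host name, which ends up in gethostbyname().
> >   url = 'http://www.example.com/'
> >   request = urllib2.Request(url)
> >   return urllib2.urlopen(request, **kwargs).read()
> >
> > def HammerGetHostByID():
> >   while True:
> >     try:
> >       GetURL(timeout=1)
> >     except Exception:
> >       # Ignore timeouts and HTTP errors; just keep issuing requests.
> >       pass
> >
> > # Ten threads issuing requests concurrently.
> > for _ in xrange(10):
> >   thread = threading.Thread(target=HammerGetHostByID)
> >   thread.start()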
> 
> Running a workload like this in 500 VMs running wheezy would yield on the
> order of 8 failures over 24 hours, with the following output:
> 
> *** glibc detected *** /usr/bin/python: double free or corruption (out)
> 
> Digging a little deeper with a debugger, we found that whenever these were
> hit, the stack would contain _nss_dns_gethostbyname4_r with garbage stack
> frames above it. The gethostbyname() call most likely comes from the above
> urlopen.
> 
> Given this observation, we suspected a connection to Debian bug #722075, and
> attempted the following patch to libc6:
> 
> diff -rupN eglibc-2.13/resolv/res_send.c eglibc-2.13-mod/resolv/res_send.c
> --- eglibc-2.13/resolv/res_send.c 2010-03-26 14:08:35.000000000 -0700
> +++ eglibc-2.13-mod/resolv/res_send.c 2014-07-02 10:23:28.521088097 -0700
> @@ -1330,6 +1330,7 @@ send_dg(res_state statp,
>   retval = reopen (statp, terrno, ns);
>   if (retval <= 0)
>   return retval;
> + pfd[0].fd = EXT(statp).nssocks[ns];
>   }
>   }
>   goto wait;
> 
> With this single-line patch we no longer hit the 'double free or corruption'
> message, even when running 100 VMs for over 5 days. I extracted the above
> code fix from https://lists.debian.org/debian-glibc/2014/06/msg00013.html,
> but modified the diff to apply to 2.13-38+deb7u1.
> 
> If a fix similar to this could be included in wheezy stable at some point it
> would be much appreciated.
> 
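
For readers following the thread, the failure mode that the quoted one-line
patch addresses can be sketched roughly as below. This is only an
illustration in Python under my own assumptions, not the glibc code: the
function name stale_fd_demo and the variable polled_fd (standing in for
pfd[0].fd) are made up for the example, os.dup2() is used to force the
descriptor-number reuse that another thread would otherwise cause by chance,
and a POSIX system is assumed.

```python
import os
import socket


def stale_fd_demo(refresh_fd):
    """Simulate a socket reopen and return any bytes that leaked into an
    unrelated descriptor because a cached fd number was not refreshed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    polled_fd = sock.fileno()      # cached copy, like pfd[0].fd in send_dg()
    old_number = polled_fd

    # An unrelated descriptor, standing in for a file owned by another thread.
    pipe_r, pipe_w = os.pipe()

    # "reopen": the old socket is closed and a new one is created. In between,
    # the freed descriptor number is taken over by the unrelated pipe.
    sock.close()
    os.dup2(pipe_w, old_number)
    new_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    if refresh_fd:
        # Counterpart of the one-line fix: re-read the fd after reopening.
        polled_fd = new_sock.fileno()

    leaked = b""
    if polled_fd != new_sock.fileno():
        # The cached number now names the pipe, so the "query" goes there.
        os.write(polled_fd, b"dns-query")
        leaked = os.read(pipe_r, 64)

    # Clean up every descriptor opened above.
    new_sock.close()
    os.close(old_number)
    os.close(pipe_w)
    os.close(pipe_r)
    return leaked
```

Called as stale_fd_demo(False), the cached number still points at the old
slot and the data lands in the pipe; with stale_fd_demo(True) nothing is
written to the unrelated descriptor.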

I have just committed the change in our stable branch [1]. We'll upload
the package a bit before the next Debian stable release, if the release
team agrees with the changes (which is likely in this case).

[1] http://anonscm.debian.org/viewvc/pkg-glibc?view=revision&revision=6227

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

