
Bug#745487: apt python library leak memory and file descriptors



Hi everyone!

On Wed, Apr 23, 2014 at 10:09:15AM +0200, Michael Vogt wrote:
> On Tue, Apr 22, 2014 at 11:23:24AM +0200, Petter Reinholdtsen wrote:
> > Package: python-apt
> > Version: 0.8.8.2
> > Severity: important
> 
> Thanks for your bugreport.
>  
> > Since I wrote the isenkram package, it has been plagued by a nasty
> > resource leak.  It leaks memory and file descriptors every time some
> > hardware is inserted.  The amount of leaked memory varies, but seems to
> > be 30-40 MiB every time.  The number of file descriptors depends on the
> > number of APT sources listed in /etc/apt/.
> > <URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730704 >
> > is an example of what happens when the process runs out of file
> > descriptors, and
> > <URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719837 > is the
> > initial report about leaking memory.
> 
> This is a good test scenario, afaik python-apt is not being used in
> daemons so far.
> 
> > I've been able to track down the problem to the python-apt library, and
> > the following test program demonstrates the leak:
> > 
> >   #!/usr/bin/python
> >   import apt
> >   while True:
> >     cache = apt.Cache()
> >     cache.open(None)
> > 
> > When left alone running for a while, it crashes like this:
> > 
> >   % ./apt-leak.py 
> >   Traceback (most recent call last):
> >     File "./apt-leak.py", line 6, in <module>
> >       cache.open(None)
> >     File "/usr/lib/python2.7/dist-packages/apt/cache.py", line 147, in open
> >       self._records = apt_pkg.PackageRecords(self._cache)
> 
> Thanks for this demo, I can reproduce the issue with it. However, if I
> change it a little bit and force the Python garbage collector to run,
> there seems to be no leak (or a small memory leak but no fd leak).
> """
> #!/usr/bin/python
> 
> import gc
> import os
> import timeit
> 
> import apt
> 
> def show_resource_stats():
>     mem = open("/proc/self/statm").read().strip().split()[0:2]
>     open_fds = len(os.listdir("/proc/self/fd/"))
>     print("mem: %s" % mem)
>     print("open fds: %s" % open_fds)
> 
> 
> 
> if __name__ == "__main__":
>     i=0
>     while True:
>         cache = apt.Cache()
>         cache.open()
>         i += 1
>         if i%10 == 0:
>             print("\nAfter %s runs" % i)
>             show_resource_stats()
>             print(timeit.timeit(gc.collect, number=1))
>             print("After collect:")
>             show_resource_stats()
> """
> 
> I see numbers like:
> """
> After 20 runs
> mem: ['181774', '165966']
> open fds: 653
> 0.236461877823
> After collect:
> mem: ['68298', '57910']
> open fds: 63
> 
> After 30 runs
> mem: ['181774', '165980']
> open fds: 653
> 0.229271888733
> After collect:
> mem: ['68938', '58564']
> open fds: 63
> """
> 
> Mem is "size" (VmSize) first and "resident" (VmRSS) second, both in pages.
> 
> I suspect that the behavior without running the GC is caused by the
> way we handle the pkgCache and pkgDepCache internally. Both are
> managed via a CacheFile, and it seems that a del of the
> pkgCache/pkgDepCache leaves the CacheFile (which really holds the
> resources open) with zero references, but the resources are only
> really freed once the GC runs. This needs some investigation in the
> python/cache.cc code to see if we can explicitly trigger the free.

The behaviour with the open fds is caused by the PackageRecords
object not being deleted correctly. There is still some memory leak
elsewhere, though.

I am not sure why that happens, since we actively 'del' the _records
attribute. Adding a close() function to it is slightly more
complicated, but possible as well.

That would fix the most extreme part of the problem: we would no
longer leak FDs.
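
To illustrate the difference between relying on the GC and having an
explicit close(), here is a rough sketch. It is not python-apt code:
FakeRecords and FakeCache are made-up stand-ins, and the reference
cycle between them is only an assumption to make the effect visible,
not a diagnosis of the real bug. It is written for Python 3 and
count_open_fds() needs Linux's /proc.

```python
import gc
import os


class FakeRecords:
    """Hypothetical stand-in for an object that owns a file descriptor."""

    def __init__(self, owner, path=os.devnull):
        # Back-reference to the owner: this creates a reference cycle,
        # so plain reference counting can never free the pair.
        self.owner = owner
        self._fd = os.open(path, os.O_RDONLY)

    @property
    def closed(self):
        return self._fd is None

    def close(self):
        """Release the descriptor right away; safe to call repeatedly."""
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None

    def __del__(self):
        # Fallback only: runs whenever the object is finally collected.
        self.close()


class FakeCache:
    """Hypothetical stand-in for a cache holding a records object."""

    def __init__(self):
        self.records = FakeRecords(self)

    def close(self):
        """Explicitly free what the cache holds, without waiting for GC."""
        self.records.close()


def count_open_fds():
    """Number of open descriptors of this process (Linux /proc only)."""
    return len(os.listdir("/proc/self/fd"))


def demo(runs=10):
    """Print fd counts for 'del only' versus 'close() then del'."""
    gc.disable()  # make the effect deterministic for the demonstration
    try:
        for _ in range(runs):
            cache = FakeCache()
            del cache  # cycle keeps the descriptor open
        print("after del only:   %d fds" % count_open_fds())
        gc.collect()
        print("after gc.collect: %d fds" % count_open_fds())
        for _ in range(runs):
            cache = FakeCache()
            cache.close()  # descriptor is released immediately
            del cache
        print("after close():    %d fds" % count_open_fds())
    finally:
        gc.enable()
```

Calling demo() should show the fd count growing after the first loop
and dropping back after gc.collect(), while the loop that calls
close() does not grow it at all. A real close() on the records object
would aim for the same effect.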

-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.

Be friendly, do not top-post, and follow RFC 1855 "Netiquette".
    - If you don't I might ignore you.

