[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#748922: python-apt: TagFile doesnt close file



On Fri, Jun 06, 2014 at 11:25:52AM +0200, Johannes Schauer wrote:
> Hi
> 
> Quoting Michael Vogt (2014-06-06 11:15:29)
> > There is a small typo in the above script. gc.collect should be gc.collect().
> 
> right. I noticed this too but it was too late and I already sent in my
> bugreport. See my other submission to it for my updated results.
> 
> > I verified that the following works and does not leak fds:
> > """
> > class LeakTestCase(unittest.TestCase):
> >     def test_leak(self):
> >         # clenaup gc first
> >         import gc
> >         gc.collect()
> >         # see what fds we have
> >         fds = os.listdir("/proc/self/fd")
> >         testfile = __file__
> >         tagf = apt_pkg.TagFile(testfile)
> >         tagf.step()
> >         del tagf
> >         import gc
> >         gc.collect()
> >         # ensure fd is closed
> >         self.assertEqual(fds, os.listdir("/proc/self/fd"))
> > """
> > 
> > Unfortunately just doing a "del tagf" is not enough, the gc call is
> > needed afterwards. The reason that the del is not enough is that there
> > is there is a cyclic reference from the tagf to tagf.section. The
> > garbage collector breaks it, but a simple del sees a refcount >
> > 0. This particular case could maybe fixed by copying the data from the
> > pkgTagFile to a pkgTagSection instead of letting it operator on the
> > Buffer of pkgTagFile. But that requires somework (plus additional
> > memory for the copied data).
> 
> The problem is, as you also identified above, that as long as the Python object
> for apt_pkg.TagFile is around, the file stays open.
> 
> I switched from apt_pkg to debian.deb822 because in my use case I want to read
> from file A, modify the data and want to write back to A again. For that I
> obviously should not have the original fd from reading A open when I write back
> to A. One possible workaround would be to copy all of A into a StringIO and
> then pass that to apt_pkg.TagFile. This would probably work but I think the
> expectation is that after doing:
> 
> 	mypkgs = list(apt_pkg.TagFile("Packages"))
> 
> or:
> 
> 	mypkgs = []
> 	for pkg in apt_pkg.TagFile("Packages"):
> 		mypkgs.append(pkgs)
> 
> that there are no files left open. Currently in both cases, the fd is still
> around. So after the last pkgTagSection is retrieved, the file should be
> closed.
> 

The easiest way to make this work would be to make the reference from TagFile to 
TagSection weak, but this creates a small performance issue, because we'd need to 
re-create the TagSection for each section, unless it is referenced somewhere 
outside. So I'm not sure that's a good idea (same for mvo's idea).

What we really should do is add
	(1) a close() method, and
	(2) context manager support
to TagFile. This way you can just explicitly close the
file when you're done with it.

As a workaround, open the file yourself and pass the open file to the
tag file. The best way would be:

with open("Packages") as fobj:
    for pkg in apt_pkg.TagFile(fobj):
        mypkgs.append(pkgs)

You need to keep the file object open as long as the tag file
exists, so just TagFile(open("Packages")) would fail.


-- 
Julian Andres Klode  - Debian Developer, Ubuntu Member

See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.

Please do not top-post if possible.


Reply to: