Bug#748922: python-apt: TagFile doesnt close file
On Fri, Jun 06, 2014 at 11:25:52AM +0200, Johannes Schauer wrote:
> Hi
>
> Quoting Michael Vogt (2014-06-06 11:15:29)
> > There is a small typo in the above script. gc.collect should be gc.collect().
>
> right. I noticed this too but it was too late and I already sent in my
> bugreport. See my other submission to it for my updated results.
>
> > I verified that the following works and does not leak fds:
> > """
> > class LeakTestCase(unittest.TestCase):
> > def test_leak(self):
> > # clenaup gc first
> > import gc
> > gc.collect()
> > # see what fds we have
> > fds = os.listdir("/proc/self/fd")
> > testfile = __file__
> > tagf = apt_pkg.TagFile(testfile)
> > tagf.step()
> > del tagf
> > import gc
> > gc.collect()
> > # ensure fd is closed
> > self.assertEqual(fds, os.listdir("/proc/self/fd"))
> > """
> >
> > Unfortunately just doing a "del tagf" is not enough, the gc call is
> > needed afterwards. The reason that the del is not enough is that there
> > is there is a cyclic reference from the tagf to tagf.section. The
> > garbage collector breaks it, but a simple del sees a refcount >
> > 0. This particular case could maybe fixed by copying the data from the
> > pkgTagFile to a pkgTagSection instead of letting it operator on the
> > Buffer of pkgTagFile. But that requires somework (plus additional
> > memory for the copied data).
>
> The problem is, as you also identified above, that as long as the Python object
> for apt_pkg.TagFile is around, the file stays open.
>
> I switched from apt_pkg to debian.deb822 because in my use case I want to read
> from file A, modify the data and want to write back to A again. For that I
> obviously should not have the original fd from reading A open when I write back
> to A. One possible workaround would be to copy all of A into a StringIO and
> then pass that to apt_pkg.TagFile. This would probably work but I think the
> expectation is that after doing:
>
> mypkgs = list(apt_pkg.TagFile("Packages"))
>
> or:
>
> mypkgs = []
> for pkg in apt_pkg.TagFile("Packages"):
> mypkgs.append(pkgs)
>
> that there are no files left open. Currently in both cases, the fd is still
> around. So after the last pkgTagSection is retrieved, the file should be
> closed.
>
The easiest way to make this work would be to make the reference from TagFile to
TagSection weak, but this creates a small performance issue, because we'd need to
re-create the TagSection for each section, unless it is referenced somewhere
outside. So I'm not sure that's a good idea (same for mvo's idea).
What we really should do is add
(1) a close() method, and
(2) context manager support
to TagFile. This way you can just explicitly close the
file when you're done with it.
As a workaround, open the file yourself and pass the open file to the
tag file. The best way would be:
with open("Packages") as fobj:
for pkg in apt_pkg.TagFile(fobj):
mypkgs.append(pkgs)
You need to keep the file object open as long as the tag file
exists, so just TagFile(open("Packages")) would fail.
--
Julian Andres Klode - Debian Developer, Ubuntu Member
See http://wiki.debian.org/JulianAndresKlode and http://jak-linux.org/.
Please do not top-post if possible.
Reply to: