Bug#944968: popularity-contest: Program accesses internal dpkg database
On Tue, 19 Nov 2019 11:27:56 +0100 Bill Allombert <ballombe@debian.org>
wrote:
> On Tue, Nov 19, 2019 at 09:34:57AM +0100, Guillem Jover wrote:
> > Hi!
> >
> > On Mon, 2019-11-18 at 06:51:00 +0000, Niels Thykier wrote:
> > > On Sun, 17 Nov 2019 22:59:58 +0100 Bill Allombert wrote:
> > > > On Sun, Nov 17, 2019 at 10:44:02PM +0100, Guillem Jover wrote:
> > > > > Source: popularity-contest
> > > > > Source-Version: 1.69
> > > > > Severity: important
> > > > > User: debian-dpkg@lists.debian.org
> > > > > Usertags: dpkg-db-access-blocker
> >
> > > > > This package contains the «popularity-contest» program, which directly
> > > > > accesses the dpkg internal database, instead of using one of the public
> > > > > interfaces provided by dpkg.
> > > > >
> > > > > > The program should stop reading the files list files, and switch to
> > > > > > using something like:
> > > > >
> > > > > «dpkg-query \
> > > > > --showformat 'Package: ${Package}\nFiles:\n${db-fsys:Files}\n' \
> > > > > --show»
> > > > >
> > > > > to get them.
> >
> > > > The last time this came up, the performance of using dpkg-query was poor.
> > > > Has it improved? What is the first release to support this syntax?
> >
> > Just to clarify, the command above does not need packages specified;
> > it will dump the contents of the entire database.
>
> ...which is a problem because then it requires much more memory to proceed than the
> current popcon.
>
> So last time the solution was to do a separate dpkg-query for each package,
> but this was much slower.
>
> Cheers,
> --
> Bill. <ballombe@debian.org>
>
> Imagine a large red swirl here.
>
>
Hi,
While it would take a bit of restructuring / refactoring, I think it
would be possible to use a single dpkg-query for everything and still be
able to process the data in a "streaming" fashion.
As an example, the following command:

  dpkg-query --show \
    --showformat='${status} ${package}\n${db-fsys:Files}\n\n'

will give you output in this format:
"""
install ok installed 0ad
/.
/usr
/usr/games
/usr/games/0ad
/usr/games/pyrogenesis
/usr/lib
[...]
/usr/share/pixmaps
/usr/share/pixmaps/0ad.png
/usr/share/man/man6/pyrogenesis.6.gz
install ok installed 0ad-data
/.
/usr
/usr/share
[...]
/usr/share/games/0ad/mods/public
/usr/share/games/0ad/mods/public/public.zip
install ok installed 0ad-data-common
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/0ad-data-common
/usr/share/doc/0ad-data-common/changelog.Debian.gz
/usr/share/doc/0ad-data-common/copyright
/usr/share/games
[...]
[...]
"""
This should be reasonably easy to parse in a streaming fashion, without
having to keep all the file paths in memory. The performance of this
dpkg-query command is comparable to the previous timings I showed.
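
To illustrate the streaming idea, here is a rough sketch in Python (not
popcon's actual code, and not tested against popcon). The function names
are made up for this example. It assumes that a header line never starts
with "/", that every file path does, and that any leading whitespace on a
path line can be dropped.

```python
import subprocess
from typing import Iterable, Iterator, Tuple

# Same format string as the command above; dpkg-query expands the \n itself.
SHOWFORMAT = r"${status} ${package}\n${db-fsys:Files}\n\n"


def iter_package_files(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (package, path) pairs one at a time from dpkg-query output.

    Only the current package name is kept in memory, never the full
    list of paths.
    """
    package = None
    for raw in lines:
        line = raw.rstrip("\n").lstrip()
        if not line:
            continue  # blank separator between packages
        if line.startswith("/"):
            # Assumption: every file path is absolute.
            if package is not None:
                yield package, line
        else:
            # Header line: "<want> <flag> <state> <package>".
            package = line.split()[-1]


def stream_dpkg_query() -> Iterator[Tuple[str, str]]:
    """Run a single dpkg-query and stream its output line by line."""
    cmd = ["dpkg-query", "--show", "--showformat=" + SHOWFORMAT]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        yield from iter_package_files(proc.stdout)
```

A caller would loop over stream_dpkg_query() and handle each (package,
path) pair as it arrives, so memory use stays bounded regardless of how
many packages are installed.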
Thanks,
~Niels