
Bug#944968: popularity-contest: Program accesses internal dpkg database



On Tue, 19 Nov 2019 11:27:56 +0100 Bill Allombert <ballombe@debian.org>
wrote:
> On Tue, Nov 19, 2019 at 09:34:57AM +0100, Guillem Jover wrote:
> > Hi!
> > 
> > On Mon, 2019-11-18 at 06:51:00 +0000, Niels Thykier wrote:
> > > On Sun, 17 Nov 2019 22:59:58 +0100 Bill Allombert wrote:
> > > > On Sun, Nov 17, 2019 at 10:44:02PM +0100, Guillem Jover wrote:
> > > > > Source: popularity-contest
> > > > > Source-Version: 1.69
> > > > > Severity: important
> > > > > User: debian-dpkg@lists.debian.org
> > > > > Usertags: dpkg-db-access-blocker
> > 
> > > > > This package contains the «popularity-contest» program, which directly
> > > > > accesses the dpkg internal database, instead of using one of the public
> > > > > interfaces provided by dpkg.
> > > > > 
> > > > > The program should stop reading the files list files, and switched to
> > > > > use something like:
> > > > > 
> > > > >   «dpkg-query \
> > > > >     --showformat 'Package: ${Package}\nFiles:\n${db-fsys:Files}\n' \
> > > > >     --show»
> > > > > 
> > > > > to get them.
> > 
> > > > the last time this comes up the performance of using dpkg-query was poor. 
> > > > Was it improved ? What is the first release to support this syntax ?
> > 
> > Just to clarify, the command above, does not need packages specified,
> > it will dump contents for the entire database.
> 
> ...which is a problem because then it requires much more memory to proceed than the
> current popcon.
> 
> So last time the solution was to do a separate dpkg-query for each packages,
> but this was much slower.
> 
> Cheers,
> -- 
> Bill. <ballombe@debian.org>
> 
> Imagine a large red swirl here. 
> 
> 

Hi,

While it would take a bit of restructuring / refactoring, I think it
would be possible to use a single dpkg-query for everything and still be
able to process the data in a "streaming" fashion.

As an example, the following command:

  dpkg-query --show \
    --showformat='${status} ${package}\n${db-fsys:Files}\n\n'

produces output in this format:

"""
install ok installed 0ad
 /.
 /usr
 /usr/games
 /usr/games/0ad
 /usr/games/pyrogenesis
 /usr/lib
 [...]
 /usr/share/pixmaps
 /usr/share/pixmaps/0ad.png
 /usr/share/man/man6/pyrogenesis.6.gz


install ok installed 0ad-data
 /.
 /usr
 /usr/share
 [...]
 /usr/share/games/0ad/mods/public
 /usr/share/games/0ad/mods/public/public.zip


install ok installed 0ad-data-common
 /.
 /usr
 /usr/share
 /usr/share/doc
 /usr/share/doc/0ad-data-common
 /usr/share/doc/0ad-data-common/changelog.Debian.gz
 /usr/share/doc/0ad-data-common/copyright
 /usr/share/games
 [...]

[...]
"""

This should be reasonably straightforward to parse in a streaming fashion
without having to keep all the file paths in memory.  The performance of
this dpkg-query command is comparable to the previous timings I showed.
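
To illustrate the streaming idea, here is a minimal sketch of such a
parser.  Python is used purely for illustration, and the names
iter_package_files and dpkg_query_lines are hypothetical.  It assumes
the output looks exactly like the sample above: a header line whose last
word is the package name, file entries indented by one space, and blank
lines between packages.  Paths containing newlines are not handled.

```python
import subprocess
from typing import Iterable, Iterator, Tuple

# Format string from the dpkg-query command above; the raw string keeps the
# "\n" escapes literal so that dpkg-query expands them itself.
SHOWFORMAT = r"${status} ${package}\n${db-fsys:Files}\n\n"


def iter_package_files(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (status, package, path) for one file at a time.

    Only the current header is remembered, so memory use does not grow
    with the number of paths in the database.
    """
    status = package = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # blank lines separate packages
        if line.startswith(" "):
            # File entries are indented by a single space.
            if package is not None:
                yield status, package, line[1:]
        else:
            # Header line such as "install ok installed 0ad".
            # Assumption: the package name is the last word.
            status, _, package = line.rpartition(" ")


def dpkg_query_lines() -> Iterator[str]:
    """Stream dpkg-query output line by line (hypothetical helper)."""
    cmd = ["dpkg-query", "--show", "--showformat=" + SHOWFORMAT]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                          errors="surrogateescape") as proc:
        yield from proc.stdout
```

Looping over iter_package_files(dpkg_query_lines()) would then visit
each path in turn while holding only the current line and the current
package header in memory.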


Thanks,
~Niels

