
Automated package-popularity survey (was Re: We need easier installation.)



On Fri, Oct 23, 1998 at 07:36:54PM -0400, Dirk Eddelbuettel wrote:

> Avery said:
> > Hey, here's an idea: why not compare the last-access times of executable
> > binaries in each package?  The ones which have been most recently
> > accessed are the ones which are most popular.
> 
> That's much more informative than mirror/ftp stats which are biased by
> `download but no install' transfers.
> 
> Your method might fall short on overrating cron, at, ... just because they
> are called automagically.

I'm worried that any heuristic we use will be prone to error -- but
hopefully not _extreme_ error.  Overrating cron and at will be okay, since
they're very popular anyway.  There will, unfortunately, be a similar
problem with any daemons that are started from /etc/init.d, and any programs
with actual cron jobs.

Also, my original idea of looking only at executable binaries was flawed:
it is severely biased against -dev and documentation-only packages, for
example.

> As for data gathering, you could also use GNU acct [ which I maintain.] See
> for my box here (a demonstration of it doing mostly mailpoppin'):
> 
> miles:~ [root] # sa | sort -r | awk '{print $4}' | head -20

Hmm, it seems I compiled my kernel without BSD accounting support in it. 
However, I think your method is based on the _frequency_ of calls to a
program -- that's probably not really fair.  After all, I call 'ls' much
more often than 'startx', but I still want startx around :)

Here's my latest attempt at a rating system.  It's severely prejudiced
against lib* packages, but that should be okay since real binary packages
depend on them anyway, and that's often the only reason they get installed.
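
To make the lib* bias concrete, here is a minimal sketch that applies the
same pattern the script below uses to a few paths. The sample paths and the
helper name `interesting` are made up for illustration; only the pattern
itself comes from the script.

```shell
#!/bin/bash
# Sketch: filter paths with the pattern the rating script uses.
# Only paths under a bin/ or sbin/ directory, under /usr/games, or ending
# in .a or .h pass through; everything else is dropped.
pattern='/bin/|/sbin/|/usr/games|\.[ah]$'

# Hypothetical helper: reads paths on stdin, prints the "interesting" ones.
interesting() {
    grep -E "$pattern"
}

# Made-up sample paths: a binary, a shared library, a header, a doc file.
printf '%s\n' \
    /usr/bin/less \
    /usr/lib/libfoo.so.1 \
    /usr/include/foo.h \
    /usr/doc/foo/README \
  | interesting
```

Only /usr/bin/less and /usr/include/foo.h come out. The shared library and
the README are dropped, so a package that ships nothing else ends up in
ignored-pkg.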

Please don't complain about slowness, as it's not at all optimized for speed
yet.

Your installed packages will be listed in order of "most popular" to "least
popular." Do the results look reasonable to people?  Is it doing anything
blatantly wrong?  Is this whole concept better than nothing, or worse?

Don't forget to check the file ignored-pkg after running this script, to see
which packages were on your system but contained no "interesting" files
(nothing under a bin/ or sbin/ directory or /usr/games, and no .a or .h
files).  Those packages get no rating at all.

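Oh, the first two columns are the access time and the attribute-changed
time (find's %A@ and %C@, in seconds since the epoch), followed by the
package name and the file they were taken from.  If the two times are very
close together, it probably means that dpkg has recently upgraded the
package, so we don't really know anything about its popularity.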
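
That check can be automated. The following is only a sketch:
`flag_recent_upgrades` is a hypothetical helper, not part of the script in
this mail, and the 60-second default window is an arbitrary assumption. It
expects lines in the output format described above (access time, change
time, package, file).

```shell
#!/bin/bash
# Hypothetical helper: reads "<atime> <ctime> <package> <path>" lines on
# stdin and prints the packages whose two timestamps differ by no more than
# a window in seconds (first argument, default 60).
flag_recent_upgrades() {
    awk -v window="${1:-60}" '{
        diff = $1 - $2
        if (diff < 0) diff = -diff      # absolute difference
        if (diff <= window) print $3    # third column is the package name
    }'
}

# Made-up sample lines: bash was touched 10 seconds after its ctime,
# vim long after.
printf '%s\n' \
    '909187200 909187190 bash /bin/bash' \
    '909100000 900000000 vim /usr/bin/vim' \
  | flag_recent_upgrades
```

With these sample lines only "bash" is printed. To use it on real data, the
script's output would have to be piped in before the final `| less`.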

Have fun,

Avery

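#!/bin/bash
# Ha ha! I can write ugly code!
# For each installed package, print its most recently accessed "interesting"
# file (binaries, games, .a/.h files), then list packages newest-first.
rm -f ignored-pkg
for d in /var/lib/dpkg/info/*.list; do
    f=$(basename "$d" .list)
    # Packages with no interesting files are logged to ignored-pkg instead.
    find $(grep -E '/bin/|/sbin/|/usr/games|\.[ah]$' "$d" \
	   || echo "$d" >>ignored-pkg) \
	-follow -prune -type f -printf "%A@ %C@ $f %p\\n" 2>/dev/null \
      | sort -n | tail -1
done | sort -rn | less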

