
[Popcon-developers] popcon written in python



On Thu, Jul 24, 2014 at 3:28 AM, Bill Allombert <ballombe at debian.org> wrote:

> > Hello popcon-devs,
> >
> > I have been working on a re-write of the popularity-contest application.
> > This originally started off as being a learning exercise to both test my
> > reading of perl code and to have a new project for some python code.  I am
> > mostly happy with the current state of what I am calling pypopcon and
> > wanted to share my work:
> > https://github.com/drwahl/pypopcon
> >
> > There are some interesting (to me, at least) changes to this product as
> > compared to the currently used popularity-contest code.
>
> Hello David,
>
> For reference: I (re)wrote popularity-contest in basic perl because
> perl-base is Essential: yes, so that installing popcon does not affect
> the popularity of other packages.
>
> For that reason, it seems unlikely that your script report the same
> results as the standard popularity-contest perl script.
>
> > First, pypopcon explores all the files that a package provides and checks
> > for atime (instead of just a key binary a package provides).  I believe
> > this increases the accuracy as some package ship with multiple binaries
> > and the one that popularity-contest uses isn't always the most used
> > binary from the package.
>
> There are various files which atimes are changed without users action,
> for example by cron jobs, dpkg hooks. Including the files in the list
> means that the package atime became meaningless.  For example, all
> shared libraries, all python modules, etc. Thus we use a regex to limit
> the list to 'safe' files.
>
> Instead the popcon backend use the dependency graph to mark as voted all
> packages that are depended on by voted packages (transitively).
>
> > Secondly, pypopcon is showing a pretty decent performance increase (and
> > there is still room for more).  On my system, popularity-contest takes
> > about 15 seconds to run whereas pypopcon is taking about 8 seconds to
> > run.  One thing that is interesting about this metric is that pypopcon
> > is actually getting the atime/ctime of more files than the perl
> > popularity-contest script, so it's actually doing more work than
> > popularity-contest is, and it is doing it in less time.
>
> You need to split system time and user time in your benchmark.
> The system time is very much dependent on file system performance.
>
> Cheers,
> --
> Bill. <ballombe at debian.org>
>
> Imagine a large red swirl here.
>

Thanks for the response Bill.  I appreciate the feedback.  One thing I
forgot to mention about pypopcon is that I also intended it to be used
on non-Debian (non-dpkg) systems, or on Debian systems not using apt.  It
currently supports only yum/RPM in addition to deb/apt, but I'm thinking
of adding other package managers as well, like gem and pip.

As for the performance, here is the "time" output on my system (24 cores,
1.2GHz, 16GB of RAM, SSD):

popularity-contest:
real    0m16.867s
user    0m6.024s
sys     0m10.495s

pypopcon:
real    0m8.297s
user    0m6.391s
sys     0m1.513s
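
For anyone who wants to take the same user/sys split from inside Python
rather than with the shell's "time", here is a minimal sketch.  It assumes a
POSIX system, and time_command is a hypothetical helper written for this
example, not something that exists in pypopcon or popularity-contest.

```python
import os
import subprocess


def time_command(argv):
    """Run argv and return (real, user, sys) seconds used by the child.

    os.times() reports CPU time of waited-for children separately from
    the parent's own, so the difference across the call isolates the
    command being measured.
    """
    before = os.times()
    # Discard the child's output; only its resource usage matters here.
    subprocess.call(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = os.times()
    return (
        after.elapsed - before.elapsed,
        after.children_user - before.children_user,
        after.children_system - before.children_system,
    )


# Example (not run here):
#   real, user, system = time_command(["popularity-contest"])
```

The sys figure is the part that depends on file system behaviour (stat
calls, cache state), so comparing it separately from user time, and repeating
the run with a warm cache, should give a fairer picture than wall-clock time
alone.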


As for the atime of files changing without user interaction, this is
something that I struggled with a bit. What it actually means to "use" a
package is really kinda a grey area to me, so I kind of went my own
direction with it (maybe incorrectly so).  It seems to me like files that a
user is using even indirectly might be considered to be "popular" as the
user could be using them to maintain their system in an automated fashion.
I probably need to think about that some more...
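
To make the "safe files" idea concrete, here is a rough sketch of filtering
a package's file list before looking at atime.  The pattern below is an
assumption chosen for illustration (files directly under bin, sbin or games
directories); it is not the regex popularity-contest actually uses, and
latest_atime is a hypothetical name.

```python
import os
import re

# Illustrative pattern only: treat executables in bin/sbin/games directories
# as "safe", on the theory that their atime mostly reflects a real run.
SAFE_PATH = re.compile(r"/(?:bin|sbin|games)/[^/]+$")


def latest_atime(paths, pattern=SAFE_PATH):
    """Return the newest atime among paths matching pattern, or None."""
    newest = None
    for path in paths:
        if not pattern.search(path):
            continue  # e.g. shared libraries, python modules
        try:
            atime = os.stat(path).st_atime
        except OSError:
            continue  # file listed by the package but missing on disk
        if newest is None or atime > newest:
            newest = atime
    return newest
```

With a filter like this, a library touched by a cron job no longer counts as
use of its package; passing a pattern that matches everything gives the
"all files" behaviour pypopcon has today.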

Thanks again for the feedback!
David W.
-- 
Unix, because every barista in Seattle has an MCSE.