
Re: New statistics about upload activity of the team



On Mon, Jan 24, 2011 at 12:45:44PM -0500, Scott Howard wrote:
> 
> I wouldn't mind taking a stab at it. It might take some time while
> doing my day-job. The simplest implementation is to have
> upload_history.py output more names (like you did) so we include
> almost everyone.

Well, it's no big deal to put exactly everyone into a text file (with a
different name, say h-index_<pkggroup>.txt).  While thinking about it I
even found an NMU flag in the upload_history table.  It is not
completely reliable, because I found several <upstreamversion>-1.1
uploads which were flagged nmu=f, but I was able to remove several NMUers
from the stats (and once I have found and fixed the reason for the wrongly
flagged NMUs we might get rid of all of them).
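A rough sketch of a cross-check for such wrongly flagged rows, based on the
NMU version-numbering convention in the Debian Developer's Reference (a
dotted suffix on the Debian revision such as -1.1 or -0.1, or +nmuN for
native packages).  looks_like_nmu is a hypothetical helper, not part of
UDD, and the rule is only a heuristic: a maintainer upload may occasionally
use such a version as well.

```python
def looks_like_nmu(version: str) -> bool:
    """Heuristic: does a Debian source version follow NMU numbering?

    Hypothetical helper for cross-checking the nmu flag; not a UDD function.
    """
    # Drop an optional epoch ("1:2.0-3" -> "2.0-3").
    if ":" in version:
        version = version.split(":", 1)[1]
    if "-" in version:
        # Non-native package: NMUs add ".N" to the Debian revision,
        # e.g. 2.0-3 -> 2.0-3.1, or 2.1-0.1 for a new upstream version.
        revision = version.rsplit("-", 1)[1]
        return "." in revision
    # Native package: NMUs append "+nmuN", e.g. 1.5 -> 1.5+nmu1.
    return "+nmu" in version


if __name__ == "__main__":
    for v in ["2.0-3", "2.0-3.1", "1:2.0-0.1", "1.5", "1.5+nmu1"]:
        print(v, looks_like_nmu(v))
```

Rows where looks_like_nmu() is true but nmu=f would be the candidates to
inspect by hand.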

> We can add a new script to blindly apply the
> Wikipedia algorithm to the output text file and plot that. I just
> finished the NM process and am awaiting my DD account to be activated,
> so this will give me a chance to learn about the UDD a bit more.
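For reference, the Wikipedia definition is simple to apply to such a file:
sort the developers by number of uploads and take the largest h for which h
developers have at least h uploads each.  A rough sketch, assuming a
hypothetical tab-separated name/count file (the real output of
upload_history.py may look different); read_upload_counts and h_index are
illustrative names only.

```python
from typing import Iterable, List


def read_upload_counts(lines: Iterable[str]) -> List[int]:
    """Parse per-developer upload counts.

    Assumes a hypothetical "name<TAB>uploads" line format; adapt this
    to whatever upload_history.py actually writes.
    """
    counts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _name, uploads = line.rsplit("\t", 1)
        counts.append(int(uploads))
    return counts


def h_index(counts: Iterable[int]) -> int:
    """Largest h such that at least h developers have >= h uploads each."""
    h = 0
    # Rank developers from most to least active.
    for rank, uploads in enumerate(sorted(counts, reverse=True), start=1):
        if uploads < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    print(h_index([30, 12, 9, 7, 5, 3, 1]))       # -> 5
    # One person doing far more QA work leaves h unchanged ...
    print(h_index([300, 12, 9, 7, 5, 3, 1]))      # -> 5
    # ... while more activity spread across the team raises it.
    print(h_index([30, 14, 11, 9, 7, 6, 2]))      # -> 6
```

The numbers are made up; they only illustrate that one very active person
does not move the team's h-index, while broader activity does.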

If you want, you can start learning about UDD immediately: just ask me for
a login at blends.debian.net, where I'm maintaining a copy of UDD.  For the
investigation above you definitely need to work on this host, because I
built a nasty, manually hacked table to get unique names for developers
(the original UDD contains many different spellings for a single DD).
Just tell me in private mail if you are interested (and send me the
login name you prefer).
BTW, the source for the functions that extract the data is on the
same website.  It is
  http://blends.debian.net/liststats/create_bad_names.sql
Look for
  CREATE OR REPLACE FUNCTION
at the end of the code, below the manual hack ...
 
> > This is a good idea, but as I said I need a better algorithm than the
> > one on the Wikipedia page to accomplish this.
> 
> While not perfect, I think the algorithm might reduce "noise" from the
> difference in effort between QA work and new packaging. Assuming the
> ratio of QA work to new packaging is roughly similar over the years,
> we would still see whether the team is growing in both size and
> activity.

Good point.

> If one person does a lot of QA work, it won't increase the
> team's h-index (that's roughly the academic analog of trying to publish
> more papers to raise your h-index). But a team, as a whole, doing more
> QA work would increase the h-index, which I think is a good thing to
> measure.

True.

> If we want finer detail, we could separate out NEW packages and
> uploads into two datasets, but for now I think the quick and
> dirty analysis would still give us some useful data.

I think what you suggest is neither quick nor dirty, just a bit less
picky than it could possibly be.

Thanks for your suggestions

     Andreas. 

-- 
http://fam-tille.de

