
Re: New statistics about upload activity of the team



On Mon, Jan 24, 2011 at 11:24:26AM -0500, Scott Howard wrote:
> Asking scientists for hints about analyzing data . . . sounds scary...

:-)
 
> It would be interesting to see a single number representing the
> "health" of a team over time. I propose using something like the
> Hirsch index [1] (an index which invokes widespread unease when
> applied to evaluating faculty candidates and promotion while at the
> same time invoking widespread use.)
> 
> An h-index moving average (over a year, perhaps) shows if the team is
> growing both in size and in sharing of workload. If a single person is
> doing all the work, then the team isn't behaving like one, and would
> yield a low h-index. Similarly, a large team made up of one-time
> uploaders isn't healthy for long-term stability.

Sounds interesting.  Is anybody willing to code the SQL query that
implements the h-index?  I admit I have no real clue how to map the
"citations" mentioned in the Wikipedia reference to package uploads.
Uploading a new version of a package is quite different from creating
a new package, so counting both the same is unfair.  My "highest number
of uploads per year" was reached in a QA effort which did not cost a
lot of time - way less than if I had created a moderately complicated
package from scratch.
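
One possible mapping, as a minimal sketch: treat each uploader as a
"paper" and his uploads in a given year as its "citations", so the
h-index of a year is the largest h such that h uploaders made at least
h uploads each.  This is only one reading of the proposal quoted above.
The table uploads(uploader, year) with one row per upload, the function
name and the sample names are all hypothetical, and SQLite stands in
for the real database.  New packages and new versions are counted the
same here, so the unfairness mentioned above is not addressed.

```python
import sqlite3

# Assumption: hypothetical table uploads(uploader, year), one row per upload.
# Per year, h = max over uploaders of min(n, number of uploaders with >= n
# uploads), where n is that uploader's upload count in the year.
QUERY = """
WITH per_uploader AS (
    SELECT year, uploader, COUNT(*) AS n
    FROM uploads
    GROUP BY year, uploader
)
SELECT a.year,
       MAX(MIN(a.n, (SELECT COUNT(*)
                     FROM per_uploader b
                     WHERE b.year = a.year AND b.n >= a.n))) AS h
FROM per_uploader a
GROUP BY a.year
ORDER BY a.year
"""


def yearly_h_index(rows):
    """Return {year: h-index} for an iterable of (uploader, year) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE uploads (uploader TEXT, year INTEGER)")
    con.executemany("INSERT INTO uploads VALUES (?, ?)", rows)
    result = dict(con.execute(QUERY).fetchall())
    con.close()
    return result


# Made-up sample data, not taken from the real statistics.
sample = (
    [("alice", 2009)] * 3 + [("bob", 2009)] * 2 + [("carol", 2009)]
    + [("alice", 2010)] * 5
)
print(yearly_h_index(sample))
```

In the sample, 2009 gets h = 2 (two people with at least two uploads
each) while 2010 gets h = 1, because a single person did all the work.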

> I assume the data is only for the top ten uploaders since 2001, and
> that there are others who are new and haven't made enough uploads to
> make the top 10 over the past decade.

Yes.  I forgot to mention this.  It is the same idea as the one behind
the mailing list statistics.

> However, for the case of this
> example, I'll pretend like the 10 uploaders listed at [2] make up the
> entire group of uploaders to debian-med.

Don't underestimate the Debian Med team! :-)

   http://blends.debian.net/liststats/uploaders_debian-med_top20.png
 [ http://blends.debian.net/liststats/uploaders_debian-med_top20.txt ]

(In the end, though, some of these are probably NMUs - at least
Matthias K. and Moritz M. will not count themselves as part of the
team.)  Dirk E. was also basically doing some NMUs for R packages.
 
> med:
>  2001 2
>  2002 2
>  2003 2
>  2004 2
>  2005 2
>  2006 4
>  2007 4
>  2008 8
>  2009 6
>  2010 6
> 
> 
> I do not believe the above numbers are correct, because it is
> excluding uploaders who may have significantly contributed recently
> (e.g. ~15 uploads in each of 2009 and 2010), but did not make 30
> uploads over the decade to be represented in the data set. For
> example, if two such people existed, 2009 and 2010 would have
> h-indexes of 10 - clearly showing growth and improved team health over
> the past decade. The above data is an example and would have to be
> compiled using every uploader's data to give the correct number. It
> will probably yield higher numbers in 2009 and 2010 as more people
> contributed.

Yes.  Those people in fact exist as the top20 graph shows.
Unfortunately the graph just becomes quite big if you take more than 10
people into account.

> That could be a useful metric for other Debian teams to identify if
> the team is behaving more or less team-like over time.

This is a good idea but, as I said, I need a better algorithm than the
one on the Wikipedia page to accomplish this.
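
For reference, the computation itself can be sketched in a few lines,
assuming the same reading as in the quoted proposal: the input is the
list of upload counts per uploader for one year, and the result is the
largest h such that h uploaders made at least h uploads each.  The
function name and the numbers are made up for illustration.

```python
def h_index(counts):
    """Largest h such that at least h entries of counts are >= h."""
    h = 0
    # Walk the counts from largest to smallest; rank is the 1-based position.
    for rank, n in enumerate(sorted(counts, reverse=True), start=1):
        if n >= rank:
            h = rank
        else:
            break
    return h


# A single person doing all the work yields a low value ...
print(h_index([40]))
# ... and so does a large team of one-time uploaders.
print(h_index([1] * 20))
# Several people sharing the work yields a higher value.
print(h_index([15, 15, 9, 8, 7, 6, 3]))
```

The three calls print 1, 1 and 6.  Because only per-uploader counts
enter, the result depends on having the data of every uploader, not
just the top 10.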

For completeness I also calculated the top20 for Debian Science (with a
URL you can guess).  I have also calculated the graph for some other
teams (as you might have seen once you found the *.txt file).

Thanks for your input

      Andreas.
 
> [1] http://en.wikipedia.org/wiki/H-index
> [2] http://blends.debian.net/liststats/uploaders_debian-med.txt

-- 
http://fam-tille.de

