Re: New statistics about upload activity of the team

To: debian-science@lists.debian.org, Debian Med Project List <debian-med@lists.debian.org>
Subject: Re: New statistics about upload activity of the team
From: Scott Howard <showard314@gmail.com>
Date: Mon, 24 Jan 2011 12:45:44 -0500
Message-id: <AANLkTi=g60PoMw4WfNw4k1ya31fCXzzj+bCqyqfcS_Wy@mail.gmail.com>
In-reply-to: <20110124164915.GA25671@an3as.eu>
References: <20110124113023.GA9035@an3as.eu> <AANLkTik41QZX0pUwn9AAptaZc4gzoJT-65opAfRdB6-u@mail.gmail.com> <20110124164915.GA25671@an3as.eu>

On Mon, Jan 24, 2011 at 11:49 AM, Andreas Tille <andreas@an3as.eu> wrote:
> On Mon, Jan 24, 2011 at 11:24:26AM -0500, Scott Howard wrote:
>> It would be interesting to see a single number representing the
>> "health" of a team over time. I propose using something like the
>> Hirsch index [1] (an index which invokes widespread unease when
>> applied to evaluating faculty candidates and promotion while at the
>> same time invoking widespread use.)
>
> Sounds interesting: Anybody willing to code the SQL query for
> implementing the h-index?  I admit I have no real clue how to map
> the "citations" mentioned in the Wikipedia reference to package
> uploads.  Uploading a new package version is quite different from
> creating a new package - measuring both the same is unfair.  My
> "highest number of uploads per year" was just reached in a QA effort
> which did not cost a lot of time - way less than if I had
> created a moderately complicated package from scratch.

I wouldn't mind taking a stab at it. It might take some time alongside
my day job. The simplest implementation is to have
upload_history.py output more names (like you did) so we include
almost everyone. We can add a new script to blindly apply the
Wikipedia algorithm to the output text file and plot the result. I just
finished the NM process and am awaiting my DD account to be activated,
so this will give me a chance to learn about the UDD a bit more.
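
As a rough illustration of "blindly applying" the Wikipedia definition,
here is a minimal Python sketch. It assumes the per-uploader counts for
one year have already been read into a mapping; the names and numbers
are made up, and nothing is assumed here about how upload_history.py
actually formats its output.

```python
def h_index(counts):
    """Return the largest h such that at least h entries are >= h.

    Here an "entry" is one uploader's number of uploads in a given
    period, standing in for the citation count of one paper.
    """
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical data: uploads per person for a single year.
uploads_2010 = {"alice": 10, "bob": 4, "carol": 3, "dave": 1}

print(h_index(uploads_2010.values()))  # 3
```

With the made-up counts above, three people have at least three
uploads each, so the index is 3.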

> This is a good idea but as I said I need a better algorithm than the
> one on the Wikipedia page to accomplish this.

While not perfect, I think the algorithm might reduce "noise" from the
difference in effort between QA work and new packaging. Assuming the
ratio of QA work to new packaging is roughly similar over the years,
we would still see whether the team is growing in both size and
activity. If one person does a lot of QA work, it won't increase the
team's h-index (that's almost the academic analog of trying to publish
more papers to raise your h-index). But a team, as a whole, doing more
QA work would increase the h-index - which I think is a good thing to
measure.
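
A small worked example of that property, again only a sketch with
invented numbers (the team members and upload counts are hypothetical):

```python
def h_index(counts):
    """Largest h such that at least h of the counts are >= h."""
    ranked = sorted(counts, reverse=True)
    # In descending order, "count >= rank" holds for a prefix only,
    # so counting the matches gives the h-index directly.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


# Hypothetical uploads per team member in one year.
baseline = {"a": 6, "b": 4, "c": 3, "d": 2, "e": 1}

# One person does a large QA run; everyone else is unchanged.
one_person_qa = dict(baseline, a=60)

# The whole team does a few extra QA uploads each.
team_qa = {name: n + 3 for name, n in baseline.items()}

print(h_index(baseline.values()))       # 3
print(h_index(one_person_qa.values()))  # 3
print(h_index(team_qa.values()))        # 4
```

The single large QA run leaves the index at 3, while a modest increase
spread across the team moves it to 4.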

If we wanted finer detail, we could split NEW packages and uploads of
existing packages into two separate datasets, but for now I think the
quick and dirty analysis would still give us some useful data.

Cheers,
Scott
