Re: Estadistical proyect.
Sorry for my late answer; I read this list only rarely.
* angel [2010-04-21 10:58 +0200]:
> In case Debian is interested in having someone work on a statistical
> project of any nature (the project would be supervised by professors
> of the Statistics and Operations Research department of the University
> of Granada and and Debian would not be charged with any costs
> whatsoever), please feel free to contact me.
Since your request has been partly misunderstood, let me first summarize it:
You want to do a small to medium-sized statistical project and you don't
want to program in a non-statistical language during this project.
Debian needs to provide you access to the data required to complete your
project and most importantly needs to tell you what information it is
interested in.
Access to the data is easy: Stefano already mentioned Ultimate Debian
Database (UDD) as possible data source. It should contain everything
you would need in such a project. If you don't want to create a local
copy of the database using the provided dump you almost certainly should
get a guest-account on alioth.debian.org to access the database without
As you might know, Debian divides bugs by severity. The most important
ones are "critical", "serious" and "grave"; these are release critical.
Additionally there are "important", "normal", "minor" and "wishlist".
Besides severities there are things like "fixed" and "found" and so on;
all this is documented on http://bugs.debian.org/ below the heading "Bug
tracking system documentation".
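As a minimal sketch of this classification, assuming bug records are available as simple (id, severity) pairs: the names and records below are made up for illustration and are not taken from UDD or the bug tracking system.

```python
# Severities as listed above; the first three count as release critical.
RELEASE_CRITICAL = {"critical", "grave", "serious"}
OTHER_SEVERITIES = {"important", "normal", "minor", "wishlist"}


def is_release_critical(severity):
    """Return True if the given severity name is release critical."""
    return severity.lower() in RELEASE_CRITICAL


# Made-up example records (bug id, severity), not real bug data.
bugs = [(1, "grave"), (2, "wishlist"), (3, "serious"), (4, "normal")]
rc_bug_ids = [bug_id for bug_id, severity in bugs if is_release_critical(severity)]
print(rc_bug_ids)
```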
We want to release the next Debian version, Squeeze, soon. To do this
we need to get the number of release critical bugs in testing, currently
named Squeeze, to 0 (apart from some ignored bugs). To be able to improve
this process and thus improve Debian in general, it would be nice if we
had some statistical data about how bugs have been fixed in the
past. Understanding where we still have problems is the first step toward
resolving them. :)
Some questions to be answered could be:
* How long does it take on average until a bug with a particular
severity gets fixed, and what is the variance? Does this change if
Debian is frozen (frozen describes the time before the actual release
with a more restricted package migration to testing)?
* Is there a correlation between the number of installations according
to popcon and the time until a bug gets fixed?
* Is there a correlation between the size of a package or the package's
priority and the average bug fixing time?
* How does the number of people maintaining a package relate to bug
fixing time and how does it relate to the number of unfixed bugs per
package? Rationale for this is that we encourage single maintainers
of important packages (with priority important and priority required)
to switch to maintaining these packages in a team but we lack data to
support our guess that team maintenance is really more efficient.
* How does the number of bugs reported per time unit change after a new
upstream release is packaged in comparison to releasing a new Debian
revision? Does it matter which part of the upstream version number was
incremented? I expect, for example, a 2.0
release to contain more bugs on average than a 2.2 release.
* We currently migrate packages that don't introduce new release
critical bugs to testing after being in unstable for 10 days (unless
the maintainer or the release team override this or other packages
prevent it). Is this a good choice according to the time after an
upload when most release critical bugs are reported?
* Is there a relation between bug fixing and the time the last
maintainer upload happened? A similar question is whether there is
a relation between bug fixing and the number of non-maintainer
uploads (NMUs) in the last n years.
* How does the probability of a release critical bug being fixed in
an NMU instead of a regular maintainer upload rise over time?
* Is there a significant difference between bug fixing in officially
maintained packages and in packages maintained by the QA team
(that is, orphaned packages)?
* Are certain programming languages or packages in certain sections
more prone to FTBFS (fails to build from source) bugs, security
related bugs or bugs in general?
* A combination of the probability of a release critical bug being filed
over time, in conjunction with the two previous questions and the number
of installations according to popcon, could possibly be used to help
people decide whether a package should be removed from Debian or
orphaned if the former maintainer lost interest. Currently I have
no idea how this could be done in a sane way, or whether it can be
done in a sane way at all.
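To make the first two questions a bit more concrete, here is a minimal sketch in Python, assuming the fix times have already been extracted into plain tuples. The record layout, the function names and all numbers are invented for illustration; real data would come from UDD and would need more care (for example, bugs that are still open have no fix time yet, and a plain mean simply ignores them).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical records: (severity, days until fixed, popcon installations).
# These numbers are invented and only illustrate the calculation.
SAMPLE = [
    ("serious", 12, 5000),
    ("serious", 30, 800),
    ("grave", 5, 12000),
    ("grave", 9, 7000),
    ("normal", 90, 300),
    ("normal", 150, 150),
]


def fix_time_stats(records):
    """Return {severity: (mean, sample variance)} of days until fixed."""
    by_severity = defaultdict(list)
    for severity, days_to_fix, _installs in records:
        by_severity[severity].append(days_to_fix)
    # variance() needs at least two values, so smaller groups are skipped.
    return {
        severity: (mean(values), variance(values))
        for severity, values in by_severity.items()
        if len(values) >= 2
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


stats = fix_time_stats(SAMPLE)
for severity, (avg, var) in sorted(stats.items()):
    print(f"{severity}: mean={avg:.1f} days, variance={var:.1f}")

installs = [record[2] for record in SAMPLE]
days = [record[1] for record in SAMPLE]
print(f"correlation(installs, fix time) = {pearson(installs, days):.2f}")
```

A correlation coefficient like this only describes a linear relationship and says nothing about its cause, so it would be a starting point for digging further, not a result in itself.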
I think you get the idea. If you work with the data and these questions for
some time, you should get a feeling for which ones are relevant and
interesting and which ones can be ignored. If you find something one
would not expect, that is the point where you should start to dig in that
direction. In general Debian wants "some useful and/or interesting data
about how bugs are handled"; everything I wrote above should be
considered a suggestion that can be used as a guideline until you develop
the intuition to decide what looks promising. In no case would I expect
every point mentioned to be addressed by such a project; just choose
what you like and what seems valuable. I would also consider
"selected questions answered, everything is as one would expect, here's
the data, nothing to see here, move along" a very helpful result.
Due to the distributed nature of Debian it's difficult to provide a Debian
Developer as exclusive contact person. If your university required
this, especially if she or he would need to partly evaluate your work,
this could be a problem, unless of course someone steps in and agrees to
be your mentor for this project.
If you are still interested in doing this project, the obvious steps seem
to be:
1. Install a local copy of UDD or get access to it using an account on
alioth.debian.org. Drop me a mail if you have problems with
registering on  or if I should ask an alioth admin for approval.
2. Familiarize yourself with the database schema to be able to
extract the required information. If necessary you will need to
pick up basic SQL to get the data, but I guess a statistician
knows some SQL.
3. Talk to your professor.
4. Do the project. If reasonable you could provide intermediate
results to this list, but this is not necessary.
5. Send the results of your work to this list and to email@example.com.
The press team will ensure that it will be mentioned in our regular
Debian newsletter and possibly other places. It would be good,
though not required, if the thesis could be placed somewhere on our
website. Debian tries to ensure that the information on its website
is free; the people maintaining it know exactly whether a free license is
required or just recommended for this. A minimal way to put it
under a free license is writing the following text in the mail when
you send your results (though this would not prevent others from
using parts of your paper without attribution, but choosing another
license with an attribution clause is easy):
| License and Copyright of the attached document:
| Copyright (c) 2010 Name <E-mail address>
| Permission to use, copy, modify, and/or distribute this software
| for any purpose with or without fee is hereby granted.
P.S.: Please CC me on answers.