Bug#565219: qa.debian.org: bug history graphs are incorrect

On Thu, Jan 14, 2010 at 08:37:09AM +0100, Raphael Hertzog wrote:
> On Wed, 13 Jan 2010, Francesco Poli (t1000) wrote:
> > Package: qa.debian.org
> > Severity: normal
> > 
> > Hi!
> > 
> > The bug count (and consequently the plot shape) is often incorrect
> > in the bug history graphs.
> > 
> > Consider for instance:
> > http://packages.qa.debian.org/d/dpkg-ruby.html
> > 
> > The package currently has 0 (zero) outstanding bugs and 8 unarchived
> > resolved bugs.
> > Nonetheless, the graph
> > http://people.debian.org/~glandium/bts/d/dpkg-ruby.png
> > claims a total of 7 bugs and the plot is somehow missing in the last
> > days.
> > 
> > By looking at
> > http://people.debian.org/~glandium/bts/d/
> > I see that the graph was last updated on 13-Jan-2010 12:07,
> > but at that time, all bugs were already resolved.
> > 
> > 
> > I don't know whether this problem is related to (or even the same as)
> > bugs #546676, #548009, and #526237.
> > 
> > Anyway, please fix the generation of the graphs: they are nice, but
> > often misleading...
> The graphs are under the control of Mike Hommey and nobody else... hence
> I'm ccing him. We can't do anything about it currently.
> BTW, glandium, would you like to maintain your graphs somewhere on
> qa.debian.org as part of the QA team?

I've discussed this with both zack and don @debconf and the outcome is
that it would be better if it were integrated within the BTS.

Now, as for the current graphs, I'm sorry to say that they just suck for
several reasons:
- The data they use is wrong. Take a look at the bug counts on ddpo and
  the bug counts on the pts: they differ (look at iceweasel and iceape,
  for instance). The pts is right, ddpo is wrong. Unfortunately, the
  graphs use the ddpo data (which is the only one available to download
  as a huge file with all information, afaik)
- They occasionally break. Sometimes, bugs are not assigned to the
  correct package at bug report time, and that breaks the ddpo data
  file. This is why we have files such as
- The RRD files were created with wrong parameters, which now cause a
  lot of problems. For example, if you change the priority of a bug, it
  will end up being counted twice for at least a day.

I've been considering running a script on merkel to get the proper data
instead of the ddpo one, but that would only solve part of the problem,
and only for newer data. It would be nice to recreate the proper history
from the BTS data, but that would be quite some work.
Something else that would be nice to add to the graphs is the source
uploads and, possibly, a broader time span (but here again, the current
rrd configuration doesn't permit it).
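
To give a rough idea of what recreating the history from BTS data could
involve, here is a minimal sketch in Python. It is only an illustration
under assumptions: the input is a hypothetical list of (opened, closed)
date pairs per bug, how to extract those from the BTS is not shown, and
the function name open_bug_history is made up for this example. A bug
is counted as open from the day it was opened up to, but not including,
the day it was closed.

```python
from collections import Counter
from datetime import date, timedelta


def open_bug_history(bugs, start, end):
    """Return {day: number of open bugs} for every day in start..end.

    bugs is an iterable of (opened, closed) date pairs; closed is None
    for a bug that is still open. (Hypothetical input format.)
    """
    # Net change in the open-bug count on each day.
    delta = Counter()
    for opened, closed in bugs:
        delta[opened] += 1
        if closed is not None:
            delta[closed] -= 1

    # Bugs opened (and possibly closed) before the first plotted day.
    count = sum(change for day, change in delta.items() if day < start)

    history = {}
    day = start
    while day <= end:
        count += delta.get(day, 0)
        history[day] = count
        day += timedelta(days=1)
    return history


if __name__ == "__main__":
    sample = [
        (date(2010, 1, 1), date(2010, 1, 3)),
        (date(2010, 1, 2), None),
        (date(2009, 12, 30), date(2010, 1, 2)),
    ]
    for day, count in open_bug_history(
        sample, date(2010, 1, 1), date(2010, 1, 4)
    ).items():
        print(day.isoformat(), count)
```

Counting from open and close dates like this would avoid depending on
daily snapshots, so a missed or broken run would not leave a hole in
the plot.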

I currently have some spare time, so I might get to work on some of
these, but please feel free to give a hand.



