Re: UDD and bugs assigned to two or more packages

To: Kurt Roeckx <kurt@roeckx.be>
Cc: "Frank S\. Thomas" <fst@debian.org>, debian-qa@lists.debian.org
Subject: Re: UDD and bugs assigned to two or more packages
From: Lucas Nussbaum <lucas@lucas-nussbaum.net>
Date: Sat, 27 Sep 2008 18:50:55 +0200
Message-id: <[🔎] 20080927165055.GA13757@xanadu.blop.info>
In-reply-to: <[🔎] 20080927134120.GA25734@roeckx.be>
References: <[🔎] 200809261252.41107.fst@debian.org> <[🔎] 20080926111052.GA509@xanadu.blop.info> <[🔎] 20080926234900.GA5018@roeckx.be> <[🔎] 20080927102454.GA25717@xanadu.blop.info> <[🔎] 20080927134120.GA25734@roeckx.be>

On 27/09/08 at 15:41 +0200, Kurt Roeckx wrote:
> On Sat, Sep 27, 2008 at 12:24:54PM +0200, Lucas Nussbaum wrote:
> > On 27/09/08 at 01:49 +0200, Kurt Roeckx wrote:
> > > On Fri, Sep 26, 2008 at 01:10:52PM +0200, Lucas Nussbaum wrote:
> > > > 
> > > > (2) the way we deal with bugs affecting two (or more) packages.
> > > > That's a tricky problem. On one hand, the correct way to solve that
> > > > would be to have a separate "affected_packages" table, with (id,
> > > > package) columns, and also an "affected_sources" table.
> > > > 
> > > > On the other hand, that's makes all queries more complex, because you
> > > > have to join this additional table (and it's not going to be a small
> > > > table). Currently, there are only 78 unarchived bugs (out of 69982)
> > > > which are affected to 2 or more packages (select count(*) from bugs
> > > > where package like '%,%').
> > > > 
> > > > One possibility would be to keep the package/source columns in bugs, but
> > > > add tables with (id, package) and (id, source). That way, people not
> > > > interested in this case can join the additional table, while others can
> > > > just ignore it. It would only increase a bit the time needed to import
> > > > the bugs (about half an hour currently, only I/O bound).
> > > 
> > > I was thinking about adding the bug two times in the bugs table.
> > 
> > That's quite dirty, as it would cause the other information to be
> > duplicated as well (e.g a "bugs per submitter" query). Also, it would
> > require changing the primary key for the bugs table (it's the id,
> > currently).
> 
> Each solution has it's advantages and disadvantes.  And it mostly comes
> down on how far you want to normalize your data.  And I'm actually in
> favour of normalizing the data, and have no problems writing such complex
> queries.  But it's not always easy to get those right.
> 
> We currently already process some of the data before it gets stored
> in the database to make the queries easier to write and mostly
> faster to execute.  And I'm not sure what the best way to go is.

I've added bugs_packages and archived_bugs_packages tables, but left the
package and source columns in bugs. So you can use the tables if you
want the "enhanced" precision, or the bugs fields if you don't care ;)

I've haven't updated the views to use them, since there's no data in the
tables atm.
-- 
| Lucas Nussbaum
| lucas@lucas-nussbaum.net   http://www.lucas-nussbaum.net/ |
| jabber: lucas@nussbaum.fr             GPG: 1024D/023B3F4F |

Reply to:

References:
- UDD and bugs assigned to two or more packages
  - From: "Frank S. Thomas" <fst@debian.org>
- Re: UDD and bugs assigned to two or more packages
  - From: Lucas Nussbaum <lucas@lucas-nussbaum.net>
- Re: UDD and bugs assigned to two or more packages
  - From: Kurt Roeckx <kurt@roeckx.be>
- Re: UDD and bugs assigned to two or more packages
  - From: Lucas Nussbaum <lucas@lucas-nussbaum.net>
- Re: UDD and bugs assigned to two or more packages
  - From: Kurt Roeckx <kurt@roeckx.be>

Prev by Date: Re: UDD and bugs assigned to two or more packages
Next by Date: Re: UDD and bugs assigned to two or more packages
Previous by thread: Re: UDD and bugs assigned to two or more packages
Next by thread: Re: UDD and bugs assigned to two or more packages
Index(es):
- Date
- Thread