Re: UDD and bugs assigned to two or more packages
On Sat, Sep 27, 2008 at 12:24:54PM +0200, Lucas Nussbaum wrote:
> On 27/09/08 at 01:49 +0200, Kurt Roeckx wrote:
> > On Fri, Sep 26, 2008 at 01:10:52PM +0200, Lucas Nussbaum wrote:
> > >
> > > (2) the way we deal with bugs affecting two (or more) packages.
> > > That's a tricky problem. On one hand, the correct way to solve that
> > > would be to have a separate "affected_packages" table, with (id,
> > > package) columns, and also an "affected_sources" table.
> > >
> > > On the other hand, that's makes all queries more complex, because you
> > > have to join this additional table (and it's not going to be a small
> > > table). Currently, there are only 78 unarchived bugs (out of 69982)
> > > which are affected to 2 or more packages (select count(*) from bugs
> > > where package like '%,%').
> > >
> > > One possibility would be to keep the package/source columns in bugs, but
> > > add tables with (id, package) and (id, source). That way, people not
> > > interested in this case can join the additional table, while others can
> > > just ignore it. It would only increase a bit the time needed to import
> > > the bugs (about half an hour currently, only I/O bound).
> > I was thinking about adding the bug two times in the bugs table.
> That's quite dirty, as it would cause the other information to be
> duplicated as well (e.g a "bugs per submitter" query). Also, it would
> require changing the primary key for the bugs table (it's the id,
Each solution has it's advantages and disadvantes. And it mostly comes
down on how far you want to normalize your data. And I'm actually in
favour of normalizing the data, and have no problems writing such complex
queries. But it's not always easy to get those right.
We currently already process some of the data before it gets stored
in the database to make the queries easier to write and mostly
faster to execute. And I'm not sure what the best way to go is.