Re: some UDD schema changes
On Sun, Jul 19, 2009 at 11:45 AM, Lucas
> On 19/07/09 at 09:00 -0400, Jonathan Yu wrote:
>> On Thu, Jul 16, 2009 at 4:28 PM, Lucas Nussbaum<email@example.com> wrote:
>> > Hi,
>> > I've squashed a few items off my TODO list about UDD. I did the
>> > following changes:
>> > - 'release' columns are now ENUMs everywhere. This allows to compare two
>> > releases ('etch' < 'squeeze') or to order by release.
>> Oh, very nice, I didn't know you could compare ENUM that way.
>> > - all_packages_distrelcomparch (which was a hack) was replaced by two
>> > different tables (for Debian and Ubuntu) and a
>> > all_packages_distrelcomparch view.
>> > - lintian's tag_type column is now an ENUM as well. The order I chose
>> > is:
>> > ENUM('experimental', 'overriden', 'pedantic', 'information', 'warning',
>> > 'error');
>> > However, that's suboptimal, since experimental, overriden, and the
>> > severities are orthogonal. But that's what is exported by lintian.log.
>> > Is lintian.log really the best way to fetch information from
>> > lintian.debian.org?
>> > - Some indices proposed by Enrico were added.
>> > It is worth noting that none of those changes require changes in your
>> > queries (unless you relied on sorting releases or lintian tags
>> > alphabetically, which doesn't sound like something you want to do).
>> > However, turning release into an ENUM required changes in related tables
>> > (packages and packages_summary), and the only way to sanely apply them
>> > was to drop the table and create them again. Which I did, so the tables
>> > are filling up again slowly, and will be completely filled in a few
>> > hours.
>> I wasn't sure if the schema was frozen or if there are too many things
>> depending on the UDD to change it, but something that might be worth
>> considering is putting the ubuntu and debian stuff in a different
>> schema. That is, the debian tables would be in the default schema, and
>> ubuntu would have its own schema, so that you do queries like:
>> SELECT * FROM ubuntu.bugs ...
>> This is basically replacing the ubuntu_ prefix with an actual schema separation.
>> And under PostgreSQL you can work with multiple schemas from a single
>> user account/database, which is pretty sweet.
> I don't really see a lot of advantages in doing this. OK, for
> packages/sources, it would make sense. But the ubuntu bugs have a
> different structure, so it might be misleading to provide ubuntu.bugs.
Hm, that's a good point. I was just thinking it might be nice to help
separate the two projects. I'm not sure if people can currently just
get a data dump of Debian related info, or just Ubuntu stuff. I'm not
entirely sure what the scope of this project and its goals were.
I don't think it's unreasonable that the two schemas will have
slightly different table structures. What's to prevent people from
assuming that 'bugs' has the same structure as 'ubuntu_bugs' right ow,
aside from the schema diagram itself? And as someone working with
Debian stuff exclusively, do I really care about data in Ubuntu? Or
from the other side, if someone is working on Ubuntu, they might not
really care about Debian so much.
With schema support, you can simply adjust your search path... so you
could check Debian before Ubuntu (the default) when doing a SELECT on
the 'bugs' table. Or, you could check Ubuntu before Debian (by
altering your schema search path), so that SELECT on 'bugs' gives you
information you care about -- Ubuntu.
Though, you're right, none of this is really a compelling reason to
change the the database structure.
One interesting feature, though I don't know if you'd have a use for
it, is to separate privileges based on schema, so that programs
getting information from Ubuntu couldn't manipulate data in the
Debian, and vice versa. This gives you a bit of safety, though I guess
it's not much.
> | Lucas Nussbaum
> | firstname.lastname@example.org http://www.lucas-nussbaum.net/ |
> | jabber: email@example.com GPG: 1024D/023B3F4F |