[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: RFC: Creation of a PostgreSQL database schema for wanna-build data



Adeodato Simó <dato@net.com.org.es> writes:
[debversion cmp]
> The version in C has a couple downsides. The most important is that you
> need to be superuser in order to create functions in C. This shouldn't
> be a problem, because either wbadm will be superuser in the wannab
> database or, alternatively, DSA can run version.sql for us as a one time
> operation. (Additionally, I think both the PL/Perl and the PL/PerlU
> versions need superuser as well, for the functions that use the
> "internal" language.)

I think using the C version is reasonable. Our sysadmins are around,
asking them to add a function that could lower the load on our boxes
significantly shouldn't be an issue. If needed, i can personally kick
zobel :-)

>> No problem.  One thing following from your patches and this thread is
>> that there is a lot of similarity between the dak projectb schema and
>> this one.  As a result, I think it makes sense to do things like dak
>> does where possible, to make things as compatible as possible.  Quite
>> a few of your suggestions take us in this direction anyway.
> As I said on IRC, I don't see a need to be schema/table compatible with
> projectb. (And projectb is presumably going away in the future.)

ACK. I don't think that compatibility to the current projectb is very
important, as the ftpteam seems to have decided to replace it.

> We need to figure out how to store extra access. In principle, a simple
> table "extra_access" with (login, arch) rows should suffice. However, I
> would like there's a way to say "all arches", and neither of the two
> ways I thought of doing that pleased me much.
>
> (Way #1: have a boolean in "extra_access"; then the "arch" column
> becomes meaningless for that row. Way #2: allow an "all" value for arch
> in extra_access; then, you can't reference architectures and have to
> reference package_architectures instead.)

I would like to join the information in extra_access (or however it is
named) with the buildd_admins table anyway, creating an "access"
view. In that process, translating "all" to an entry for each arch
shouldn't be too hard. It would allow to do actual access control
against a single view, instead of checking two separate tables.

> I chatted with Philipp Kern a bit about this. We agreed that having the
> Sources Architecture field available in order not to unnecessarily send
> packages to the buildds is a good thing. However, he mentioned that many
> times the Architecture field in Sources (the first paragraph in
> debian/control) is set to any, and then the architecture field of the
> rest of stanzas in debian/control is restricted. At the moment, the DB
> has no place to reflect this (because that information is not in
> Packages field, of course).
>
> And then we have the fact that P-a-s is not only source based, but also
> deals with binary packages.
>
> I think all this will need quite a rework, and I suggest (despite my
> initial enthusiasm) we let source_architectures be like you initially
> suggested, and come back to it some time after having Postgres deployed.

ACK. This would make the w-b logic considerably smarter than it is now,
and I would prefer to move over to Postgres first, then start making big
improvements.

>> I'd also like to store the build time and space statistics (currently
>> stored by sbuild in a GDBM database) stored here along with the logs.
>
> Ok, I added those as well, but without a NOT NULL restriction, otherwise
> it'll be impossible to import old logs. If somebody cares enough, once
> we have this DB up and the old logs imported we can run a daily job to
> slowly fill in that information, if it's easily retrievable from the log
> itself (which I think it is).

tail -n 1 on any build log will yield
| Build needed $time, $sizek disk space
This could (and should) be imported when the old build logs are
imported.

> Also, please check that build_time = interval, and used_space => integer
> are appropriate names and types. (In particular, I'm not sure if "space
> statistics" is just a number, or more.)

It's the size of the build dir, in KB. This doesn't count installed
dependencies, which makes for interesting problems when trying to
guess how much space you need to build packages (for example, OO.org's
build-deps add more than a GB to the space requirements). We might want
to change the computation of that at some point.

>>> 13. The _name suffix everywhere annoys me a bit, since I like having
>>>     tables that can be natural join'ed, or using(foo) join'ed. Maybe
>>>     this is just me, and the _name suffix is excelent SQL practice, but
>>>     I thought I'd mention to see what other think.
>> If it makes queries simpler, and more understandable, then I'm all for it.
>
> Okay, I went ahead and did this. I think it results in a sensibly more
> pleasant query experience. Compare:
>
>   => SELECT * FROM uploaders u (source, version, maintainer)
>          NATURAL JOIN sources
>          NATURAL JOIN suite_sources
>          WHERE suite = 'unstable' ORDER BY 3;
>
> to the old:
>
>   => SELECT * FROM uploaders u JOIN sources s
>          ON (s.name = u.source_name AND
>              s.version = u.source_version AND
>              s.maintainer = u.uploader)
>          NATURAL JOIN suite_sources
>          WHERE suite_name = 'unstable' ORDER BY 3;

I have to admit that I'm not a big fan of using natural joins instead of
the self-documenting "join ... on" construct. It requires people to keep
the table definitions always in mind to understand what is used to join
them.

Marc
-- 
BOFH #183:
filesystem not big enough for Jumbo Kernel Patch

Attachment: pgpAJkrTpLGz4.pgp
Description: PGP signature


Reply to: