On Sat, Jan 24, 2009 at 05:58:46PM +0100, Adeodato Simó wrote: > Hello, Roger, thanks for pushing this forward! > > > We would be grateful for any comments you might have, particularly > > if you have used the existing wanna-build and can see any defects, > > omissions, or any suggestions for anything we missed or could > > implement in a more optimal fashion. > > Okay, I have a number of comments, suggestions, wishes, and patches. > They're quite a few, so please take your time. Hopefully more people can > join, letting us know what they think of them, and verifying I'm not > talking shit. Most of the suggestions are good and well reasoned, though I'm not sure I agree completely with all of them. However, I don't make any claims to being a database expert, so I'm always willing to change to doing things another way if I've not fully considered the best way to do something. I applied all three of the patches you sent. The current state of the schema is at this head: http://git.debian.org/?p=users/rleigh/sbuild.git;a=shortlog;h=refs/heads/wb-backend with http://git.debian.org/?p=users/rleigh/sbuild.git;a=tree;f=db;h=d16c7e18d8bde6349fbd2323f256767dfc92794a;hb=692678942060152824a727b20b9aa69e14444191 being the current tree. As I mentioned on IRC, this tree contains three implementations of the debversion type for Postgres, implemented in PL/Perl, PL/PerlU and C. Once we have decided on the appropriate speed/security/code duplication tradeoff, we can pick one of these to go with. > Also, I'm happy to provide patches for all of these issues once we have > reached an agreement on the list, so please don't take all this as a > "fix it fix it fix it" mail, and feel free to just comment on these > issues. OTOH, if you prefer to prepare the patches yourself, I'm very > much okay with that too, though I'd appreciate if you posted them here > first (with git-send-email or similar), so there's a chance for review > before committing. No problem. One thing following from your patches and this thread is that there is a lot of similarity between the dak projectb schema and this one. As a result, I think it makes sense to do things like dak does where possible, to make things as compatible as possible. Quite a few of your suggestions take us in this direction anyway. I've attached a copy of some changes I make to do this, but this is just for a couple of tables so far. Comments? > So, in no particular order, let's go over them: > > 1. Permission handling. I see nothing about this in the design, but I > thought we'd be needing something in the DB? (In particular, I don't > have a clear picture how updates to the DB are going to be handled, > eg. vorlon@raff giving back a package under his UID.) > > Some thoughts for this: I would like for the builder table to have > "arch" and "admin" columns, so that buildd ownership can become the > primary source of information about who can do what. And then we need > a place to store permissions for users that don't admin buildds, or > extra permissions for certain admins. > > "admin" in builder should be a foreign key to a developers or similar > table, which should have at least an "address" NOT NULL column (since > I don't want to duplicate the information of who should be behind > each of the <arch>@buildd.d.o aliases). I hadn't considered this. Your suggestion for the builder table sounds fine. Would it be appropriate for non-admin developers to have the ability to do some tasks (such as sheduling a binNMU on all buildds)? > 2. [trivial] I'd like for the builder table to have an additional > "address" NOT NULL column, as to not duplicate the list of builders > in the DB and the aliases file. Sure. Where is the aliases file currently located? Is this just a standard /etc/aliases file? Does this want to be a one-to-many mapping to cope with multiple admins for a system? > 3. I'm doubtful about the "all" and "any" rows in the architectures > table. architectures is being referenced from tables with Packages > or Sources information, but also from stuff like build_jobs (and, as > per above, possibly builder too). And "all" or "any" don't really > make sense for this latter case. > Maybe we should create a view "pkg_architectures" that grabs > architectures and unions "any" and "all" on top of it, and have > sources and binaries reference the view? (Can a foreign key point to > a view?) I'm not sure, I'll have to check. If it's not possible, then we could have two separate tables for the two separate purposes. Or alternatively, add an extra column to architectures to determine where it may be used (and add a constraint on that). > A case could be made that "all" really belong in architectures, since > there could be an autobuilder for arch:all packages, but we can > easily change that later on. (And I don't think an arch:all > autobuilder would be handled that way anyway.) Agreed, you would probably just configure one architecture to build arch-all packages in addition to arch-any. I'll check up on the use of views (do you have an example of how to union data in a view?). My current thought is that the primary key would be a bit screwy, but maybe it would be OK. > 4. Would it be useful to have the priority sortable from within the > database? Then it doesn't have to be done in software. (This is just > an idea, and I'm not completely sure it's a good one. Note: if we do > this, ensure auto-insertion works well with it.) We can easily add a numerical priority and sort on that. I guess we could just default the priority to allow easy insertion at a low priority. > 5. [nitpick] Could we do without listing all package sections? They'll > be auto-inserted anyway. Yes, I already added the auto insertion, which adds package sections and priorities if missing: ============================================================================ CREATE OR REPLACE FUNCTION package_checkrel() RETURNS trigger AS $package_checkrel$ BEGIN PERFORM name FROM package_sections WHERE (name = NEW.pkg_section_name); IF FOUND = 'f' THEN INSERT INTO package_sections (name) VALUES (NEW.pkg_section_name); END IF; PERFORM name FROM package_priorities WHERE (name = NEW.pkg_priority_name); IF FOUND = 'f' THEN INSERT INTO package_priorities (name) VALUES (NEW.pkg_priority_name); END IF; RETURN NEW; END; $package_checkrel$ LANGUAGE plpgsql; COMMENT ON FUNCTION package_checkrel () IS 'Check foreign key references (package sections and priorities) exist'; CREATE TRIGGER checkrel BEFORE INSERT OR UPDATE ON sources FOR EACH ROW EXECUTE PROCEDURE package_checkrel(); COMMENT ON TRIGGER checkrel ON sources IS 'Check foreign key references (package sections and priorities) exist'; CREATE TRIGGER checkrel BEFORE INSERT OR UPDATE ON binaries FOR EACH ROW EXECUTE PROCEDURE package_checkrel(); COMMENT ON TRIGGER checkrel ON binaries IS 'Check foreign key references (package sections and priorities) exist'; ============================================================================ I'll remove the redundant list. > 6. Maintainer and Uploaders handling. I would like for "uploaders" to be > a proper one-to-many relationship, because the new web frontend will > be querying by uploader. Agreed. This is already how dak does it, and it's the best way to do it. We should just copy what dak does here. > Also, I don't think we need per-version maintainer/uploaders > information. Instead, the tool that populates the DB should just use > the information from the most recent version across suites. This might make insertion of data quite slow. The current schema would allow one to write everything from Pacages/Sources into binaries/sources and then update the dist_binaries/dist_sources to update the current packages in a given component/section. On the other hand, if we were to write the entire Packages/Sources into a temporary table, it could be fairly simple to do the version checks. > (Oh, and we should just keep the maintainter/uploaders *addresses*.) What is the reasoning for this? > 7. [distant future, no action needed now, but see the last paragraph in > this item] I think we should consider killing the P-a-s file as > input, and put that information in the DB. (P-a-s or similar would > still be created as output for third parties.) > > Then, source_architectures would be a view of (eg.) > source_arch_from_control and source_arch_override (P-a-s). > > (Or, well: without the above, where is P-a-s information merged into > the DB? Directly into source_arch? Maybe we can retain P-a-s as > input, but use already the two-table scheme I mentioned in the > paragraph above, writing P-a-s into source_arch_override.) I think having a general mechanism to dump/restore P-a-s would be useful. I think having a separate list of arches (control and override) is sensible. I also like the idea of having a view on this, but depending on what we want to do, we might again need to check about foreign keys and views. > 8. Is binary package information needed for anything else than > dep-waits? </ignorance> Does that table need section and priority? I don't think it's strictly needed, but might possibly be useful for front-ends. What do others think? > 9. History. In your introductory mail, you mention that the database is > meant to keep full history; I infer this mainly refers to the > "build_jobs" table. I have reservations about this design (not about > the idea of keeping full history itself), which I'll explain now. > > I don't like very much the idea of keeping the state change history > (which is what build_jobs seems to do) together with the information > about the current (most recent) state. One concern would be > performance: how will fetching the current state for one package > perform in 5 years time? (This concern may just be due to me being > ignorant about PostgreSQL real capabilities. Also, I see you > mentioned pruning the table on IRC; see below about this.) Indexing on source package and time would allow one to easily pull out the last (current) job state. There is also another concen about space usage. If we never delete old data, the database could blow up to quite a large size. While it's hard to predict just how much space would be used until it's tested, I think having a cron job running weekly/monthly to remove older data would mitigate this (and also have the side effect of making queries faster). If you check the schema, you'll see quite a few of the foreign key references have ON DELETE CASCADE added to the constraints. For the build jobs, this means that if an architecture or component is removed, all jobs referring to the deleted arch/component in this table will automatically be deleted. Additionally, when a source package is removed from sources, the state changes associated with that pacakge will auto trigger automatic deletion. As a result, keeping /n/ previous revisions and/or having a time limit on source package lifetime will result in automatic pruning of the jobs table. > Another concern would be clumsiness of queries. For example, how > would one obtain the current state in unstable for packages foo, bar, > and baz? (I know current state is easy to obtain for a single > package, with eg. "ORDER BY ctime DESC LIMIT 1", but I can't figure > out how to get it for multiple packages with one query. And the > front-end *will* want to query for multiple packages. This, again, > may be a lack of SQL knowledge on my part, apologies if that's the > case.) My SQL knowledge is by no means expert, but I think you could use GROUP BY and an aggregate function such as max() to obtain the information. However, I think a view on this would provide a much nicer interface :) And views also have the advantage that the database can optimise things behind the scenes. > What I had imagine we would have, prior to reading this design, was a > build_state or similar table, that just records the current state for > each source/distribution/arch. And a separate build_state_history > table that would record every state change. With a view, we would have just this. OTOH, if this turns out to be inefficient and/or ugly, I would not object to another table. One question I do have is exactly what we want to store in build_state_history? Would it be a subset or superset of what's in build_jobs? > On IRC, there was talk about periodically pruning the sources table, > and hence pruning the history on build_jobs as well. I think sources > should always be prunned as soon as a version is no longer present in > any of the distributions, but I also think build_state_history > shouldn't be prunned at all, and it should just lose referential > integrity after prunning sources. (This is of course my opinion, and > the matter is open for discussions.) I'm not sure that's possible with postgres WRT losing referential integrity. You can choose to set the column to NULL or DEFAULT, but you can't keep the invalid column(s) when the reference goes away. We could for this table not have any referential integrity at all I guess, but that might make cleaning it up annoying, as well as slowing down joins when querying it. > Marc said he wouldn't know what to do with that state change history, > but I'd say "why not have it?". There's always time for pruning if > nobody ever finds a use for it, or it takes up too much space. > > Thoughts? On buildd.debian.org, you can currently look at the history going back a few versions for packages on all arches (this is just the build logs). Having a more detailed view of that would be useful for maintainers to check for problems with their packages being built, and also for looking at the performance of buildds themselves, such as the delay between needs-build and building, build times, etc. > 10. Build logs. I'd really like for the DB to store a table with all the > available logs: source, version, arch, date, state, and path. This could be stored as a separate table, or in build_job_properties. A separate table would probably be cleaner. I'd also like to store the build time and space statistics (currently stored by sbuild in a GDBM database) stored here along with the logs. > 11. I see from your 38dd87a commit message that the "log" table is meant > for "storing changelog". I've searched the IRC logs of your > conversations about this schema, but I couldn't find a reference to > it. Storing changelog of what, precisely? This is to store what's currently in the "transaction log", i.e. a log of all main database activity such as package additions, state changes etc. Perhaps the database isn't the best place for this, but since you might be connecting remotely, I wasn't sure if a logfile was that great either. This would record "when", "who" and "what" for each main change, useful for debugging or tracking down mistakes. I'll be happy to remove it if this is a Bad Idea. > 12. binNMU handling. Marc mentioned en-passé that the binNMU > version/level should be a column in build_job/build_state, and I > agree. I'm just not very sure if we want history of these, and how > they'd fit with/into build_state_history. I'm not sure why you wouldn't want to store them in the history. Surely they should show up as a needs-build state change to trigger the build, with subsequent state changes (uploaded/installed etc.) tracked? Since you can't store them in packages, I think they should be installed in the job/state record (perhaps defaulting to NULL for the typical non-NMU case). > 13. The _name suffix everywhere annoys me a bit, since I like having > tables that can be natural join'ed, or using(foo) join'ed. Maybe > this is just me, and the _name suffix is excelent SQL practice, but > I thought I'd mention to see what other think. If it makes queries simpler, and more understandable, then I'm all for it. > 14. I'm attaching 3 trivial patches that hopefully nobody will find > objectionable. These are now applied. > 15. Could we name the database "wannab"? I'm not too bothered what it's called. Any other suggestions? On the wanna-build side, this will be made configurable in any case, so you can name it whatever you like in the connect string. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `- GPG Public Key: 0x25BFB848 Please GPG sign your mail.
diff --git a/db/db.sql b/db/db.sql index 1d0ba2b..973c1b7 100644 --- a/db/db.sql +++ b/db/db.sql @@ -23,13 +23,40 @@ COMMENT ON DATABASE "sbuild-packages" IS 'Debian source builder package state ma \i version.sql +CREATE TABLE archives ( + id serial CONSTRAINT archives_pkey PRIMARY KEY, + name text CONSTRAINT archives_name UNIQUE NOT NULL, + origin_server text, + description text +); + +CREATE TABLE components ( + id serial CONSTRAINT components_pkey PRIMARY KEY, + name text CONSTRAINT components_name UNIQUE NOT NULL, + description text, + meets_dfsg boolean DEFAULT 'true' +); + +COMMENT ON TABLE components IS 'Valid archive components (sections)'; +COMMENT ON COLUMN components.name IS 'Component ID'; +COMMENT ON COLUMN components.name IS 'Component name'; +COMMENT ON COLUMN components.description IS 'Component description'; +COMMENT ON COLUMN components.meets_dfsg IS 'Component contents meet Debian Free Software Guidelines'; + +INSERT INTO components (name) VALUES ('main'); +INSERT INTO components (name) VALUES ('contrib'); +INSERT INTO components (name) VALUES ('non-free'); + CREATE TABLE architectures ( - name text - CONSTRAINT arch_name PRIMARY KEY + id serial CONSTRAINT arch_pkey PRIMARY KEY, + name text CONSTRAINT arch_name UNIQUE NOT NULL, + description text ); COMMENT ON TABLE architectures IS 'Valid architectures'; +COMMENT ON COLUMN architectures.id IS 'Architecture ID'; COMMENT ON COLUMN architectures.name IS 'Architecture name'; +COMMENT ON COLUMN architectures.description IS 'Architecture description'; INSERT INTO architectures (name) VALUES ('all'); INSERT INTO architectures (name) VALUES ('any'); @@ -153,15 +180,13 @@ CREATE TABLE sources ( name text NOT NULL, version debversion NOT NULL, - component_name text - CONSTRAINT source_comp_fkey REFERENCES components(name) - ON DELETE CASCADE - NOT NULL, - pkg_section_name text - CONSTRAINT source_pkg_sect_fkey REFERENCES package_sections(name) - NOT NULL, - pkg_priority_name text - CONSTRAINT source_pkg_pri_fkey REFERENCES package_priorities(name) + component_name integer NOT NULL + CONSTRAINT source_comp_fkey REFERENCES components(id) + ON DELETE CASCADE, + section_name text + CONSTRAINT source_sect_fkey REFERENCES package_sections(name) + priority_name text + CONSTRAINT source_pri_fkey REFERENCES package_priorities(name) NOT NULL, maintainer text NOT NULL, uploaders text, @@ -178,9 +203,9 @@ CREATE INDEX sources_pkg_idx ON sources (name); COMMENT ON TABLE sources IS 'Source packages common to all architectures (from Sources)'; COMMENT ON COLUMN sources.name IS 'Package name'; COMMENT ON COLUMN sources.version IS 'Package version number'; -COMMENT ON COLUMN sources.component_name IS 'Archive component'; -COMMENT ON COLUMN sources.pkg_section_name IS 'Package section'; -COMMENT ON COLUMN sources.pkg_priority_name IS 'Package priority'; +COMMENT ON COLUMN sources.section_name IS 'Archive section'; +COMMENT ON COLUMN sources.section_name IS 'Package section'; +COMMENT ON COLUMN sources.priority_name IS 'Package priority'; COMMENT ON COLUMN sources.maintainer IS 'Package maintainer name'; COMMENT ON COLUMN sources.uploaders IS 'Package uploader names'; COMMENT ON COLUMN sources.build_dep IS 'Package build dependencies (architecture dependent)'; @@ -193,11 +218,11 @@ CREATE TABLE source_architectures ( source_name text NOT NULL, source_version debversion NOT NULL, - arch_name text - CONSTRAINT source_arch_arch_fkey REFERENCES architectures(name) + arch_id integer + CONSTRAINT source_arch_arch_fkey REFERENCES architectures(id) ON DELETE CASCADE NOT NULL, - UNIQUE (source_name,source_version,arch_name), + UNIQUE (source_name,source_version,arch_id), CONSTRAINT source_arch_source_fkey FOREIGN KEY (source_name, source_version) REFERENCES sources (name, version) ON DELETE CASCADE @@ -206,25 +231,25 @@ CREATE TABLE source_architectures ( COMMENT ON TABLE source_architectures IS 'Source package architectures (from Sources)'; COMMENT ON COLUMN source_architectures.source_name IS 'Package name'; COMMENT ON COLUMN source_architectures.source_version IS 'Package version number'; -COMMENT ON COLUMN source_architectures.arch_name IS 'Architecture name'; +COMMENT ON COLUMN source_architectures.arch_id IS 'Architecture name'; CREATE TABLE binaries ( name text NOT NULL, version debversion NOT NULL, - arch_name text - CONSTRAINT bin_arch_fkey REFERENCES architectures(name) + arch_id integer + CONSTRAINT bin_arch_fkey REFERENCES architectures(id) ON DELETE CASCADE NOT NULL, source_name text NOT NULL, source_version debversion NOT NULL, - pkg_section_name text + section_name text CONSTRAINT bin_sect_fkey REFERENCES package_sections(name) NOT NULL, - pkg_priority_name text + priority_name text CONSTRAINT bin_pri_fkey REFERENCES package_priorities(name) NOT NULL, - CONSTRAINT bin_pkey PRIMARY KEY (name, version, arch_name), + CONSTRAINT bin_pkey PRIMARY KEY (name, version, arch_id), CONSTRAINT bin_src_fkey FOREIGN KEY (source_name, source_version) REFERENCES sources (name, version) ON DELETE CASCADE @@ -233,11 +258,11 @@ CREATE TABLE binaries ( COMMENT ON TABLE binaries IS 'Binary packages specific to single architectures (from Packages)'; COMMENT ON COLUMN binaries.name IS 'Binary package name'; COMMENT ON COLUMN binaries.version IS 'Binary package version number'; -COMMENT ON COLUMN binaries.arch_name IS 'Architecture name'; +COMMENT ON COLUMN binaries.arch_id IS 'Architecture ID'; COMMENT ON COLUMN binaries.source_name IS 'Source package name'; COMMENT ON COLUMN binaries.source_version IS 'Source package version number'; -COMMENT ON COLUMN binaries.pkg_section_name IS 'Package section'; -COMMENT ON COLUMN binaries.pkg_priority_name IS 'Package priority'; +COMMENT ON COLUMN binaries.section_name IS 'Package section'; +COMMENT ON COLUMN binaries.priority_name IS 'Package priority'; CREATE TABLE job_states ( name text @@ -286,8 +311,8 @@ CREATE TABLE dist_binaries ( binary_name text NOT NULL, binary_version debversion NOT NULL, - arch_name text - CONSTRAINT dist_bin_arch_fkey REFERENCES architectures(name) + arch_id text + CONSTRAINT dist_bin_arch_fkey REFERENCES architectures(id) ON DELETE CASCADE NOT NULL, distribution_name text @@ -295,16 +320,16 @@ CREATE TABLE dist_binaries ( ON DELETE CASCADE NOT NULL, CONSTRAINT dist_bin_pkey PRIMARY KEY (binary_name, distribution_name), - CONSTRAINT dist_bin_bin_fkey FOREIGN KEY (binary_name, binary_version, arch_name) - REFERENCES binaries (name, version, arch_name) + CONSTRAINT dist_bin_bin_fkey FOREIGN KEY (binary_name, binary_version, arch_id) + REFERENCES binaries (name, version, arch_id) ON DELETE CASCADE, - CONSTRAINT dist_bin_unique UNIQUE (binary_name, binary_version, arch_name, distribution_name) + CONSTRAINT dist_bin_unique UNIQUE (binary_name, binary_version, arch_id, distribution_name) ); COMMENT ON TABLE dist_binaries IS 'Binary packages contained within a distribution'; COMMENT ON COLUMN dist_binaries.binary_name IS 'Binary package name'; COMMENT ON COLUMN dist_binaries.binary_version IS 'Binary package version number'; -COMMENT ON COLUMN dist_binaries.arch_name IS 'Architecture name'; +COMMENT ON COLUMN dist_binaries.arch_id IS 'Architecture name'; COMMENT ON COLUMN dist_binaries.distribution_name IS 'Distribution name'; CREATE TABLE build_jobs ( @@ -314,8 +339,8 @@ CREATE TABLE build_jobs ( NOT NULL, source_version debversion NOT NULL, - arch_name text - CONSTRAINT build_jobs_arch_fkey REFERENCES architectures(name) + arch_id text + CONSTRAINT build_jobs_arch_fkey REFERENCES architectures(id) ON DELETE CASCADE NOT NULL, distribution_name text @@ -332,7 +357,7 @@ CREATE TABLE build_jobs ( ctime timestamp with time zone NOT NULL DEFAULT CURRENT_TIMESTAMP, CONSTRAINT build_jobs_unique UNIQUE(source_name, source_version, - arch_name), + arch_id), CONSTRAINT build_jobs_src_fkey FOREIGN KEY(source_name, source_version) REFERENCES sources(name, version) ON DELETE CASCADE @@ -346,7 +371,7 @@ COMMENT ON TABLE build_jobs IS 'Build job tickets (state changes) specific for s COMMENT ON COLUMN build_jobs.id IS 'Job number'; COMMENT ON COLUMN build_jobs.source_name IS 'Source package name'; COMMENT ON COLUMN build_jobs.source_version IS 'Source package version number'; -COMMENT ON COLUMN build_jobs.arch_name IS 'Architecture name'; +COMMENT ON COLUMN build_jobs.arch_id IS 'Architecture name'; COMMENT ON COLUMN build_jobs.distribution_name IS 'Architecture version'; COMMENT ON COLUMN build_jobs.user_name IS 'User making this change (username)'; COMMENT ON COLUMN build_jobs.builder_name IS 'Build dæmon making this change (username)'; @@ -383,13 +408,13 @@ COMMENT ON COLUMN log.message IS 'Log entry message'; CREATE OR REPLACE FUNCTION package_checkrel() RETURNS trigger AS $package_checkrel$ BEGIN - PERFORM name FROM package_sections WHERE (name = NEW.pkg_section_name); + PERFORM name FROM package_sections WHERE (name = NEW.section_name); IF FOUND = 'f' THEN - INSERT INTO package_sections (name) VALUES (NEW.pkg_section_name); + INSERT INTO package_sections (name) VALUES (NEW.section_name); END IF; - PERFORM name FROM package_priorities WHERE (name = NEW.pkg_priority_name); + PERFORM name FROM package_priorities WHERE (name = NEW.priority_name); IF FOUND = 'f' THEN - INSERT INTO package_priorities (name) VALUES (NEW.pkg_priority_name); + INSERT INTO package_priorities (name) VALUES (NEW.priority_name); END IF; RETURN NEW; END;
Description: Digital signature