Bug#787570: debsources: DB: change sha256 column data type from string to bytea or bit string
Package: qa.debian.org
Severity: normal
User: qa.debian.org@packages.debian.org
Usertags: debsources
Currently, the sha256 column of the checksum table in Debsources' Postgres DB
has type character varying(64):
sha256 | character varying(64) | not null
Such a data type is wasteful in terms of disk space. And it shows:
debsources=> select count(*) from checksums;
count
----------
41151812
public | checksums | table | debsources | 4890 MB |
We should switch to a more economic (and efficient) data type for storing
sha256 checksums. Good options seem to be either bytea [1] or fixed-size bit
strings [2]. Suggestions welcome!
A good, concrete way to help with this bug would be providing sample SQL
snippets to create temporary tables with the new data types, and convert /
inject into them the content of the current checksum table. That would allow to
easily benchmark disk usage and query/index efficiency.
Cheers.
[1]: http://www.postgresql.org/docs/9.4/static/datatype-binary.html#AEN5497
[2]: http://www.postgresql.org/docs/9.4/static/datatype-bit.html
-- System Information:
Debian Release: stretch/sid
APT prefers testing
APT policy: (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=it_IT.utf8, LC_CTYPE=it_IT.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Reply to: