[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#787570: debsources: DB: change sha256 column data type from string to bytea or bit string



Package: qa.debian.org
Severity: normal
User: qa.debian.org@packages.debian.org
Usertags: debsources

Currently, the sha256 column of the checksum table in Debsources' Postgres DB
has type character varying(64):

  sha256     | character varying(64) | not null
 
Such a data type is wasteful in terms of disk space. And it shows:

  debsources=> select count(*) from checksums;
    count
  ----------
   41151812

  public | checksums            | table    | debsources | 4890 MB    |

We should switch to a more economic (and efficient) data type for storing
sha256 checksums. Good options seem to be either bytea [1] or fixed-size bit
strings [2]. Suggestions welcome!

A good, concrete way to help with this bug would be providing sample SQL
snippets to create temporary tables with the new data types, and convert /
inject into them the content of the current checksum table. That would allow to
easily benchmark disk usage and query/index efficiency.

Cheers.

[1]: http://www.postgresql.org/docs/9.4/static/datatype-binary.html#AEN5497
[2]: http://www.postgresql.org/docs/9.4/static/datatype-bit.html
   
-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=it_IT.utf8, LC_CTYPE=it_IT.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)


Reply to: