
Re: Packaging PDB Chemical Component Dictionary



Hi Andrius,

I agree that having an up-to-date CCD file is useful, and having it installed by the system would be even better. But I wonder who is going to make new packages with the latest CCD file. That is a task that needs to be executed weekly. Or do you envision a way to automate that?

I also assume you want to replace the current cron script from libcifpp with this new package?

regards, -maarten

On 23-06-2023 at 08:31, Andrius Merkys wrote:
Hello,

TL;DR: I propose packaging the frequently updated PDB Chemical Component Dictionary. Reasons, technical solutions and limitations are below.

The PDB Chemical Component Dictionary (CCD) [1] is a single-file collection (~400 MB uncompressed) of small-molecule components found in PDB entries. It is used by at least a couple of Debian packages: openstructure, which needs it as a build dependency, and libcifpp.

For openstructure I have resorted to putting some version of the CCD in the debian/ directory to fulfill the build requirement, and the package then provides it as /usr/share/openstructure/components.cif.gz. As a result, this copy of the CCD is not updated as frequently as the CCD is released. Moreover, large debian/ directories are frowned upon. Therefore I would like to outsource the CCD.

The libcifpp package provides a cron task which keeps an up-to-date CCD in its cache directory. This is good, as a Debian-packaged CCD file would stay static between Debian releases. However, it does not help with building openstructure, because network access is not available during the build.

I propose packaging the CCD as a separate source package. It does not have a version number, so the update date would have to be used instead. I have hacked together a watch file to check for new versions, but it fails at the mk-origtargz step:

version=4
opts="downloadurlmangle=s|status.*|monomers/components.cif.gz|,filenamemangle=s|(\d+)/$|ccd-$1.gz|" \
https://files.wwpdb.org/pub/pdb/data/status/ \
 https://files.wwpdb.org/pub/pdb/data/status/(\d+)/

Thus the tarball would have to be produced by a get-orig-source target in debian/rules, unless there are other solutions.
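
As a rough illustration of what such a get-orig-source step might call, here is a minimal Python sketch. It picks the newest date-named entry from a list of directory names (such as those listed under the status/ URL above) and wraps an already downloaded components.cif.gz in a date-versioned orig tarball. The source package name "pdb-ccd", both function names and the 8-digit date format are assumptions made for illustration; none of them are settled in this thread. The download itself is left out.

```python
import gzip
import io
import re
import tarfile
from pathlib import Path

# Hypothetical source package name; the thread does not fix one.
PACKAGE = "pdb-ccd"


def latest_release(names):
    """Return the newest YYYYMMDD entry among directory names, or None."""
    # Assumes release directories are named with exactly eight digits.
    dates = [m.group(1) for n in names if (m := re.fullmatch(r"(\d{8})/?", n))]
    return max(dates, default=None)


def make_orig_tarball(ccd_gz, version, outdir="."):
    """Wrap a downloaded components.cif.gz in <package>_<version>.orig.tar.gz."""
    topdir = f"{PACKAGE}-{version}"
    target = Path(outdir) / f"{PACKAGE}_{version}.orig.tar.gz"
    # Reads the whole compressed file into memory, which is fine for a sketch.
    data = Path(ccd_gz).read_bytes()
    # Zeroed timestamps keep the output independent of when it is built.
    with open(target, "wb") as raw, \
         gzip.GzipFile(fileobj=raw, mode="wb", mtime=0, filename="") as gz, \
         tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(f"{topdir}/components.cif.gz")
        info.size = len(data)
        info.mtime = 0
        info.mode = 0o644
        tar.addfile(info, io.BytesIO(data))
    return target
```

A get-orig-source target could then run something like make_orig_tarball("components.cif.gz", latest_release(names)) after fetching the file, so that the update date serves as the upstream version.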

Here I would like to ask for comments and suggestions. I am aware that packaging large and frequently updated data files is not the usual practice, but I believe that doing so would both resolve the problems with building openstructure and benefit users who need a stable CCD version.

[1] https://www.wwpdb.org/data/ccd

Best wishes,
Andrius


