Humble comments on... (Was: Packaging PDB Chemical Component Dictionary)
Hello Andrius,
You politely and confidently invited comments.
1.) Thanks!
2.) I like your use of the concise "TL;DR" (i.e.,
"Too Long; Didn't Read").
3.) Sometimes I shorten it even more, to "TLDR".
4.) How is the PDB Chemical Component Dictionary
licensed, if at all? Is it in the public domain?
5.) Could it somehow complement a big computer
spreadsheet summarizing life span experiments, where
a column named "intervention" contains the
names of molecules, like "vitamin c" and
"glycine"[1]?
6.) Could Debian packages have version numbers
based on dates the CCD was downloaded?
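
To make point 6 concrete, here is a rough sketch of what I have in mind,
not a tested packaging recipe. It assumes the directory names under the
status/ URL are dates in YYYYMMDD form (the watch file only says digits),
and that the bare date would be acceptable as the upstream version. The
function name newest_date_version is made up for illustration.

```python
import re
from datetime import datetime


def newest_date_version(dir_names):
    """Return the newest YYYYMMDD name as a version string.

    Assumption: names look like "20230623/" or "20230623".
    Anything that is not a valid calendar date is skipped.
    """
    dates = []
    for name in dir_names:
        match = re.fullmatch(r"(\d{8})/?", name)
        if match is None:
            continue
        try:
            # Reject strings such as 20230230 that are not real dates.
            datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue
        dates.append(match.group(1))
    if not dates:
        raise ValueError("no date-like directory names found")
    # YYYYMMDD strings sort chronologically when compared as text.
    return max(dates)


if __name__ == "__main__":
    print(newest_date_version(["20230616/", "20230623/", "latest/"]))
```

Since such strings sort the same way as the dates they stand for, a newer
download would always compare as a higher version.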
Thanks,
Kingsley
[1] World's biggest collection of the results of
life span experiments
https://kingsleymorse.ch/life_extension.html#preprocessed_life_span_data
On 06/23/2023 09:31, Andrius Merkys wrote:
> Hello,
>
> TL;DR: I propose packaging the frequently updated PDB Chemical Component
> Dictionary. Reasons, technical solutions and limitations below.
>
> PDB Chemical Component Dictionary (CCD) [1] is a single file (~400 MB
> uncompressed) collection of small molecule components found in PDB entries.
> It is used by at least a couple of Debian packages: openstructure, which
> needs it as a build dependency, and libcifpp.
>
> For openstructure I have resorted to putting some version of the CCD in the
> debian/ directory to fulfill the build requirement and then providing it as
> /usr/share/openstructure/components.cif.gz. However, as a result the CCD is
> not updated as frequently as it is released. Moreover, large debian/
> directories are frowned upon. Therefore I would like to outsource the CCD.
>
> The libcifpp package provides a cron task which keeps an up-to-date CCD in its
> cache directory, which is good, as a Debian-packaged CCD file would stay static
> between Debian releases. However, this does not help with building openstructure
> due to the network access constraint.
>
> I propose packaging the CCD as a separate source package. It does not have a
> version, so the update date would have to be used instead. I have hacked
> together a watch file to check for new versions, but it fails at the
> mk-origtargz step:
>
> version=4
> opts="downloadurlmangle=s|status.*|monomers/components.cif.gz|,filenamemangle=s|(\d+)/$|ccd-$1.gz|" \
> https://files.wwpdb.org/pub/pdb/data/status/ \
> https://files.wwpdb.org/pub/pdb/data/status/(\d+)/
>
> Thus the tarball would have to be produced by a get-orig-source target in
> debian/rules unless there are other solutions.
>
> Here I would like to ask for comments and suggestions. I am aware that
> packaging large and frequently updated data files is not usual practice, but
> I believe that doing so would both resolve problems with building
> openstructure and benefit users needing a stable CCD version.
>
> [1] https://www.wwpdb.org/data/ccd
>
> Best wishes,
> Andrius
>
--
Time is the fire in which we all burn.