
Humble comments on... (Was: Packaging PDB Chemical Component Dictionary)



Hello Andrius,

You politely and confidently invited comments.

1.) Thanks!

2.) I like your use of the concise "TL;DR" (i.e.,
    "Too Long; Didn't Read").

3.) Sometimes I shorten it even more, to "TLDR".

4.) How is the PDB Chemical Component Dictionary
    licensed, if at all? Is it in the public domain?

5.) Could it somehow complement a big computer
    spreadsheet summarizing life span experiments, where
    a column named "intervention" contains the
    names of molecules, like "vitamin c" and
    "glycine"[1]?

6.) Could Debian packages have version numbers
    based on the dates the CCD was downloaded?
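
To make question 6 concrete, here is a minimal Python sketch of what I
have in mind. It assumes a plain YYYYMMDD date is acceptable as the
upstream version, followed by the usual Debian revision. The function
name ccd_debian_version is hypothetical, not part of any existing tool.

```python
from datetime import date


def ccd_debian_version(downloaded: date, debian_revision: int = 1) -> str:
    """Build a date-based version string such as '20230623-1'.

    Assumption: the upstream part is simply the download date written
    as YYYYMMDD, and the part after the hyphen is the Debian revision.
    """
    if debian_revision < 1:
        raise ValueError("debian_revision must be 1 or greater")
    # date supports strftime-style format codes inside f-strings.
    return f"{downloaded:%Y%m%d}-{debian_revision}"


if __name__ == "__main__":
    print(ccd_debian_version(date(2023, 6, 23)))
```

With the year first and a fixed width, a later download date always
yields a higher version number.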

Thanks,
Kingsley

[1] World's biggest collection of the results of
    life span experiments
    https://kingsleymorse.ch/life_extension.html#preprocessed_life_span_data

On 06/23/2023 09:31, Andrius Merkys wrote:
> Hello,
> 
> TL;DR: I propose packaging frequently updated PDB Chemical Component
> Dictionary. Reasons, technical solutions and limitations below.
> 
> PDB Chemical Component Dictionary (CCD) [1] is a single file (~400 MB
> uncompressed) collection of small molecule components found in PDB entries.
> It is used by at least a couple of Debian packages: openstructure, which
> needs it as a build dependency, and libcifpp.
> 
> For openstructure I have resorted to putting some version of the CCD in
> the debian/ directory to fulfill the build requirement and then provide it
> as /usr/share/openstructure/components.cif.gz. However, because of this, the
> CCD is not updated as frequently as it is released. Moreover, large-sized
> debian/ directories are frowned upon. Therefore I would like to outsource
> the CCD.
> 
> libcifpp package provides a cron task which keeps an up-to-date CCD in its
> cache directory, which is good as Debian-packaged CCD file would stay static
> between Debian releases. However, this does not help building openstructure
> due to network access constraint.
> 
> I propose packaging CCD as a separate source package. It does not have a
> version, thus the update date would have to be used instead. I have hacked
> together a watch file to check for new versions, but it fails on the
> mk-origtargz step:
> 
> version=4
> opts="downloadurlmangle=s|status.*|monomers/components.cif.gz|,filenamemangle=s|(\d+)/$|ccd-$1.gz|" \
> https://files.wwpdb.org/pub/pdb/data/status/ \
>  https://files.wwpdb.org/pub/pdb/data/status/(\d+)/
> 
> Thus the tarball would have to be produced by get-orig-source target in
> debian/rules unless there are other solutions.
> 
> Here I would like to ask for comments and suggestions. I am aware that
> packaging large and frequently updated data files is not usual practice, but
> I believe that doing so would both resolve problems with building
> openstructure and benefit users needing a stable CCD version.
> 
> [1] https://www.wwpdb.org/data/ccd
> 
> Best wishes,
> Andrius
> 

-- 
Time is the fire in which we all burn.

