
Re: Providing components.cif.gz [Was: Re: DeepMind’s AI Advanced Protein Structure Prediction tool Open Sourced]



Hi Michael,

On 2021-09-08 11:50, Michael Crusoe wrote:
> I would advocate for a local copy (if missing) and an environment
> variable to override so that users can get a newer/different version.

A fallback copy sounds good. It would probably be best to ship it in a
separate source/binary package so that it stays independent. Judging
from codesearch.d.o [1], several other source packages would also be
happy to use it.

I will talk to upstream about an environment variable with a sensible
default.
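
To make the idea concrete, here is a minimal sketch of such a lookup. The
variable name COMPONENTS_CIF and the default path are made up for
illustration; nothing here has been agreed with upstream or reflects an
existing package layout.

```python
import os

# Hypothetical names: neither the variable nor the path comes from upstream.
ENV_VAR = "COMPONENTS_CIF"
DEFAULT_PATH = "/usr/share/pdb-components/components.cif.gz"


def components_path(environ=None):
    """Return the user-supplied path if set and non-empty, else the packaged copy."""
    env = os.environ if environ is None else environ
    # An unset or empty variable falls back to the packaged default.
    return env.get(ENV_VAR) or DEFAULT_PATH
```

A user who wants a newer file would then only need to export the variable
before running the tool; everyone else gets the packaged copy.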

> I would also encourage upstream to find a way to embed a hash + download
> date in their logs and outputs, if possible.

Keeping track of such things is usually left to the user, but I agree
that improving the provenance record makes sense.
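
As a rough sketch of what could be recorded, assuming SHA-256 as the hash
and the file's modification time as a stand-in for the download date (the
real download time would have to be stored by whatever fetches the file),
something like this would do. The function name and the returned keys are
my own invention, not anything upstream has.

```python
import hashlib
import os
from datetime import datetime, timezone


def provenance(path):
    """Return the SHA-256 digest and the mtime (UTC, ISO 8601) of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so that a large file is not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    # Assumption: mtime approximates the download date; it is only a proxy.
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return {
        "path": path,
        "sha256": digest.hexdigest(),
        "mtime_utc": mtime.isoformat(),
    }
```

The resulting dictionary could be written to the log or into the output
header as is.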

> We should also ask PDB to version their files. Do they keep old versions
> around?

components.cif.gz is a database of chemical compounds, and each
compound entry carries its own modification date. Thus the latest date
in components.cif.gz could be treated as a kind of version identifier
for the database. As for old versions, I need to ask. I cannot find
them on their FTP server.
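
A minimal sketch of deriving such a version identifier, assuming the
modification date is stored in the _chem_comp.pdbx_modified_date item on a
single line in YYYY-MM-DD form (which is how I read the file; worth
checking against the current release). The function name is hypothetical.

```python
import gzip
import re

# Assumption: one "_chem_comp.pdbx_modified_date YYYY-MM-DD" line per compound.
DATE_RE = re.compile(r"^_chem_comp\.pdbx_modified_date\s+(\d{4}-\d{2}-\d{2})\s*$")


def latest_modified_date(path):
    """Return the newest modification date found in the file, or None."""
    latest = None
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            m = DATE_RE.match(line)
            # ISO dates sort correctly when compared as strings.
            if m and (latest is None or m.group(1) > latest):
                latest = m.group(1)
    return latest
```

Entries with a missing date (a "?" value) are simply skipped by the
pattern.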

[1] https://codesearch.debian.net/search?q=components.cif&literal=1

Thanks,
Andrius

