
Organised mirroring of public databases



Dear all,

I parked here

http://svn.debian.org/wsvn/debian-med/trunk/community/infrastructure/getData.pl?op=file&rev=0&sc=0

a script that downloads external databases in a fairly straightforward
manner. It is far from perfect, but it may help us get organised
towards that shared aim.

The tool should be extended to allow
 * the addition of databases locally (but hey, since we are on svn and
the databases are mostly public, there should not be much need to add
databases for oneself only)
 * the versioning of databases. Most sites keep past releases available
for a while, and these should be modelled properly.
 * the formal specification of subsets of the databases, like only
mammalian or human data, if offered as such by upstream maintainers.
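
To make the versioning and subset points a bit more concrete, here is a
rough sketch of how one database entry might be modelled. It is written
in Python and is not taken from getData.pl; the class names, the
"exampledb" entry and the ftp.example.org paths are all made up for
illustration, so treat it as one possible shape rather than a proposal
for the actual format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Release:
    """One upstream release of a database (hypothetical model)."""
    version: str
    base_url: str
    # Subset name -> file path relative to base_url.
    # Only subsets that upstream actually offers are listed here.
    subsets: Dict[str, str] = field(default_factory=dict)


@dataclass
class Database:
    """A public database together with its known releases (hypothetical model)."""
    name: str
    releases: Dict[str, Release] = field(default_factory=dict)
    current: Optional[str] = None

    def add_release(self, release: Release, make_current: bool = True) -> None:
        """Register a release and, by default, mark it as the current one."""
        self.releases[release.version] = release
        if make_current:
            self.current = release.version

    def url_for(self, subset: str = "all", version: Optional[str] = None) -> str:
        """Return the download URL for a subset of the given (or current) release."""
        chosen = version or self.current
        if chosen is None or chosen not in self.releases:
            raise KeyError(f"{self.name}: unknown release {chosen!r}")
        release = self.releases[chosen]
        if subset not in release.subsets:
            raise KeyError(
                f"{self.name} {chosen}: upstream offers no subset {subset!r}"
            )
        return release.base_url.rstrip("/") + "/" + release.subsets[subset].lstrip("/")


# Placeholder host and file names, not real upstream locations.
exampledb = Database(name="exampledb")
exampledb.add_release(Release(
    version="1.0",
    base_url="ftp://ftp.example.org/pub/exampledb/1.0/",
    subsets={"all": "exampledb.dat.gz", "human": "exampledb_human.dat.gz"},
))
exampledb.add_release(Release(
    version="2.0",
    base_url="ftp://ftp.example.org/pub/exampledb/2.0/",
    subsets={"all": "exampledb.dat.gz", "mammalian": "exampledb_mammals.dat.gz"},
))

# Latest release, mammalian subset.
print(exampledb.url_for("mammalian"))
# A past release that upstream still keeps around, human subset.
print(exampledb.url_for("human", version="1.0"))
```

Keeping the subsets per release, rather than per database, allows for
upstream changing what it offers from one release to the next.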

We should not (immediately) think of
 * the specification of local mirrors of some public site
 * disk space issues
 * dependencies between downloaded datasets, e.g., the automated rewrite
of EMBL format to FASTA, since such derived files are available online as
well. This would introduce ambiguities and possibly also increase the
bandwidth used.

So, which databases should we address first? I suggest the small ones.

Best regards

Steffen

