On Fri, Mar 16, 2018 at 05:48:35PM +0100, Andreas Tille wrote:
> > I included a genome sequence from NCBI as test data. Should I indicate the
> > source of this data somewhere in the package (e.g., in Readme.tests)?
>
> I think the best place would be debian/copyright since a data file
> should come with a license. I would say something like
>
>
> Files: debian/tests/test-data
> Copyright: yyyy-yyyy Copyright-Owner
> License:
> Comment:
>  This file was obtained by
>  wget URL
> Please let us know here on the list if you spot any problem to fill
> in the details.

I have tried to find the information that should go into d/copyright, but unfortunately I don't have any experience with this. Here is what I have found so far:
- I downloaded the test sequence from Ensembl Bacteria, an online database run by EMBL-EBI. At the bottom of the sequence page it says "Ensembl Bacteria release 38 - January 2018 © EMBL-EBI". I couldn't find any license for the data; the only relevant statement is in the Terms of Use: "EMBL-EBI itself places no additional restrictions on the use or redistribution of the data available via its online services other than those provided by the original data owners."
- The sequence was originally deposited in GenBank (as stated in this paper). I couldn't find out whether GenBank claims copyright, but they say here: "NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted."
- Finally, I found that the sequence itself is not patented (a patented sequence would be marked with the division code PAT in the LOCUS line of the GenBank record). I suspect the sequence is not under copyright either, but I couldn't find any confirmation.
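The PAT check from the last point can also be scripted. This is a minimal sketch, assuming the record was saved as a GenBank flat file with the standard LOCUS layout, where the division code (e.g. BCT, PAT) is the token just before the date. The function names and the sample LOCUS line are hypothetical and only for illustration; they are not the actual test sequence.

```python
def genbank_division(record_text: str) -> str:
    """Return the division code from the LOCUS line of a GenBank flat file.

    Assumes the standard LOCUS layout, in which the division code is the
    second-to-last whitespace-separated token, just before the date.
    """
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            tokens = line.split()
            if len(tokens) < 3:
                raise ValueError("malformed LOCUS line: " + line)
            return tokens[-2]
    raise ValueError("no LOCUS line found")


def is_patent_record(record_text: str) -> bool:
    # Patent sequences are filed in GenBank's PAT division.
    return genbank_division(record_text) == "PAT"


if __name__ == "__main__":
    # Hypothetical, illustrative header line (not a real record).
    example = "LOCUS       EXAMPLE_01              1000 bp    DNA     linear   BCT 01-JAN-2018\n"
    print(genbank_division(example))  # BCT
    print(is_patent_record(example))  # False
```

Note that this only tells you whether the record sits in the patent division; it says nothing about copyright.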
Could you please point me in the right direction? I feel like I'm getting stuck.
Thank you,
Liubov