
Biological data being used by an unpublished research paper is considered proprietary




Hi,

This is really not Debian-related, except insofar as the software in question is something that might have been in Debian one day. I talked about that with people on debian-med recently. So, it is technically off-topic.

However, I thought that maybe people on these lists would have some input on the matter. People in Debian are very experienced in matters of copyright and licensing, and people in debian-med presumably know something about copyright/licensing of biological data.

I posted the following to academia.stackexchange.com, http://academia.stackexchange.com/q/12718/285

As I write, there is one reply.

Summary of my SE question:

1) A distributor of biological data is claiming proprietary ownership of the data. This runs contrary to what I know about such data. Can anyone comment?

2) The distributor also says a script to download the (200) data files is prohibited. Telling me I cannot use a script (curl, in my case) to download the data is, IMO, downright bizarre. Is it reasonable to expect a user to download 200 files manually, and how would the server tell the difference anyway? They're all just HTTP requests.

Please CC me on any reply. Thanks.
                                                         Regards, Faheem

#########################################################################
http://academia.stackexchange.com/q/12718/285
#########################################################################

This question may be too specialist to be on-topic here. In which
case, please feel free to transfer it to another SE site, or close, as
appropriate.

I am planning to publish an applied statistics paper. This paper develops an algorithm and then applies this algorithm to some data. I obtained most of this data from the site http://www.imgt.org. The data I am using are immunoglobulin and T cell receptor nucleotide sequences, in the form of FASTA files. I'm using around 200 of these.

Here is a [random example][1] of the data I am using (click on [6 Sequence (FASTA format)] to get the FASTA file).

Now, I have a problem. The [Warranty Disclaimer and Copyright Notice](http://www.imgt.org/Warranty.html) states:

The IMGT® software and data are provided as a service to the
scientific community to be used only for research and educational
purposes. Individuals may print or save portions of IMGT® for their
own personal use. Any other use of IMGT® material need prior written
permission of the IMGT director and of the legal institutions (CNRS
and Université Montpellier 2).

I just heard from Prof. Marie-Paule Lefranc and she replied:

I have no objection that the data you retrieved for your work from
IMGT/LIGM-DB be made available to the reviewers, but unfortunately we
cannot authorize a script or a distribution of the IMGT/LIGM-DB files
with your code to the users.

You can provide the users with the list of the IMGT/LIGM-DB accession
numbers you used, with the source of the data clearly identified:
(IMGT/LIGM-DB version number) and reference to NAR 2006.

Well, this just made my life more difficult. To start with, I'm
puzzled by this. Isn't biological data like this in the public domain? Is it
really possible to treat immunoglobulin and T cell receptor nucleotide
sequence data as proprietary information?

I just wrote back and asked Prof. Lefranc what license the data was
published under, which I had not done earlier.

Additionally, how does one make data available to reviewers and not to
users? That is awkward, to say the least.

######################################################################
