Re: Problem with LARGE files



On Mon, 30 Apr 2007 11:27:59 +0100 Chris Jefferson wrote:

> I am currently finishing work on a program which can be used to  
> identify a group of mathematical structures. I would like to release  
> it under the GPL.

Wonderful!  :-)

> However, building the program involves applying a  
> lossy compression algorithm to around 400GB of data files, turning  
> them into about 50MB.

Woah!
Where do those data files come from?
I mean: I suppose they are not written by hand by human beings...

Mmmmh, do you have an infinite set of monkeys, by chance?!?   ;-)

Seriously, are they some uncompressed audio/video recordings or
something similarly acquired from the physical world (e.g.: data
measured by an appropriate measuring instrument)?
Is there any human creativity involved in the generation of those data
files?

> 
> I could possibly write a program which, using this 50MB, could get back  
> the 400GB data set I have on my hard disc,

Wait, you said that the compression algorithm was a lossy one!
If this is actually the case, how can you regenerate the uncompressed
data back from the lossy-compressed version?!?

> but this would probably  
> take around four months to run.

On which hardware?
On a Commodore 64?  Or on a 131072-CPU cluster supercomputer?
It's not the same, you know...  ;-) 

> 
> Would it be reasonable to require someone to spend £100 on an  
> external hard disc and postage if they wanted to request the "source"  
> of my program?

First of all, let's determine whether those huge uncompressed data are
actually part of the source for the work.

What's the relationship between the program and the compressed data?
I mean: are those data essential for the program to operate correctly?
Or maybe those are data the program operates on, and the program could
well operate on other user-supplied data?
I'm trying to understand if those compressed data are effectively part
of the software package.

What's the preferred form for making modifications to the compressed
data?
Should you modify the compressed data, would you edit the huge
uncompressed data and re-run the compression process?  Or would you
follow another path?  Which one?
I'm trying to understand if the huge uncompressed form is actually the
source form for the data.

> and is there any way I could ever get such a program  
> into Debian?

Too early to figure it out...

> Perhaps a different license?

Changing the license would not help: IMHO, Debian should (in order to
abide by its Social Contract) distribute the source anyway, even if the
license does not mandate source distribution.

We still have to see whether those huge uncompressed data files are
actually part of the source for the work, though.
Please answer the above questions...

> 
> Thank you.
> 
> PS Please CC me, I am not subscribed to the list

Done.

The usual disclaimers: IANADD, IANAL.


-- 
 http://frx.netsons.org/doc/nanodocs/testing_workstation_install.html
 Need to read a Debian testing installation walk-through?
..................................................... Francesco Poli .
 GnuPG key fpr == C979 F34B 27CE 5CD8 DC12  31B5 78F4 279B DD6D FCF4
