
Re: Re: Re: Big data is needed for unit test



Hi

With the authorization of the project leads, I published the
file here [2].

It contains the names of one patient and his birth date so that
probably wasn't a good idea. This file appears to contain CT scan
results in a custom format? I can't view the scan itself as the
software isn't packaged yet :) I was able to view the metadata though.

It's not a real patient. It's an acquisition made specifically for testing
the software. The "patient" is in fact a developer of the framework, and the
information you saw in the .json can also be found in the software's source
code, in the googlecode repository [1].
So I don't think there is any confidentiality problem in this case.

Back to the original question of reducing the size of the data:
You could unzip the file, remove all of the large .raw files and leave
some small ones, modify root.json to remove the entries for .raw files
you removed and then zip the file up again. I'm not sure if this would
result in a valid file or not.
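
The first and last steps suggested above (dropping the large .raw files and
zipping the rest up again) could be sketched roughly as follows. This is only
a minimal sketch under assumptions: the function name strip_large_raw, the
file names and the size threshold are hypothetical, and since the layout of
root.json is not described here, the sketch does not touch it. It only
returns the names of the skipped members so the matching entries can be
removed separately. Whether the result is still a valid data file is not
something this code can check.

```python
import zipfile


def strip_large_raw(src_path, dst_path, max_raw_bytes):
    """Copy a zip archive, skipping .raw members larger than max_raw_bytes.

    Returns the names of the skipped members. root.json is copied
    unchanged (assumption: its entries have to be edited separately).
    """
    skipped = []
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            too_big = info.file_size > max_raw_bytes
            if info.filename.endswith(".raw") and too_big:
                skipped.append(info.filename)
                continue
            # Kept members are read fully into memory; the large .raw
            # files are skipped above, but other members are not size-limited.
            dst.writestr(info, src.read(info.filename))
    return skipped
```

For example, strip_large_raw("data.zip", "data-small.zip", 1024 * 1024)
would keep only the .raw files of at most 1 MiB (both file names are
placeholders).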

No, I can't do that because some unit tests could use .raw files, and I
can't delete one of them without breaking the data file's integrity.

You could also do another scan at a much lower resolution if that is
possible with the equipment you have.

Unfortunately, I don't have the equipment required to do that, and I'm not in
charge of creating new unit tests (based on potential new data). It's true
that it could be a great solution, but the problem I have with this unit test
applies to all the other unit tests as well: I have more than 4GB of data,
so in any case I will end up with a large amount of data.

Anyway, I don't consider the size to be a big issue as long as you put
the data in a second orig.tar.gz.

I guess this new orig.tar.gz would be created by uscan (if the
link is added to d/watch)?

Google Drive is very unfriendly to people who turn off JS, Cookies
etc, next time please upload the file to somewhere else and link
directly to the file download URL instead of indirect ways to find the
file.

So I have to upload my data (4GB) somewhere uscan can find it. But I have
no idea where to upload it, given that GitHub doesn't accept files bigger
than 100MB. Do you have any suggestions?



Thank you for your help


Best regards,

Corentin


[1] https://code.google.com/p/fw4spl/source/browse/Bundles/LeafPatch/patchMedicalData/test/tu/src/PatchTest.cpp

