How much data load is acceptable in debian/ dir and upstream (Was: edtsurf_0.2009-7_amd64.changes REJECTED)
In the Debian Med team, two GSoC students are busy writing
autopkgtests for (in the long run) all our packages, where possible. For
several packages it is necessary to provide data sets, which in many
cases are not shipped with the upstream source. The students
therefore searched the internet for data available in scientific
publications or public databases. I personally think they did
quite a good job in doing so to test our software in some real world
In the Debian Med team we have the rule that we do not rely only on the
packaging source directory for the autopkgtest but in addition
install the test script (and thus the needed data) in
/usr/share/doc/pkgname. This serves as a useful example for the program in
question and also enables users to run the test directly on
their own machine.
In the case of larger data sets it seems natural to provide the
data in a separate binary package with Architecture: all, so as not to bloat the
machines of users who do not want it and to save bandwidth on our
mirror network. New binary packages require processing in the NEW queue, and my
question here is about a set of rejection mails we received ( .
On Sun, Sep 13, 2020 at 12:00:08PM +0000, Thorsten Alteholz wrote:
> your debian tar file is much too large.
I admit the debian/ dir (2.7MB) exceeds the real code (300kB) by far.
However, can we please state somewhere in our packaging documentation
what size of the debian/ dir is acceptable and what is not?
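
To make numbers like the ones above easy to reproduce, here is a minimal
sketch in Python that reports how many bytes sit under debian/ versus the
rest of an unpacked source tree. The function names are hypothetical, it
is not an existing Debian tool, and it implies no threshold for what is
acceptable.

```python
import os


def tree_size(root):
    """Sum the sizes in bytes of all regular files below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so that link targets are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def debian_share(source_root):
    """Return (debian_bytes, other_bytes) for an unpacked source tree."""
    debian_bytes = tree_size(os.path.join(source_root, "debian"))
    other_bytes = tree_size(source_root) - debian_bytes
    return debian_bytes, other_bytes
```

Calling debian_share() on the top-level directory of an unpacked source
package returns both figures, so they can be quoted side by side.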
> Please put all data in a separate source package and don't forget to add the copyright information.
I think we should try to document when there is a need for
a separate source package. I would agree if the code is some kind
of moving target and the data does not change, or if there is some kind
of versioned downloadable tarball, or if the data can be shared between
different software packages. But here none of these conditions is
> But as you don't really need 4we2.ply, you might just omit as well.
I think in the example of edtsurf this is the major point. You gave
a very helpful hint to Pranav in another case: instead of
shipping data for the sole purpose of comparing the result, we
started to ship checksums. So I think for this case we can
settle on a solution.
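
The checksum approach mentioned above could be sketched roughly as
follows. This is only an illustration in Python, not the actual test
shipped in any package: the helper names are hypothetical, and it assumes
the program's output is byte-for-byte reproducible across runs and
architectures.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large output files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_matches(path, expected_hex):
    """True if the file's digest equals the recorded one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

With this, the test only needs to carry a 64-character digest per
expected output file instead of the reference file itself.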
On Sun Sep 13 13:00:09 BST 2020, Thorsten Alteholz wrote:
> please explain why you need such a huge amount of test data in this package.
Shayan has explained it in 2a. I also think if upstream delivers the
source package that way we should not really change the tarball to
shrink the size of the data originally shipped if the license is OK.
The same question I had above about the term "much too large" applies
here to "huge amount". Some kind of rule of thumb for what is acceptable
and what is not would be helpful.
I'm also wondering what you mean by "Please explain" in a "reject" mail.
In my understanding, one asks for an explanation before a decision
is made. But the reject is actually a decision. In what form would
you expect the explanation? Probably not via mail (as Shayan did), since
this would not bring the package back into the NEW queue. So could you
please be more verbose, for example:
Please explain in debian/README.??? why you decided to keep
all test data that is provided by upstream.
or something like this. Shayan is a very dedicated and extremely
productive newcomer. It would be great if he got some more
helpful advice on how to improve. Since I personally also have no
real clue, I'm writing here for some kind of general clarification.
On Sun Sep 13 18:00:08 BST 2020, Thorsten Alteholz wrote:
> please don't hide data under debian/*.
Sorry, Thorsten, but I think "hiding" is not the right term. We have no
directory other than debian/ in which to add extra files. That's why I think
Nilesh was correct to store the data here, where they IMHO naturally
belong since these are test data and thus are next to the autopkgtest
> If you really need those data please create a separate source package
That's the question I keep wondering about, and that's why I
am collecting all three rejects here in one mail: What is the general
opinion on creating a separate source package in cases like this? I
do not see any benefit in an extra source package. From my point of
view autopkgtest data belong to the packaging code and thus are
fine here. But for sure I might be wrong and would like to clarify
> and never ever do Recommend: such package.
That's absolutely correct and definitely an oversight of mine as the
sponsor of this package. Sorry about this. On the other hand, I'm not
sure whether it is a reason for a reject. If there were no other
issue with the package, I would consider it more productive for all
of us if you accepted it and filed an RC bug that could be fixed in
the source-only upload that is needed anyway.
> Don't forget to mention the copyright information.
In principle yes, but these data are not copyrightable as far as I know.
Nilesh has mentioned the origin of data in debian/tests/README to
provide a reference. If you consider this information not sufficient
please let us know a better way.
I'm trying to clarify the questions here, and we will add the outcome to the
Debian Med policy at least (for a start; I guess this question might
come up in other teams as well) to make sure we push better
packages into the NEW queue in future.
As always, I would like to express my explicit thanks to Thorsten, who
has spent a lot of hours on all our packages. We really appreciate
this effort and would love to improve so as to decrease the amount of
time this kind of package takes.