
Bug#783469: More statistics about packages sizes



Hey,
On 05/05/2015 04:34 PM, Lucas Nussbaum wrote:
> Hi,
> 
> On 04/05/15 at 23:11 +0200, Orestis Ioannou wrote:
[...]
>>
>> - Do not create any summary table with these values and keep only the
>> boxplot. Lucas what do you think on that? You mentioned generating a
>> graph so does this mean that a summary table wouldn't be of much use?
> 
> I think that the summary table is also useful, especially given that the
> first quartile value is really small. It would be hard to say from the
> graph if it's increasing or decreasing.
> 

Ok good :)

>> - hack into boxplot to generate custom boxplots. Already done by
>> somebody [1]. IMHO it looks pretty clean since it just overwrites some
>> values and keeps all the functionalities of the boxplot intact.
> 
> I have no particular comment about the best strategy here, but given the
> SLOC values are pre-computed, I wouldn't worry too much about the
> performance for computing/generating those stats.
> 
> Lucas
> 

I did some more digging, mostly because I discovered that my quartile
computation sometimes gave different results for Q1 and Q3 than the one
in pyplot.
Long story short, I compared them with the help of R and found out from
the documentation [1] that there are many ways to calculate quantiles. I
used type 2, and pyplot, if I am correct (the results coincide), uses type 7.
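
To make the difference concrete, here is a minimal pure-Python sketch of
the two definitions as described in the R documentation [1]. The function
names are made up for illustration, and R additionally applies a small
floating-point fuzz that is left out here.

```python
import math


def quantile_type7(values, p):
    """R type 7: linear interpolation between neighbouring order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * p            # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # clamp so p == 1 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_type2(values, p):
    """R type 2: inverse empirical CDF, averaging at discontinuities."""
    xs = sorted(values)
    n = len(xs)
    j = math.floor(n * p)
    g = n * p - j
    lower = xs[min(max(j - 1, 0), n - 1)]   # x_j in 1-based notation
    upper = xs[min(j, n - 1)]               # x_(j+1) in 1-based notation
    # When n*p is an integer the result is the mean of the two neighbours.
    return (lower + upper) / 2 if g == 0 else upper


data = list(range(1, 11))
print(quantile_type2(data, 0.25), quantile_type7(data, 0.25))
print(quantile_type2(data, 0.75), quantile_type7(data, 0.75))
```

On the values 1 to 10 the two definitions agree on the median but not on
Q1 and Q3, which is the kind of mismatch described above.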

If I understand the doc properly, type 7 assumes that the sample is
continuous, whereas type 2 considers it to be discontinuous.
Since my knowledge of statistics is fairly minimal, I am not really sure
which one applies in this case :p

In any case, if we consider the sample to be discontinuous, then I think
the best option for getting the same results in both the table and the
graph is to hack into pyplot and insert pre-calculated quartiles into the
boxplot. I am thinking of this because I couldn't find any way to tell
pyplot to calculate quartiles differently.
If, however, the sample is considered continuous, then I'll have to
implement another algorithm to calculate quartiles so that the results
are the same.
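
One possible way to feed pre-calculated quartiles into the plot, sketched
under the assumption that the installed matplotlib provides Axes.bxp()
(which draws a boxplot from a list of stats dicts instead of raw data).
The helper names bxp_stats and draw_boxplot are hypothetical, and the
whisker rule used here is the usual 1.5 * IQR one, which may not be what
we want in the end.

```python
def bxp_stats(values, q1, med, q3, whis=1.5):
    """Build one stats dict for Axes.bxp() from precomputed quartiles."""
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    # Whiskers reach the most extreme data points inside the fences,
    # but never end inside the box itself.
    whislo = min(min(inside, default=q1), q1)
    whishi = max(max(inside, default=q3), q3)
    fliers = [v for v in values if v < whislo or v > whishi]
    return {"med": med, "q1": q1, "q3": q3,
            "whislo": whislo, "whishi": whishi, "fliers": fliers}


def draw_boxplot(stats, path):
    """Render the precomputed stats to a file (needs matplotlib)."""
    # Imported here so bxp_stats() stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bxp([stats], showfliers=True)
    fig.savefig(path)


values = list(range(1, 11)) + [100]
stats = bxp_stats(values, q1=3, med=6, q3=9)   # type 2 quartiles
print(stats)
```

The table and the graph would then both be generated from the same
q1/med/q3 values, so they cannot disagree.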

Orestis

[1] https://stat.ethz.ch/R-manual/R-patched/library/stats/html/quantile.html
