
Bug#783469: More statistics about packages sizes



Hi,

On 04/05/15 at 23:11 +0200, Orestis Ioannou wrote:
> Hello,
> 
> I've been playing around with this bug :)
> 
> At first I thought I'd calculate the quartiles, create some tables
> and then plot the data using boxplots. I managed to generate the tables,
> but then I found out that pyplot's boxplot takes the raw data itself as
> input, not the precomputed quartiles.
> 
> So I am wondering how I should proceed.
> Right now the calculated quartiles are stored in the stats dict during
> the statistics update run and written to the cache file stats.data. Then I
> found out that we can extract the generated quartiles from the pyplot
> boxplot. So I see four options:
> 
> - extract the quartiles from the boxplot and save them in the cache. This
> means reading the current stats from the file and updating its values,
> although this is not supposed to happen during a charts update; also, if
> somebody disables charts, that table won't be available.
> 
> - Let pyplot do its thing and calculate the quartiles elsewhere for a
> table representation. The calculation involves:
> 	* database queries to get the metrics for each suite, or one GROUP BY query.
> 	* sub-setting a list
> 	* calculating three medians (one for the overall median and one each for
> the lower_half and upper_half sets of values), one min and one max.
> 
> - Do not create any summary table with these values and keep only the
> boxplot. Lucas, what do you think about that? You mentioned generating a
> graph, so does this mean that a summary table wouldn't be of much use?

I think that the summary table is also useful, especially given that the
first quartile value is really small. It would be hard to tell from the
graph whether it's increasing or decreasing.

> - hack into boxplot to generate custom boxplots. This has already been
> done by somebody [1]. IMHO it looks pretty clean, since it just overwrites
> some values and keeps all of boxplot's functionality intact.

I have no particular comment about the best strategy here, but given that
the SLOC values are pre-computed, I wouldn't worry too much about the
performance of computing/generating those stats.

Lucas
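The second option above (letting pyplot draw the boxplot and computing the
quartiles separately for a table) can be sketched in plain Python. This is
a minimal sketch under stated assumptions, not the Debsources code: the
table name `package_sizes` and its columns `suite` and `sloc` are
hypothetical, and an in-memory SQLite database stands in for the real one.
It uses one ordered query instead of one query per suite, then computes the
three medians, min and max described above. For an odd number of values
the overall median is left out of both halves, which is only one of
several common conventions; matplotlib computes its quartiles by
percentile interpolation, so the table may differ slightly from the plot.

```python
import sqlite3
from itertools import groupby
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the split-halves method.

    Assumption: for an odd number of values the overall median is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + (n % 2):]
    # With a single value both halves are empty; fall back to that value.
    q1 = median(lower_half) if lower_half else data[0]
    q3 = median(upper_half) if upper_half else data[0]
    return data[0], q1, median(data), q3, data[-1]


def summaries_by_suite(conn):
    """Fetch all (suite, sloc) rows in one query and summarise each suite.

    The hypothetical `package_sizes` table is ordered by suite so that
    itertools.groupby sees each suite's rows contiguously.
    """
    rows = conn.execute(
        "SELECT suite, sloc FROM package_sizes ORDER BY suite, sloc"
    )
    return {
        suite: five_number_summary(sloc for _, sloc in group)
        for suite, group in groupby(rows, key=lambda row: row[0])
    }


if __name__ == "__main__":
    # Made-up sample values, only to exercise the functions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package_sizes (suite TEXT, sloc INTEGER)")
    conn.executemany(
        "INSERT INTO package_sizes VALUES (?, ?)",
        [("jessie", v) for v in (10, 20, 30, 40, 50, 60, 70)]
        + [("wheezy", v) for v in (5, 15, 25, 35)],
    )
    for suite, (lo, q1, med, q3, hi) in sorted(summaries_by_suite(conn).items()):
        print(f"{suite:8} min={lo} q1={q1} median={med} q3={q3} max={hi}")
```

Since the summary is a plain tuple per suite, it could be stored in the
stats dict during the update run and rendered as a table independently of
whether charts are enabled.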

