
graphing debian changelogs



I was a little bored so I wrote up a program to read changelogs of debian
packages and graph the info in them in various ways with gnuplot. For
example, it can graph each new version of a package and the number of lines
added to the changelog for that version on one axis, and time on the other
axis. Or it can graph the debian version number of the package on one axis
and time on the other. It can even guess at how many bugs an upload of a
package closed. It also indicates who did the upload, so it's easy to
see NMUs, changes of ownership, etc.
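
Here is a rough sketch of the counting heuristics described above, boiled
down to a few lines. The sample entry (package "foo", Jane Doe, the bug
numbers) is made up for illustration, and this is a simplification of what
the attached script does, not a copy of it: it counts the indented lines of
one entry, guesses at closed bugs by looking for "#" followed by digits, and
takes whatever follows the last hyphen as the debian revision.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical changelog entry, invented for this example.
my $entry = <<'EOF';
foo (1.2-3) unstable; urgency=low

  * Fixed the frobnicator. Closes: #1234, #5678
  * Updated the man page (#9012).

 -- Jane Doe <jane@example.org>  Mon,  1 Feb 1999 12:00:00 -0800
EOF

my ($package, $version, $debrev) = ('', '', '');
my ($lines, $bugs) = (0, 0);

foreach (split /\n/, $entry) {
    if (/^(\w\S*)\s+\((.*?)\)\s+.*?;\s+(?:urgency|priority)=/i) {
        # Header line: package name and full version.
        ($package, $version) = (lc $1, $2);
        # The debian revision is whatever follows the last hyphen.
        $debrev = $1 if $version =~ /-([^-]*)$/;
    }
    elsif (/^  /) {
        # Body lines are indented by two spaces.
        $lines++;
        # Anything that looks like "#1234" is assumed to be a closed bug.
        my @found = /#\d+/g;
        $bugs += @found;
    }
}

print "$package $version: revision $debrev, $lines lines, $bugs bugs\n";
```

The bug count is only a guess: a "#1234" that is merely mentioned in an
entry is counted the same as one that was really closed.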

I'm not really sure what this is good for, but it does generate some
interesting graphs. For example:

* the history of the X packages, with urgency=high uploads marked specially.
  (http://kitenet.net/~joey/clog/X.gif)
  
* The X packages again, but displaying the number of bugs closed each upload
  this time. (http://kitenet.net/~joey/clog/Xbug.gif)
  
* comparing debhelper and debmake, and the size of their changelog entries
  over time. (http://kitenet.net/~joey/clog/ddsize.gif)
  
* looking at the insanely rapid increase in debmake's version number when it
  was first being developed. (http://kitenet.net/~joey/clog/dv.gif)
  
* some packages that, effectively orphaned upstream, attain ever higher
  debian version numbers. (http://kitenet.net/~joey/clog/orphans.gif)

I've attached the program and man page. I'd sort of like to get this into a
package but I don't know where it'd fit. Gnuplot? Devscripts?

-- 
see shy jo
#!/usr/bin/perl
#
# Plot the history of a debian package from the changelog, displaying
# when each release of the package occurred, and who made each release.
# To make the graph a little more interesting, the debian revision of the
# package is used as the y axis.
#
# Pass this program the changelog(s) you wish to be plotted.
#
# GPL copyright 1999 by Joey Hess <joey@kitenet.net>

use Date::Parse;
use Getopt::Long;
use IO::File;
use POSIX qw{tmpnam};

my ($no_version, $no_maintainer, $gnuplot_commands, $save_filename,
    $show_urgency, $verbose, $linecount, $bugcount)="";

my $ret=GetOptions(
	"no-version|v", \$no_version,
	"no-maint|m", \$no_maintainer,
	"gnuplot|g=s", \$gnuplot_commands,
	"save|s=s", \$save_filename,
	"urgency|u", \$show_urgency,
	"verbose", \$verbose,
	"l|linecount", \$linecount,
	"b|bugcount", \$bugcount,
);

if (! $ret || !@ARGV) {
	print STDERR <<__end__;
Usage: plotchangelog [options] changelog ..
	-v	  --no-version	  Do not show package version information.
	-m	  --no-maint	  Do not show package maintainer information.
	-u        --urgency       Use larger points for higher urgency uploads.
	-l        --linecount     Make the Y axis be number of lines in the
	                          changelog.
	-b        --bugcount      Make the Y axis be number of bugs closed
	                          in the changelog.
	-g "commands"             Pass "commands" on to gnuplot, they will be
	--gnuplot="commands"      added to the gnuplot script that is used to 
				  generate the graph.
	-s file   --save=file     Save the graph to the specified file in
	                          postscript format.
	          --verbose       Outputs the gnuplot script.
__end__
	exit 1;
};

my %data;
my ($package, $version, $maintainer, $date, $urgency)=undef;
my $data_tmpfile=tmpfile();
my $script_tmpfile=tmpfile();
my %pkgcount;
my $c;

# Changelog parsing.
foreach (@ARGV) {
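	if (/\.gz$/) {
		# List-form pipe open avoids the shell. "or" is used instead
		# of "||", which would bind to the string and never die.
		open(F, "-|", "zcat", $_) or die "$_: $!";
	}
	else {
		open(F, "<", $_) or die "$_: $!";
	}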

	while (<F>) {
		chomp;
		# Note that some really old changelogs use priority, not urgency.
		if (/^(\w+.*?)\s+\((.*?)\)\s+.*?;\s+(?:urgency|priority)=(.*)/i) {
			$package=lc($1);
			$version=$2;
			if ($show_urgency) {
				$urgency=$3;
				if ($urgency=~/high/i) {
					$urgency=2;
				}
				elsif ($urgency=~/medium/i) {
					$urgency=1.5;
				}
				else {
					$urgency=1;
				}
			}
			else {
				$urgency=1;
			}
			undef $maintainer;
			undef $date;
			$c=0;
		}
		elsif (/^ -- (.*?)  (.*)/) {
			$maintainer=$1;
			$date=str2time($2);
			
			# Strip email address.
			$maintainer=~s/<.*>//;
			$maintainer=~s/\(.*\)//;
			$maintainer=~s/\s+$//;
		}
		elsif (/^(\w+.*?)\s+\((.*?)\)\s+/) {
			print STDERR "Parse error on \"$_\"\n";
		}
		elsif ($linecount && /^  /) {
			$c++; # count changelog size.
		}
		elsif ($bugcount && /^  /) {
			# count bugs that were said to be closed.
			my @bugs=m/#\d+/g;
			$c+=scalar @bugs;
		}
		
		if (defined $package && defined $version &&
		    defined $maintainer && defined $date && defined $urgency) {
		    	$data{$package}{$pkgcount{$package}++}=
				[$linecount || $bugcount ? $c : $version,
				 $maintainer, $date, $urgency];
			# undef only takes one argument, so reset the
			# whole list by assignment instead.
			($package, $version, $maintainer, $date, $urgency)=();
		}
	}
	
	close F;
}

my $header=q{
set timefmt "%m/%d/%Y %H:%M"
set xdata time
set format x "%m/%d/%y"
set yrange [0 to *]
};
if ($linecount) {
	$header.="set ylabel 'Changelog length'\n";
}
elsif ($bugcount) {
	$header.="set ylabel 'Bugs closed'\n";
}
else {
	$header.="set ylabel 'Debian version'\n";
}
if ($save_filename) {
	$header.="set terminal postscript color solid\n";
	$header.="set output '$save_filename'\n";
}
my $script="plot ";
my $data='';
my $index=0;
my %maintdata;

# Note that "lines" is used if we are also showing maintainer info,
# otherwise we use "linespoints" to make sure points show up for each
# release anyway.
my $style = $no_maintainer ? "linespoints" : "lines";

foreach $package (keys %data) {
	my $oldmaintainer="";
	my $oldversion="";
	# It's crucial the output is sorted by date.
	foreach $i (sort {$data{$package}{$a}[2] <=> $data{$package}{$b}[2]}
		          keys %{$data{$package}}) {
		my $v=$data{$package}{$i}[0];
		$maintainer=$data{$package}{$i}[1];
		$date=$data{$package}{$i}[2];
		$urgency=$data{$package}{$i}[3];

		my $y;

		# If it's got a debian revision, use that as the y coordinate.
		if ($v=~m/(.*)-(.*)/) {
			$y=$2;
			$version=$1;
		}
		else {
			$y=$v;
		}

		# Now make sure the version has no more than 1 decimal point in
		# it. Otherwise, the "set label" command below could fail.
		# This also deals with version numbers of debian-only packages
		# to some extent.
		($y)=$y=~m/(^[^.]*(?:\.[^.]*)?)/;
		
		if (lc($maintainer) ne lc($oldmaintainer)) {
			$oldmaintainer=$maintainer;
		}
		
		my ($sec, $min, $hour, $mday, $mon, $year)=localtime($date);
		my $x=($mon+1)."/$mday/".(1900+$year)." $hour:$min";
		$data.="$x\t$y\n";
		$maintdata{$oldmaintainer}{$urgency}.="$x\t$y\n";
		
		if ($oldversion ne $version && ! $no_version) {
			# Upstream version change. Label it.
			$header.="set label '$version' at '$x',$y left\n";
			$oldversion=$version;
		}
	}
	$data.="\n\n"; # start new dataset
	# Add to plot command.
	$script.="'$data_tmpfile' index $index using 1:3 title '$package' with $style, ";
	$index++;
}

# Add a title.
my $title='set title "Graphing Debian changelog';
if (@ARGV > 1) {
	$title.="s";
}
$title.="\"\n";

# Annoyingly, we have to use 2 temp files. I could just send everything to
# gnuplot on stdin, but then the pause -1 doesn't work.
open (DATA, ">$data_tmpfile") || die "$data_tmpfile: $!";
open (SCRIPT, ">$script_tmpfile") || die "$script_tmpfile: $!";
print DATA $data;
if (! $no_maintainer) {
	foreach $maintainer (sort keys %maintdata) {
		foreach $urgency (sort keys %{$maintdata{$maintainer}}) {
			print DATA $maintdata{$maintainer}{$urgency}."\n\n";
			$script.="'$data_tmpfile' index $index using 1:3 title '$maintainer' with points pointsize ".(1.5 * $urgency).", ";
			$index++;
		}	
	}
}
$script=~s/, $/\n/;
$script=qq{
$header
$title
$gnuplot_commands
$script
};
$script.="pause -1 'Press Return to continue.'\n" unless $save_filename;
print SCRIPT $script;
print $script if $verbose;
close SCRIPT;
close DATA;

system "gnuplot",$script_tmpfile;
unlink $script_tmpfile,$data_tmpfile;

# Safely get a temporary file.
sub tmpfile {
	do { 
		$name=tmpnam();
	} until $fh=IO::File->new($name,O_RDWR|O_CREAT|O_EXCL);
	return $name;
}
.TH PLOTCHANGELOG 1 
.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
.\" other parms are allowed: see man(7), man(1)
.SH NAME
plotchangelog \- graph debian changelogs
.SH SYNOPSIS
.B plotchangelog
.I "[options] changelog ..."
.SH "DESCRIPTION"
.BR plotchangelog
is a tool to aid in visualizing a Debian changelog. The changelogs are
graphed with
.BR gnuplot (1),
with the X axis of the graph denoting time of release and the Y axis
denoting the debian version number of the package. Each individual release
of the package is represented by a point, and the points are color coded to
indicate who released that version of the package. The upstream version
number of the package can also be labeled on the graph.
.PP
Alternatively, the Y axis can be configured to display the size of the
changelog entry for each new version. Or it can be configured to display
approximately how many bugs were fixed for each new version.
.PP
Note that if the package is a debian-specific package, the entire package
version will be used for the Y axis. This does not always work perfectly.
.PP
.SH "READING THE GRAPH"
The general outline of a package's
graph is typically a series of peaks, starting at 1, going up to n, and then
returning abruptly to 1. The higher the peaks, the more releases the
maintainer made between new upstream versions of the package. If a package
is debian-only, its graph will just grow upwards without ever falling.
.PP
If the graph dips below 1, someone made an NMU of the package and upgraded it
to a new upstream version, thus setting the debian version to 0. NMUs in
general appear as fractional points like 1.1, 2.1, etc. An NMU can also be
easily detected by looking at the points that represent which maintainer
uploaded the package -- a solitary point of a different type than the points
before and after it is typically an NMU.
.PP
It's also easy to tell by looking at the points when a package changes
maintainers.
.SH OPTIONS
.TP
.B \-l, \-\-linecount
Instead of using the debian version number as the Y axis, use the number of
lines in the changelog entry for each version.
.TP
.B \-b, \-\-bugcount
Instead of using the debian version number as the Y axis, use the number of
bugs that were closed by each changelog entry. Note that this number is
obtained by searching for "#dddd" in the changelog, and so it may be
inaccurate.
.TP
.B \-v, \-\-no-version
Do not show upstream version labels. Useful if the graph gets too crowded.
.TP
.B \-m, \-\-no-maint
Do not differentiate between different maintainers of the package.
.TP
.B \-s file, \-\-save=file
Save the graph to "file" in postscript format instead of immediately
displaying it.
.TP
.B \-u, \-\-urgency
Use larger points when displaying higher-urgency package uploads.
.TP
.B \-\-verbose
Output the gnuplot script that is fed into gnuplot (for debugging purposes).
.TP
.B \-g "commands", \-\-gnuplot="commands"
This allows you to insert
.BR gnuplot (1)
commands into the gnuplot script that is used to generate the graph. The
commands are placed after all initialization but before the final "plot"
command. This can be used to override the default look provided by this
program in arbitrary ways. You can also use things like 
"set terminal png color"
to change the output filetype, which is useful in conjunction with
the -s option.
.TP
.B changelog ...
The changelog files to graph. If multiple files are specified they will all
be displayed on the same graph. The files may be compressed with gzip. Any
text in them that is not in Debian changelog format will be ignored.
.SH AUTHOR
Joey Hess <joey@kitenet.net>
