Re: More problems



On Thu, 19 Jul 2001, Joost Kooij wrote:

> On Thu, Jul 19, 2001 at 08:51:03AM -0500, Steve Langasek wrote:
> > The changelog is used for tracking all of the changes made to the source
> > package over the course of development.  Since this is the Debian changelog
> > rather than an upstream changelog, the majority of changes noted are specific
> > to the shared debian directory, of which there is precisely one for any set of
> > binary packages that are built from a single source package.

> Because of this,

>   #!/bin/sh
>   docs=/usr/share/doc
>   for file in changelog.Debian.gz copyright
>   do
>     uniques=$( find $docs -name $file \
>       | xargs md5sum | cut -d' ' -f1 | sort -u | wc -l )
>     all=$( find $docs -name $file | wc -l )
>     dupct=$( expr 100 - 100 \* $uniques / $all )
>     printf "%-20s %4d files found, %4d unique, %3d%% duplicate\n" \
>             $file: $all        $uniques     $dupct
>   done

> run on a few systems here, gives an average of 20% duplicate copyright
> and changelog.Debian.gz files.  Slightly more duplicate copyright files.

And have you factored in the number of such copyright files which have the
same inode (i.e., the doc directories are symlinks)?

#!/bin/sh

# -mindepth 1 keeps /usr/share/doc itself out of the directory count
links=$( find /usr/share/doc -mindepth 1 -maxdepth 1 -type l | wc -l )
dirs=$( find /usr/share/doc -mindepth 1 -maxdepth 1 -type d | wc -l )
all=$( expr $dirs + $links )
printf "%2d%% of all directories in /usr/share/doc are symlinks\n" \
	$( expr $links \* 100 / $all )

The fact that they are identical doesn't indicate that they waste space on the
system.  The number of duplicate docs on my system is higher than your average
-- 36% vs. 20% -- but 6% can be accounted for by symlinked directories, and I
imagine an even greater number of these duplicates could be handled in this
manner.
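
One way to check this directly is to compare the number of paths at which a
file is reachable with the number of distinct inodes behind those paths.  The
following is only a rough sketch, assuming GNU find (for -L and -printf); the
function name count_doc_files is made up for illustration.  Note that find
does not descend into symlinked directories unless it is given -L.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Debian tool: count the files named $2
# below directory $1, following symlinked directories, and report how many
# distinct inodes those paths actually occupy.
count_doc_files() {
    docs=$1
    file=$2
    # -L makes find follow symlinks; %D:%i prints the device number and the
    # inode number of each match (GNU find extensions).
    all=$( find -L "$docs" -name "$file" -printf '%D:%i\n' | wc -l )
    inodes=$( find -L "$docs" -name "$file" -printf '%D:%i\n' \
        | sort -u | wc -l )
    printf "%s: %d paths, %d distinct inodes\n" "$file" "$all" "$inodes"
}
```

Called as "count_doc_files /usr/share/doc copyright", any difference between
the two numbers comes from files reached more than once through symlinks (or
hard links), which take up no extra space on disk.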

Steve Langasek
postmodern programmer


