Re: More problems
On Thu, 19 Jul 2001, Joost Kooij wrote:
> On Thu, Jul 19, 2001 at 08:51:03AM -0500, Steve Langasek wrote:
> > The changelog is used for tracking all of the changes made to the source
> > package over the course of development. Since this is the Debian changelog
> > rather than an upstream changelog, the majority of changes noted are specific
> > to the shared debian directory, of which there is precisely one for any set of
> > binary packages that are built from a single source package.
> Because of this,
> #!/bin/sh
> docs=/usr/share/doc
> for file in changelog.Debian.gz copyright
> do
>     uniques=$( find $docs -name $file \
>         | xargs md5sum | cut -d' ' -f1 | sort -u | wc -l )
>     all=$( find $docs -name $file | wc -l )
>     dupct=$( expr 100 - 100 \* $uniques / $all )
>     printf "%-20s %4d files found, %4d unique, %3d%% duplicate\n" \
>         $file: $all $uniques $dupct
> done
> run on a few systems here, gives an average of 20% duplicate copyright
> and changelog.Debian.gz files. Slightly more duplicate copyright files.
And have you factored in the number of such copyright files that share an
inode (i.e., cases where the doc directories are symlinks)?
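#!/bin/sh
# Count top-level entries only; -mindepth 1 keeps /usr/share/doc itself
# out of the directory count.
links=$( find /usr/share/doc -mindepth 1 -maxdepth 1 -type l | wc -l )
dirs=$( find /usr/share/doc -mindepth 1 -maxdepth 1 -type d | wc -l )
all=$( expr $dirs + $links )
printf "%2d%% of all directories in /usr/share/doc are symlinks\n" \
    $( expr $links \* 100 / $all )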
The fact that they are identical doesn't indicate that they waste space on the
system. The number of duplicate docs on my system is higher than your average
-- 36% vs. 20% -- but 6% can be accounted for by symlinked directories, and I
imagine an even greater number of these duplicates could be handled in this
manner.
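For what it's worth, here is a rough sketch of how the duplicate count could
be made to ignore copies that share an inode. It assumes GNU find (for
-printf) and md5sum, and paths without embedded newlines; count_dups is just
a name made up for illustration. It prints the number of distinct files
(by device and inode), followed by the number of distinct contents among them.

```shell
#!/bin/sh
# Sketch only: count_dups is a hypothetical helper, not part of any package.
# Usage: count_dups DOCDIR FILENAME
# Prints "<distinct files> <distinct contents>".
count_dups() {
    docs=$1
    file=$2
    # -L follows symlinked directories; %D:%i identifies the underlying
    # file, so paths reaching the same inode collapse into one entry.
    paths=$( find -L "$docs" -name "$file" -type f -printf '%D:%i %p\n' \
        | sort -u -k1,1 | cut -d' ' -f2- )
    # Nothing found: report zeros instead of checksumming an empty list.
    [ -n "$paths" ] || { echo "0 0"; return; }
    distinct=$(( $( printf '%s\n' "$paths" | wc -l ) ))
    # Checksum each remaining file and count the distinct digests.
    uniques=$(( $( printf '%s\n' "$paths" \
        | while IFS= read -r p; do md5sum < "$p"; done \
        | sort -u | wc -l ) ))
    echo "$distinct $uniques"
}
```

Calling it as "count_dups /usr/share/doc copyright" prints two numbers; the
difference between them is the number of identical copies that actually
occupy separate space on disk.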
Steve Langasek
postmodern programmer