Re: More problems

To: <debian-devel@lists.debian.org>
Subject: Re: More problems
From: Steve Langasek <vorlon@netexpress.net>
Date: Thu, 19 Jul 2001 10:56:11 -0500 (CDT)
Message-id: <Pine.LNX.4.30.0107191038370.22136-100000@tennyson.netexpress.net>
In-reply-to: <20010719173516.L17310@topaz.mdcc.cx>

On Thu, 19 Jul 2001, Joost Kooij wrote:

> On Thu, Jul 19, 2001 at 08:51:03AM -0500, Steve Langasek wrote:
> > The changelog is used for tracking all of the changes made to the source
> > package over the course of development.  Since this is the Debian changelog
> > rather than an upstream changelog, the majority of changes noted are specific
> > to the shared debian directory, of which there is precisely one for any set of
> > binary packages that are built from a single source package.

> Because of this,

>   #!/bin/sh
>   docs=/usr/share/doc
>   for file in changelog.Debian.gz copyright
>   do
>     uniques=$( find $docs -name $file \
>       | xargs md5sum | cut -d' ' -f1 | sort -u | wc -l )
>     all=$( find $docs -name $file | wc -l )
>     dupct=$( expr 100 - 100 \* $uniques / $all )
>     printf "%-20s %4d files found, %4d unique, %3d%% duplicate\n" \
>             $file: $all        $uniques     $dupct
>   done

> run on a few systems here, gives an average of 20% duplicate copyright
> and changelog.Debian.gz files.  Slightly more duplicate copyright files.

And have you factored in how many of those copyright files share an inode
(i.e., cases where the doc directory itself is a symlink)?

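#!/bin/sh
# Report what share of the entries in /usr/share/doc are symlinks.
# -mindepth 1 keeps /usr/share/doc itself out of the directory count.

links=$( find /usr/share/doc -mindepth 1 -maxdepth 1 -type l | wc -l )
dirs=$( find /usr/share/doc -mindepth 1 -maxdepth 1 -type d | wc -l )
all=$( expr $dirs + $links )
printf "%2d%% of all directories in /usr/share/doc are symlinks\n" \
	$( expr $links \* 100 / $all )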

That the files are identical doesn't mean that they waste space on the
system.  The number of duplicate docs on my system is higher than your average
-- 36% vs. 20% -- but 6% can be accounted for by symlinked directories, and I
imagine an even greater number of these duplicates could be handled in this
manner.
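
To separate "same content" from "same file on disk", one could count distinct
inodes alongside distinct checksums.  What follows is only a rough sketch,
assuming GNU find and coreutils (stat -c, md5sum); doc_dup_stats is a made-up
helper name, not part of any Debian tool.  Note that find does not descend
into symlinked directories unless given -L, so the sketch passes -L to make
the shared copies visible before collapsing them by device:inode.

```shell
#!/bin/sh
# Hypothetical helper: for one file name under a root directory, report
# how many paths lead to it, how many distinct inodes back those paths,
# and how many distinct contents (by md5sum) there are.
doc_dup_stats() {
    root=$1
    name=$2
    # -L follows symlinked directories, so one file may appear under
    # several paths.
    paths=$( find -L "$root" -name "$name" -type f | wc -l )
    # stat -L -c '%d:%i' prints device:inode of the file a path resolves to.
    inodes=$( find -L "$root" -name "$name" -type f \
        -exec stat -L -c '%d:%i' {} + | sort -u | wc -l )
    # Distinct checksums, the measure used in the quoted script.
    contents=$( find -L "$root" -name "$name" -type f \
        -exec md5sum {} + | cut -d' ' -f1 | sort -u | wc -l )
    printf '%s: %d paths, %d distinct inodes, %d distinct contents\n' \
        "$name" "$paths" "$inodes" "$contents"
}
```

Calling it as "doc_dup_stats /usr/share/doc copyright" prints one summary
line.  The gap between distinct inodes and distinct contents is the part that
actually takes up extra space; the gap between paths and distinct inodes
costs nothing.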

Steve Langasek
postmodern programmer
