Re: RFS: FSlint - File System lint

To: Justin Pryzby <justinpryzby@users.sourceforge.net>
Cc: debian-mentors@lists.debian.org
Subject: Re: RFS: FSlint - File System lint
From: Pádraig Brady <P@draigBrady.com>
Date: Mon, 06 Mar 2006 12:17:48 +0000
Message-id: <440C286C.60107@draigBrady.com>
In-reply-to: <20060304175221.GA6825@andromeda>
References: <4403311E.8060205@draigBrady.com> <20060227214731.GA31864@andromeda> <20060304175221.GA6825@andromeda>

Justin Pryzby wrote:

>On Mon, Feb 27, 2006 at 04:47:31PM -0500, pryzbyj wrote:
>
>>On Mon, Feb 27, 2006 at 05:04:30PM +0000, Pádraig Brady wrote:
>>
>>>Hi,
>>>
>>>I've been maintaining FSlint for a few years now
>>>and it has proved quite popular. There have even
>>>been (buggy) third-party Debian packages floating around.
>>>In the latest version (2.14) I have created a Debian package,
>>>and it would be great if someone could sponsor this
>>>package for inclusion in Debian.
>
>This package is really quite neat.  I've read through much of the
>code (lots of fairly small bash scripts), and I must say that I'm
>inspired.  I especially like this "find duplicates" pipeline (my own
>implementation here):

Cheers. Hopefully we'll get 2.15 into Debian soon.
I'm working through your comments, and I also have a bug fix
I'd like to get done.

>
>  find . -type f -print0 | xargs -r0 md5sum | sort |
>    sed -re 's/(\S*)\s*(\S*)/\2\t\1/' | uniq -df1 --all-=sep |
>    sed -e 's/\t\S*$//;'

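Note that discarding files with a unique size first is a huge
optimization: a file whose size no other file shares cannot have
a duplicate, so it never needs to be checksummed.
I also sort by inode (sorting by path is nearly as good),
which reduces disk seeking a lot.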
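
The size pre-filter and inode ordering described above could be sketched roughly as follows. This is not FSlint's actual code: the function name dup_candidates is made up for illustration, and the sketch assumes GNU find, xargs and md5sum, and path names without embedded newlines.

```shell
# Hypothetical helper (not taken from FSlint): print md5sum lines only for
# files whose size is shared by at least one other file under the directory.
dup_candidates() {
    dir=${1:-.}
    tab=$(printf '\t')
    # One "size<TAB>inode<TAB>path" record per regular file (GNU find -printf).
    find "$dir" -type f -printf '%s\t%i\t%p\n' |
        # Count how often each size occurs, then keep only the records
        # whose size occurs more than once.
        awk -F'\t' '
            { count[$1]++; line[NR] = $0; size[NR] = $1 }
            END {
                for (i = 1; i <= NR; i++)
                    if (count[size[i]] > 1) print line[i]
            }' |
        # Order the survivors by inode number to reduce disk seeking.
        sort -t "$tab" -k2,2n |
        # Keep only the path and checksum what is left.
        cut -f3- |
        tr '\n' '\0' |
        xargs -r0 md5sum
}
```

Something like `dup_candidates . | sort | uniq -w32 --all-repeated=separate` would then group the remaining files by checksum, since GNU uniq's -w32 compares only the 32-character hash at the start of each line.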

>
>Does anyone know a prettier way of switching the md5sum output than
>this sed script?  (It has to deal with special pathnames, of course!)

My method is more robust, BTW (try path names with spaces):
  sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
It relies on the md5sum hash always being 32 characters wide,
rather than splitting on whitespace.
Note I think uniq will get key support (like sort) at some stage.
Also, Debian has a specific patch adding -W to compare only
the first N fields. However, this is not standard and,
as I understand it, has just been removed.
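
To see why the fixed-width swap copes with spaces, here is a small sketch that wraps the sed expression quoted above in a function. The name swap_md5_fields is hypothetical, and the sketch assumes GNU sed and the usual md5sum output layout of 32 hex digits, two separator characters, then the path.

```shell
# Hypothetical wrapper name; the sed expression is the one quoted above.
swap_md5_fields() {
    # Capture the 32-character hash, skip the two separator characters,
    # capture the rest of the line as the path, and emit "path hash".
    sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
}

# Example: a path containing a space stays in one piece.
printf '%s  %s\n' d41d8cd98f00b204e9800998ecf8427e 'my file.txt' |
    swap_md5_fields
```

The whitespace-based expression 's/(\S*)\s*(\S*)/\2\t\1/' would instead treat only "my" as the path here, because \S* stops at the first space.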

>
>Or a way of optimizing the files removed?  (Probably to maximize the
>level of directories which have no normal files anywhere within them
>after removal).

Never thought of that. Hmm...

Pádraig.
