Re: RFS: FSlint - File System lint

Justin Pryzby wrote:

>On Mon, Feb 27, 2006 at 04:47:31PM -0500, pryzbyj wrote:
>>On Mon, Feb 27, 2006 at 05:04:30PM +0000, Pádraig Brady wrote:
>>>I've been maintaining FSlint for a few years now
>>>and it has proved quite popular. There have even
>>>been (buggy) third-party Debian packages floating around.
>>>In the latest version (2.14) I have created a Debian package,
>>>and it would be great if someone could sponsor this
>>>package for inclusion in Debian.
>This package is really quite neat.  I've read through much of the
>code (lots of fairly small bash scripts), and I must say that I'm
>inspired.  I especially like this "find duplicates" pipeline (my own
>implementation here):

Cheers. Hopefully we'll get 2.15 into Debian soon.
I'm working on your comments and also I have a bug fix
I'd like to get done.

>  find . -type f -print0 |xargs -r0 md5sum |sort |sed -re 's/(\S*)\s*(\S*)/\2\t\1/' |uniq -df1 --all-repeated=separate |sed -e 's/\t\S*$//;'

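Note that throwing away files with a unique size first is a huge
optimization: a file whose size matches no other file cannot have
a duplicate, so it never needs to be checksummed.
I also sort by inode (sorting by path is nearly as good),
which reduces disk seeking a lot.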
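
The two optimizations above (drop unique sizes, then read in inode
order) can be sketched roughly as follows. This is only an illustration,
not FSlint's actual script: the function name find_dups is made up, and
it assumes GNU find (for -printf), GNU uniq (for -w), awk, sort, xargs
and md5sum, and path names that contain no tabs or newlines.

```shell
# Hypothetical helper: print groups of files with identical MD5 sums,
# checksumming only files whose size is shared with another file.
find_dups() {
    dir=${1:-.}
    tab=$(printf '\t')
    # Emit size<TAB>inode<TAB>path for every regular file.
    find "$dir" -type f -printf '%s\t%i\t%p\n' |
    # Keep only lines whose size occurs more than once.
    awk -F'\t' '
        { size[NR] = $1; line[NR] = $0; count[$1]++ }
        END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print line[i] }
    ' |
    # Order by inode number so md5sum reads the disk with less seeking.
    sort -t "$tab" -k2,2n |
    # Drop size and inode, leaving the path.
    cut -f3- |
    tr '\n' '\0' |
    xargs -r0 md5sum |
    sort |
    # Compare only the 32-character checksum; print each duplicate
    # group, with groups separated by a blank line.
    uniq -w32 --all-repeated=separate
}
```

Calling find_dups with a directory (default ".") prints one
"checksum  path" line per duplicate, so paths with spaces survive intact.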

>Does anyone know a prettier way of switching the md5sum output than
>this sedscript??  (Has to deal with special pathnames, of course!)

My method is more robust BTW (try path names with spaces):
sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
Note I think uniq will get key support (like sort) at some stage.
Also, Debian has a specific patch adding -W to compare only
the first N fields. However, this is not standard and,
as I understand it, has just been removed.
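
To illustrate why the fixed-width sed expression above copes with
spaces, here is a small sketch that feeds it one md5sum-style line.
The wrapper name swap_md5 and the sample path are made up for the
example; the checksum shown is the well-known MD5 of empty input.

```shell
# Hypothetical wrapper around the sed expression quoted above:
# take the first 32 characters (the checksum), skip the two
# separator characters, and emit "path checksum".
swap_md5() {
    sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
}

printf '%s  %s\n' d41d8cd98f00b204e9800998ecf8427e './my file.txt' | swap_md5
# prints: ./my file.txt d41d8cd98f00b204e9800998ecf8427e
```

Because the match is anchored on the checksum's fixed length rather
than on whitespace, whatever follows the separator is kept as the path.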

>Or a way of optimizing the files removed?  (Probably to maximize the
>level of directories which have no normal files anywhere within them
>after removal).

Never thought of that. Hmm...
