Re: RFS: FSlint - File System lint
Justin Pryzby wrote:
>On Mon, Feb 27, 2006 at 04:47:31PM -0500, pryzbyj wrote:
>
>>On Mon, Feb 27, 2006 at 05:04:30PM +0000, Pádraig Brady wrote:
>>
>>>Hi,
>>>
>>>I've been maintaining FSlint for a few years now
>>>and it has proved quite popular. There have even
>>>been (buggy) third-party Debian packages floating around.
>>>In the latest version (2.14) I have created a Debian package,
>>>and it would be great if someone could sponsor this
>>>package for inclusion in Debian.
>
>This package is really quite neat. I've read through much of the
>code, (lots of pretty-small bashscripts), and I must say that I'm
>inspired. I especially like this "find duplicates" pipeline (my own
>implementation here):
Cheers. Hopefully we'll get 2.15 into Debian soon.
I'm working on your comments, and I also have a bug fix
I'd like to get done.
>
>  find . -type f -print0 | xargs -r0 md5sum | sort |
>    sed -re 's/(\S*)\s*(\S*)/\2\t\1/' | uniq -f1 --all-repeated=separate |
>    sed -e 's/\t\S*$//;'
Note throwing away unique file sizes first is a huge optimization.
I also sort by inodes (or path is nearly as good),
which reduces disk seeking a lot.
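For what it's worth, here is a rough sketch of that size pre-filter
(my own guess at the approach, not FSlint's actual code; GNU find
assumed for -printf):

```shell
# sizedups: print "size<TAB>path" for every file under $1 whose size
# is shared with at least one other file; only these candidates need
# to be checksummed.
sizedups() {
    find "$1" -type f -printf '%s\t%p\n' | sort -n |
    awk -F'\t' '
        $1 == prev { if (prevline != "") print prevline; print; prevline = "" }
        $1 != prev { prevline = $0; prev = $1 }'
}
# Then checksum only the survivors (this breaks on newlines in file
# names, so it is a sketch only):
#   sizedups . | cut -f2- | tr "\n" "\0" | xargs -r0 md5sum
# Reading in inode order instead reduces seeking:
#   find . -type f -printf '%i\t%p\n' | sort -n | cut -f2-
```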
>
>Does anyone know a prettier way of switching the md5sum output than
>this sed script? (Has to deal with special pathnames, of course!)
My method is more robust BTW (try path names with spaces):
sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
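To illustrate: md5sum prints a fixed 32-hex-character checksum, two
separator characters, then the name, so a fixed-width match is safe
where whitespace splitting is not:

```shell
# Create a name containing spaces and swap the md5sum fields with
# the fixed-width sed expression above.
printf 'x' > 'name with spaces'
md5sum 'name with spaces' |
sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
# The name comes through intact, followed by the checksum, whereas
# the \S*/\s* version would truncate it at the first space.
```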
Note I think uniq will get key support (like sort) at some stage.
Also, Debian has a specific patch for -W to compare only
the first N fields; however, this is not standard and
has just been removed, I understand.
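With stock uniq the only options are to skip leading fields (-f) or
characters (-s) before comparing, which is why the pipeline swaps the
checksum into the last field; a minimal demonstration:

```shell
# uniq cannot limit the comparison width (hence Debian's nonstandard
# -W patch), but it can skip leading fields with -f.
printf '%s\n' 'a 1' 'b 1' | uniq -f1
# Skipping the first field makes both lines compare equal, so only
# the first line of the group is printed: "a 1"
```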
>
>Or a way of optimizing the files removed? (Probably to maximize the
>level of directories which have no normal files anywhere within them
>after removal).
Never thought of that. Hmm...
Pádraig.