
Re: the correct way to read a big directory? Mutt?

On 2015-04-24 16:39:51 -0500, David Wright wrote:
>  And another: it's probably faster to slurp bigger chunks of each file
>  (with an intelligent guess of the best buffer size) and use a fast
>  search for \nMessage-ID rather than reading and checking line by line."

This is not that simple. I want my script to be very reliable.
In particular, if a message has no Message-ID header but contains
"\nMessage-ID" in the body, I want to detect it. This kind of
thing really happens in practice (though it is rare), e.g. due to
some buggy mail software that breaks the headers and puts part of
them in the body. I also want to check the format of the headers
and detect duplicate Message-IDs. What my script really does is:

    while (<FILE>) {
        /^[\t ]/ and next;    # skip header continuation lines
        /^\S+:/ || (!$from++ && /^From /)
          or die "$proc: bad message format ($file)";
        /^Message-ID:\s+(<\S+>)( \(added by .*\))?$/i or next;
        defined $files{$1}
          and die "$proc: duplicate message-id $1 ($files{$1} and $file)\n";
        $files{$1} = $file;
    }

> And should you read the whole directory by specifying <directory-name>/*,
> you lose the benefit and thrash the disk again.

With zsh, I often do things like: grep ... <directory-name>/**/*.c

One can choose to sort the matches, but zsh doesn't support sorting
glob results by inode number. I've sent a feature request.
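Outside zsh, a rough approximation of inode-ordered reading (to cut down
on seeks when processing a big directory) can be pieced together from
standard tools: ls -i prints each entry's inode number, and sort -n then
orders them numerically. A sketch, not the zsh feature requested above
(the directory path is just an example):

```shell
# Print directory entries in inode order; reading files in this order
# tends to reduce disk seeks on traditional filesystems.
# ls -1i: one entry per line, prefixed with its inode number.
# (Filenames containing whitespace would need more careful handling.)
ls -1i /some/dir | sort -n | awk '{ print $2 }'
```

The names can then be fed to the actual processing loop, e.g. via
xargs or a while-read loop.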

Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
