Re: the correct way to read a big directory? Mutt?

To: debian-user@lists.debian.org
Subject: Re: the correct way to read a big directory? Mutt?
From: Vincent Lefevre <vincent@vinc17.net>
Date: Sat, 25 Apr 2015 02:39:07 +0200
Message-id: <20150425003907.GB12262@xvii.vinc17.org>
Mail-followup-to: debian-user@lists.debian.org
In-reply-to: <20150424213951.GA13410@alum>
References: <20150424135238.GA12147@ypig.lip.ens-lyon.fr> <20150424213951.GA13410@alum>

On 2015-04-24 16:39:51 -0500, David Wright wrote:
[...]
>  And another: it's probably faster to slurp bigger chunks of each file
>  (with an intelligent guess of the best buffer size) and use a fast
>  search for \nMessage-ID rather than reading and checking line by line."

It is not that simple. I want my script to be very reliable.
In particular, if there is a message without a Message-ID header
but with "\nMessage-ID" in the body, I want to detect it. This kind
of thing really happens in practice (though rarely), e.g. due to
buggy mail software that breaks the headers and puts part of them
in the body. I also want to check the format of the headers and
detect possible duplicate Message-IDs. What my script really does
is:

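    while (<FILE>)
      {
        # Skip continuation lines of a folded header.
        /^[\t ]/ and next;
        # Any other line must be a header field, or the mbox "From "
        # separator (accepted only while $from is still unset). A blank
        # line reached before any Message-ID fails this test too.
        /^\S+:/ || (!$from++ && /^From /)
          or die "$proc: bad message format ($file)";
        /^Message-ID:\s+(<\S+>)( \(added by .*\))?$/i or next;
        # Abort if this Message-ID was already seen in another file.
        defined $files{$1}
          and die "$proc: duplicate message-id $1 ($files{$1} and $file)\n";
        $files{$1} = $file;
        last;
      }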
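
For readers who want to try the same checks outside the script, here is
a minimal sketch of that logic, written in Python rather than Perl
purely for illustration. The names message_id_of and
index_by_message_id are hypothetical, and the error for a header block
that ends without a Message-ID is an assumption added for the sketch;
this is not the actual script.

```python
import re

HEADER_FIELD = re.compile(r"^\S+:")
MESSAGE_ID = re.compile(r"^Message-ID:\s+(<\S+>)( \(added by .*\))?$",
                        re.IGNORECASE)


def message_id_of(lines, name):
    """Return the Message-ID found in the header lines, or raise ValueError."""
    seen_from = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line[:1] in ("\t", " "):
            continue  # continuation of a folded header
        if not HEADER_FIELD.match(line):
            # Only one mbox "From " separator is tolerated; anything else,
            # including the blank line that ends the headers, is an error.
            if seen_from or not line.startswith("From "):
                raise ValueError(f"bad message format ({name})")
            seen_from = True
            continue
        match = MESSAGE_ID.match(line)
        if match:
            return match.group(1)
    # Assumption for this sketch: running out of lines is also an error.
    raise ValueError(f"no Message-ID ({name})")


def index_by_message_id(messages):
    """Map each Message-ID to its file name, refusing duplicates."""
    files = {}
    for name, lines in messages:
        mid = message_id_of(lines, name)
        if mid in files:
            raise ValueError(
                f"duplicate message-id {mid} ({files[mid]} and {name})")
        files[mid] = name
    return files
```

A message whose Message-ID only appears after the blank line is rejected
as badly formatted, because the blank line is reached first and is
neither a header field nor a "From " separator.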

[...]
> And should you read the whole directory by specifying <directory-name>/*,
> you lose the benefit and thrash the disk again.

With zsh, I often do things like: grep ... <directory-name>/**/*.c

One can choose how to sort the matches, but zsh doesn't support
sorting by inode number. I've sent a feature request.
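
The ordering can also be done outside the shell. Here is a minimal
sketch in Python, assuming a single directory of regular files; the
function name files_by_inode is hypothetical, and whether inode order
actually helps depends on the filesystem.

```python
import os


def files_by_inode(directory):
    """Return paths of the regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as entries:
        # DirEntry.inode() gives the inode number of each directory entry.
        found = [(entry.inode(), entry.path) for entry in entries
                 if entry.is_file(follow_symlinks=False)]
    return [path for _, path in sorted(found)]
```

The resulting list can then be read in order, or passed to another
command in place of the plain glob expansion.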

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Follow-Ups:
- Re: the correct way to read a big directory? Mutt?
  - From: Nicolas George <george@nsup.org>

References:
- the correct way to read a big directory? Mutt?
  - From: Vincent Lefevre <vincent@vinc17.net>
- Re: the correct way to read a big directory? Mutt?
  - From: David Wright <deblis@lionunicorn.co.uk>
