
Re: reading an empty directory after reboot is very slow



Quoting Vincent Lefevre (vincent@vinc17.net):
> On 2015-04-15 13:41:20 -0600, Bob Proulx wrote:
> > Vincent Lefevre wrote:
> > > I also notice slowness with a large maildir directory:
> > > 
> > > drwx------ 2 vlefevre vlefevre 8409088 2015-03-24 14:04:33 Mail/oldarc/cur/
> > > 
> > > In this one, the files are real (145400 files), but I have a Perl
> > > script that basically reads the headers and it takes a lot of time
> > > (several dozens of minutes) after a reboot or dropping the caches
> > > as you suggested above. With a second run of this script, it just
> > > takes 8 seconds.
> [...]
> > It would also be interesting to convert the Maildir with 145400 files
> > to a single compressed mbox file.  (That will convert "^From " lines
> > if that is a concern for you.)  I expect that if you modified your
> > perl script to read the compressed mbox file and do the same task, it
> > might be faster!  It would remove the overhead of opening each of
> > those 145400 files.
> 
> Possibly, but individual modifications would take much more time than
> with Maildir (such modifications, consisting of retagging, occur from
> time to time).

I take it these are real emails that are still read with an email client
(say, mutt) at various times, rather than a dead archive of old mail
that you just happen to process from time to time.

For a start, if you put them in a single mbox, then all 145k messages
are locked whenever any one of them is. Maildir doesn't have that
problem.
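
To make that concrete: any retag in a single mbox has to take an
exclusive lock on the whole file, so one small change serialises
against every reader. A minimal sketch in Python (the mbox path is
made up; a perl script would use flock the same way):

    import fcntl

    # Hypothetical single-file archive; the point is that the lock
    # covers the whole mbox, not just the one message being retagged.
    with open("oldarc.mbox", "r+b") as mbox:
        fcntl.flock(mbox, fcntl.LOCK_EX)   # waits on every other reader/writer
        # ... rewrite one message's status headers here ...
        fcntl.flock(mbox, fcntl.LOCK_UN)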

When you process them, do you avoid updating the atime on the files?
Otherwise a full scan can mean up to 145400 inode updates being written
back to disk.
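
If you can't remount the filesystem with noatime or relatime, the
script can opt out per file. An untested sketch in Python (O_NOATIME
is Linux-only and requires you to own the files, which you do here):

    import os

    def read_header_block(path):
        # O_NOATIME keeps the read from dirtying the inode, so a pure
        # header scan does no write I/O at all.
        fd = os.open(path, os.O_RDONLY | os.O_NOATIME)
        with os.fdopen(fd, "rb") as f:
            # Headers sit at the front of a maildir file, before the
            # first blank line.
            return f.read(8192).split(b"\n\n", 1)[0]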

> > It all depends upon the distribution of the message body sizes, since
> > it would then need to read and skip over the bodies.
> 
> With an uncompressed mbox file, using the Content-Length header, it
> could be faster, but there's still the problem with individual changes.

That could be mitigated if you use that mbox extension which puts a
fixed-length header on each email for metadata (as long as that's the
kind of metadata you're operating on).
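
For the Content-Length route, the header pass reduces to one seek per
message instead of reading the body. A rough sketch in Python, assuming
the messages carry a trustworthy Content-Length header (which is
exactly the weak point of that convention):

    import re

    def iter_headers(mbox_path):
        # Walk an uncompressed mbox, yield each message's header block,
        # and seek past the body via Content-Length instead of reading it.
        with open(mbox_path, "rb") as f:
            line = f.readline()
            while line:
                if line.startswith(b"From "):          # message delimiter
                    headers = []
                    length = None
                    line = f.readline()
                    while line not in (b"", b"\n", b"\r\n"):
                        headers.append(line)
                        m = re.match(rb"content-length:\s*(\d+)", line, re.I)
                        if m:
                            length = int(m.group(1))
                        line = f.readline()
                    yield b"".join(headers)
                    if length is not None:
                        f.seek(length, 1)              # skip the body in one seek
                line = f.readline()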

> > But let's say that all of the bodies were small, less than 50k; then
> > I expect that converting them to a single mbox file would make reading
> > them much faster than reading the individual files.
> 
> 6.5 KB on average.
> 
> > Also compressing the file reduces the amount of I/O needed to pull
> > the data into memory.  With today's fast CPUs, decompression is
> > faster than disk I/O, so reading a compressed file and decompressing
> > it is usually faster in my experience.  Every case is individually
> > different, however.  If you run that experiment I would be interested
> > in knowing the result.
> 
> But recompressing would be very slow.
> 
> I wonder whether there is some specific FS that would make maildir
> access very fast, and whether using it on a loop-mounted disk image
> would be interesting.

Have you considered running a local IMAP server to handle this (and
any other) maildir? Handling those volumes of email must be bread and
butter to hosting services. I assume such servers build persistent
caches of the emails rather than just depending on the filesystem's.

It'll probably take some reading up, though, if you want it to do
whatever it is you do with your perl script.
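
If it helps, here is roughly what the header pass looks like against a
local Dovecot (or any other IMAP server) pointed at that maildir. The
account, password and folder name are invented, and a real run would
batch the fetches rather than loop one message at a time:

    import imaplib

    imap = imaplib.IMAP4("localhost")
    imap.login("vlefevre", "secret")        # made-up credentials
    imap.select("oldarc", readonly=True)

    # BODY.PEEK leaves the \Seen flags (and hence the maildir filenames)
    # untouched; the server answers from its own index instead of
    # stat()ing 145400 files on every run.
    typ, data = imap.search(None, "ALL")
    for num in data[0].split():
        typ, msg = imap.fetch(num, "(BODY.PEEK[HEADER])")
        headers = msg[0][1]
        # ... whatever the perl script currently does with the headers ...

    imap.logout()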

Cheers,
David.

