
Re: /tmp as tmpfs and consequence for imaging software



On Sun, Nov 13, 2011 at 09:40:42AM +0100, Bastien ROUCARIES wrote:
> On Sat, Nov 12, 2011 at 11:25 PM, Josselin Mouette <joss@debian.org> wrote:
> > On Saturday 12 November 2011 at 23:12 +0100, Samuel Thibault wrote:
> >> Adam Borowski, on Sat 12 Nov 2011 23:08:08 +0100, wrote:
> >> > You need to increase the swap size by the amount you'd use for /tmp.
> >>
> >> Well, the idea in such a case is precisely to *not* use swap, but real
> >> disks. Such software already knows how to manage its memory and
> >> disk-backed memory (thus stored in /tmp).
> >
> > Practically speaking, the only significant difference is that files are
> > not forced to disk as early. Otherwise, if you have a large enough swap,
> > pages of a file on a tmpfs that are not used enough will be swapped. And
> > pages of a file on a regular filesystem that are used enough will be
> > kept in the buffer cache.
> 
> No, it is not true. Science and imaging software are better off using
> true disk-backed files. For instance, if I want to invert a big matrix,
> there are pretty good algorithms that force only some parts of the file
> to be kept on disk. They know better than the kernel when to put some
> parts of the data on the slow disk.

I don't agree one bit.  I've just spent a huge part of my PhD looking
at multi-gigabyte image files using various software including ImageJ.
I put the working data onto tmpfs, which sped things up significantly.
[16GiB swap, 8GiB core, 2GiB tmpfs on /tmp, 3TiB RAID for storage].
Obviously, I avoided swapping by having sufficient memory.

For handling huge files, you could just use mmap(), together with
madvise() to tell the kernel what the access pattern will be and
how best to cache the pages it reads.  For matrix multiplication,
you would probably want MADV_SEQUENTIAL.  Without mmap, you could
use posix_fadvise() with POSIX_FADV_SEQUENTIAL and POSIX_FADV_NOREUSE;
these are separate advice values rather than bit flags, so each one
needs its own call.
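A minimal C sketch of both approaches, assuming Linux with glibc. The helper names sum_bytes_mmap() and sum_bytes_fadvise() are hypothetical, and summing the bytes merely stands in for a real streaming pass over image or matrix data. Both hints are advisory, so the code carries on if the kernel ignores them.

```c
#define _DEFAULT_SOURCE /* exposes madvise(), MADV_* and posix_fadvise() in glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: one sequential pass over a file through mmap().
 * Returns the byte sum, or -1 on error. */
long long sum_bytes_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory: we read front to back, so the kernel may read ahead
     * aggressively and free pages soon after they are accessed. */
    (void)madvise(p, len, MADV_SEQUENTIAL);

    long long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}

/* Hypothetical helper: the same pass with read() and posix_fadvise().
 * The advice values are enumerated constants, not bit flags, so each
 * hint is a separate call. Returns the byte sum, or -1 on error. */
long long sum_bytes_fadvise(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 and len 0 mean "the whole file"; failures are harmless. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);

    unsigned char buf[65536];
    long long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    }

    close(fd);
    return n < 0 ? -1 : sum;
}
```

The mmap variant avoids copying into a user buffer at all; the read() variant is the fallback when mapping the whole file is not an option.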

Obviously, if you're on tmpfs you don't need to care about caching
in the block layer, since the data always lives in the page cache
(modulo swapping).  You can just mmap it and it's right there in
memory, without duplicate copies on disc, in your working set and
in the buffer cache.
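Roughly what "just mmap it" could look like, as a sketch only: map_scratch() is a hypothetical helper, and it assumes /tmp is a tmpfs mount. The file is unlinked straight away, so its storage is released once the mapping goes away, and with MAP_SHARED on tmpfs the working set and the file are the same pages.

```c
#define _DEFAULT_SOURCE /* exposes mkstemp() and ftruncate() in glibc */
#include <stddef.h>
#include <stdio.h>    /* snprintf */
#include <stdlib.h>   /* mkstemp */
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: a len-byte read/write scratch buffer backed by an
 * unlinked file in dir (e.g. "/tmp", assumed to be tmpfs).
 * Returns NULL on failure; release with munmap(p, len). */
void *map_scratch(const char *dir, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/scratch-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;

    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path); /* name gone; storage lives while fd or mapping exists */

    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }

    /* On tmpfs, MAP_SHARED maps the file's own page-cache pages, so there
     * is a single copy in memory and nothing is written back to disc. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps the file alive */
    return p == MAP_FAILED ? NULL : p;
}
```

On a disk-backed /tmp the same code still works, but dirty pages will eventually be written back through the filesystem.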


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.

