
Re: Single root filesystem evilness decreasing in 2010? (on workstations)

Robert Brockway wrote:
> [...]
> Some filesystems such as XFS & ZFS allow you to effectively set quotas on parts of the filesystem. I think we'll see this becoming more common. This takes away a big part of the need for multiple filesystems.

This is a neat feature indeed. And you're right; apparently, work is being done on ext4.
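For the record, here's a quick sketch of what that looks like today (pool, dataset, device, and project names are all made up). ZFS can cap any dataset, and XFS has project quotas for directory subtrees:

```shell
# ZFS: cap a whole dataset, or a single user within it (illustrative names)
zfs set quota=10G tank/home
zfs set userquota@alice=5G tank/home

# XFS: project quotas cap a directory subtree; needs the prjquota mount option
mount -o prjquota /dev/sdb1 /srv
xfs_quota -x -c 'project -s -p /srv/data 42' /srv
xfs_quota -x -c 'limit -p bhard=10g 42' /srv
```

Both need root and a real pool/filesystem behind them, of course; this is just the shape of the commands.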


* Specific mount options

This is a good point. I actually hadn't considered this in my list. I'll respond by saying that in general the mount options I use for different filesystems on the same box do not vary much (or at all) in practice.

I've just discovered bindfs [1], a FUSE-based virtual filesystem, which might partially answer this problem. It looks quite nice, simple, and flexible, but it obviously won't be able to enable optimizations like noatime. I don't know about the possible overhead, though - at that point, one might want to go with "true" access control systems instead.

[1] http://code.google.com/p/bindfs/
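To illustrate (paths and the user name are hypothetical): bindfs re-mounts a directory elsewhere with altered ownership and permissions, e.g. exposing a tree read-only to another user:

```shell
# Mirror /home/me/docs at /home/guest/docs, owned by 'guest', with write bits stripped
# (illustrative paths; needs FUSE and bindfs installed)
bindfs -u guest -p a-w /home/me/docs /home/guest/docs

# Undo it like any FUSE mount
fusermount -u /home/guest/docs
```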

> If I want a filesystem marked noatime then I probably want
> all the filesystems marked noatime.  There are exceptions to this of
> course.

Yep, like giving relatime to a filesystem containing mboxes or something like that. But it's true, yes, access times are becoming less and less useful, and I can't think of another real problem (that isn't answered by access control systems) besides that one.
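In fstab terms (devices and mount points invented), that per-filesystem tuning would just be:

```
# noatime everywhere except the mail spool, which keeps relatime
# so mbox atime-vs-mtime "new mail" checks still work
/dev/vg0/root  /          ext4  defaults,noatime   0  1
/dev/vg0/mail  /var/mail  ext4  defaults,relatime  0  2
```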

* System software replacement

Easier to reinstall the system if it's on volumes separate from conf and data? Come on...

That's true, but the time savings are not terribly great IMHO. The system can be backing up and restoring the data while the human is off doing other stuff. Saves computer time (cheap) but not human time (expensive).

Either way, there's software to automate and abstract it all. I think the real question is processing vs. storage resources; the human cost is the same either way.

The only reason I saw for doing inflexible volume imaging for backups is to avoid the filesystem formatting operations, as well as the file unpacking and copying operations, when restoring - these are theoretically slower than copying a volume byte-by-byte. "Whatever".

If restore speed is really that critical, it should still be possible to generate an image without including the free space - I know virtualization technologies handle this just fine for most filesystems.
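That trick can even be approximated with plain GNU dd (device and paths hypothetical): zero out the free space first, then copy with conv=sparse so runs of zero blocks become holes in the image file:

```shell
# Zero-fill the free space so unused blocks read back as zeros...
mount /dev/vg0/data /mnt/data
dd if=/dev/zero of=/mnt/data/zerofill bs=1M; rm /mnt/data/zerofill
umount /mnt/data

# ...then image the volume sparsely: zero blocks take no space on disk
dd if=/dev/vg0/data of=/backup/data.img bs=1M conv=sparse
```

Crude next to a proper filesystem-aware imager, but it shows the idea.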

Maybe we misunderstood each other and saw different problems.

I recommend backing up all system binaries. It's the only way you can guarantee you will get back to the same system you had before the rebuild. This is most important for servers, where even small behavioural changes can impact the system in a big way.

So you don't trust Debian stable to be stable?  :-)

See this link for my talk on backups, which goes into this issue further:


All the info in this talk is being transferred to http://www.practicalsysadmin.com.

Thanks a lot; that's a talk full of useful checklists. I'll definitely devour your wiki pages when I have the time.

* Metadata (i-node) table sizes

While this may be a problem now I think it will be less of a problem in the future as some filesystems already allow you to add i-nodes dynamically and this will increasingly be the case.

I'm not sure I follow you, but that sounds cool.  Could you elaborate?

* Block/Volume level operations (dm-crypt, backup, ...)

As said earlier, I don't need a fast backup solution. I already prefer smarter filesystem-based backup systems in general.

As do I. What do you use? If you want to use dump with ext2/3/4 you will need to snapshot for data safety.

Actually I would think dump is a fast but "dumb" solution (much like partimage). And yep, I know, LVM2 is just great for that.
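For completeness, the snapshot-then-dump dance looks roughly like this (volume group, sizes, and paths invented):

```shell
# Freeze a consistent view of the volume, dump it, throw the snapshot away
lvcreate --snapshot --size 1G --name home_snap /dev/vg0/home
dump -0 -f /backup/home.dump /dev/vg0/home_snap
lvremove -f /dev/vg0/home_snap
```

The snapshot only has to hold the blocks that change while the dump runs, so 1G is usually plenty for a quiet system.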

Anyway, my preference isn't based on my own experience, so I'm not actually using anything like that, but I'm willing to try fsarchiver and see if it can really beat simple ad-hoc scripts for my needs. Or something heavier, just for fun (Bacula?).

In modern disks the sector layout is hidden. The fastest sectors may be at the beginning of the disk, the end or striped throughout. This is specific to the design of the HDD and it is no longer possible to tell short of doing timing tests[1]. My recommendation is to ignore differences in sector speeds.

[1] I'd love to hear if anyone has found a method but I can't see how they could get through the h/w abstraction.
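The timing test can be as crude as reading a chunk from both ends of the disk (device name assumed; run as root, reads only, and drop the page cache between runs for honest numbers):

```shell
# Compare sequential read speed at the start vs. the end of /dev/sda
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/sda of=/dev/null bs=1M count=256 skip=0

echo 3 > /proc/sys/vm/drop_caches
end_mib=$(( $(blockdev --getsz /dev/sda) / 2048 ))   # --getsz counts 512-byte sectors
dd if=/dev/sda of=/dev/null bs=1M count=256 skip=$(( end_mib - 256 ))
```

This still can't see through the HDD's internal remapping, of course - it only measures what the abstraction lets through, which is Robert's point.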

Good to know, I've actually never seen anything fancy like that (striped throughout). I'll test my disks to see how I can make the best out of them anyway - but I agree with you in the case one wants to setup a portable, deployable system.

LVM theoretically won't guarantee the physical position of the logical volumes anyway. And I'll need it if I do any partitioning.

So now it is abstracted (at least) twice :)

Hehe, yeah.  I'm glad I'm not into forensics.  What a beautiful mess.

* Swap special-case

Under Linux 2.6 kernels a swap file is as efficient as a swap partition. The only real advantage of a swap partition is to allow suspend to disk (on a laptop).

Really? That's too bad. Can't think of any real obstacle, I hope this limitation will be lifted.
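To be fair, a swap file is also cheap to set up when needed (sketch, as root; the path is arbitrary):

```shell
# 2 GiB swap file on a 2.6 kernel; dd rather than fallocate keeps it hole-free
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```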

There are however some neat dynamic swap allocation projects out there that would help me not lose these gigabytes I never seem to be using (at all). I

I wouldn't touch these if they in any way impacted performance. Disk is cheap. Give yourself 2GB swap.

Yup, they currently do (AFAIK), it takes a little bit of time to set them up. Let's say it's cool to have in addition to a fixed swap space, as an extra safety measure.

figured, with all this RAM I could think of the swapping space as a mere rescue space to prevent OOM rampages - and nothing else. In *my* case, even buffers and cached pages never get to be pushed on disk after weeks without

Ah but they are. Cache pages may be clean or dirty. Your disk cache may be full of clean cache pages, which is just fine.

Am I interpreting the output of free(1) the wrong way?

  cay:~$ free -o
               total       used       free     shared    buffers     cached
  Mem:       3116748    3029124      87624          0     721500    1548628
  Swap:      3145720        800    3144920

To me, it looks like only 800 KiB are actually swapped (uptime 11d) - I don't know how I can see what type of data it is. Is that irrelevant?

RAM is not even fully used, so it doesn't surprise me at first.
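For what it's worth, the Swap: line can be parsed mechanically (the here-string just replays the sample output above):

```shell
free_output='             total       used       free     shared    buffers     cached
Mem:       3116748    3029124      87624          0     721500    1548628
Swap:      3145720        800    3144920'

# Column 3 of the Swap: line is the amount actually in use, in KiB
echo "$free_output" | awk '/^Swap:/ { printf "swap used: %d KiB (%.2f%%)\n", $3, 100 * $3 / $2 }'
# prints: swap used: 800 KiB (0.03%)
```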

rebooting. I'm just OK with my three gigs. The 1:1 mem:swap rule has got to be wasting space here, hasn't it?

Absolutely.  This page has my thoughts on this topic:


Thanks in advance for your help. I hope I made you think twice about it too, or at least provided people with other needs a little checklist to better design their layouts.

Thanks for the great checklist.

Thanks for taking the time to look at this, and for the links to your pages (these are useful).



