Re: Single root filesystem evilness decreasing in 2010? (on workstations) [LONG]
On Thu, 4 Mar 2010, thib wrote:
If restore speed is really that critical, it should still be possible to
generate an image without including the free space - I know virtualization
techs are doing it just fine for most filesystems.
Maybe we misunderstood each other - saw a different problem.
Possibly. I didn't mean to suggest that dd was a good way to backup. I
think it is a terrible way to backup. I was talking about dump
utilities. I started using dump on Solaris in the mid 90s and really like
the approach to backing up that dump utilities offer. On Linux I use xfs
a lot and backup with xfsdump in many cases.
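For anyone who hasn't used it, a dump-style backup cycle with xfsdump looks roughly like this (paths, labels, and levels are illustrative, not from any real setup; it needs root and an XFS filesystem):

```shell
# Level-0 (full) dump of /home to a file; -L and -M are session/media labels.
xfsdump -l 0 -L "weekly-full" -M "disk0" -f /backup/home.level0 /home

# Later, a level-1 dump picks up only what changed since the level 0.
xfsdump -l 1 -L "daily-incr" -M "disk0" -f /backup/home.level1 /home

# Restore by replaying the level 0 and then each higher level in order.
xfsrestore -f /backup/home.level0 /mnt/restore
xfsrestore -f /backup/home.level1 /mnt/restore
```

The incremental levels are what make the approach attractive: a weekly full plus daily level-1 dumps keeps both backup time and media usage low.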
 A long time ago I used to use it to back up MS-Windows systems from
Linux, but disks grew so much it became infeasible.
I recommend backing up all system binaries. It's the only way you can
guarantee you will get back to the same system you had before the rebuild.
This is most important for servers, where even small behavioural changes can
impact the system in a big way.
So you don't trust Debian stable to be stable? :-)
Actually I'd say Debian is best-of-breed when it comes to backporting
security patches to retain consistent functionality. Having said that,
system binaries represent an ever-shrinking proportion of total data on a
computer system. When I first started with Linux the OS took up about 80%
of the available disk space that I had. Today I'd be generous if I said
it took up 2%. So even if there is an alternative, backing them up now is
hardly onerous and improves the chances of a successful disaster recovery.
I cover this more in the backup talk.
Thanks a lot; that's a talk full of useful checklists. I'll definitely devour
your wiki pages when I have the time.
Great. I'm gradually adding more and more info to the site.
While this may be a problem now I think it will be less of a problem in the
future as some filesystems already allow you to add i-nodes dynamically and
this will increasingly be the case.
I'm not sure I follow you, but that sounds cool. Could you elaborate?
Sure. GPFS (a commercial filesystem available for Linux) allows for the
addition of i-nodes dynamically. We can expect more and more dynamic
changes to filesystems as the science advances.
I once nearly ran out of i-nodes on a 20TB GPFS filesystem on a SAN.
Being able to dynamically add i-nodes was a huge relief. I didn't even
need to unmount the filesystem.
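As a rough illustration (the GPFS command and device name below are from memory of IBM's mmchfs tool and should be checked against your release; df -i is the portable way to watch inode usage):

```shell
# Portable: watch inode usage on any mounted filesystem.
df -i /

# GPFS sketch (hypothetical device 'gpfs1'): raise the inode ceiling
# online, with the filesystem still mounted. Verify the exact option
# against your GPFS/Spectrum Scale documentation before relying on it.
# mmchfs gpfs1 --inode-limit 300M
```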
Anyway, my preference isn't based on my own experience so I'm not actually
using anything like that, but I'm willing to look at and try fsarchiver and
see if it can really beat simple ad-hoc scripts for my needs. Or something
heavier, just for fun (Bacula?).
I'm fairly particular about backup systems. I think most people who
design backup systems have never done a DR in the real world.
I seem to end up having to do at least one large-scale DR per year. I've
done two in the last month. I've done several DRs in the multi-TB range.
Virtually every DR I've done has a hardware fault as the underlying cause.
In several cases multiple (supposedly independent) systems failed.
The core of any DR plan is the KISS principle. There's a good chance that
the poor guy doing the DR is doing it at 3am so the instructions need to
be simple to reduce the chance of errors.
If the backup solution requires me to have a working DB just to extract
data or wants me to install an OS and the app before I can get rolling
then I view it with extreme suspicion.
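In that spirit, here's a minimal sketch of what I mean (hypothetical paths; the point is that the restore side needs nothing beyond a rescue shell and tar):

```shell
# Backup: a plain compressed tarball, no agent, no catalogue database.
tar -czf /backup/etc-snapshot.tar.gz /etc

# Restore at 3am from any live CD or rescue environment:
mkdir -p /mnt/restore
tar -xzf /backup/etc-snapshot.tar.gz -C /mnt/restore
```

If the half-asleep admin can type the restore from memory, the plan passes the 3am test.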
And for those people who think that off-site/off-line backups aren't
needed anymore because you can just replicate data across the network,
I'll give you 5 minutes to find the flaw in that plan :)
Ah but they are. Cache pages may be clean or dirty. Your disk cache may
be full of clean cache pages, which is just fine.
Am I interpreting the output of free(1) the wrong way?
Sort of :)
Free is telling you the total memory in disk cache. Any given page in the
cache may be 'dirty' or 'clean'. A dirty page has not yet been written to
disk. New pages start out dirty. Within about 30 seconds (varies by
filesystem and other factors) the page is written to disk. The page in
the cache is now clean.
Unless your system is writing heavily most pages in the cache are likely
to be clean.
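You can watch this directly; /proc/meminfo reports how much of the cache is currently dirty or being written back (Linux-specific, but needs no extra tools):

```shell
# How many KiB of the page cache are dirty / in flight to disk right now.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# sync forces dirty pages out to disk; afterwards the Dirty figure drops.
sync
grep -E '^(Dirty|Writeback):' /proc/meminfo
```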
The difference is that clean pages can be dumped instantly to reclaim the
memory. Dirty pages must be flushed to disk before they can be
reclaimed. Using clean pages allows fast read access from the cache
without the risk of not having committed the data. I describe this as
'having your cake and eating it too'.
More info can be found here:
 (Paraphrase of an English-language saying.)
cay:~$ free -o
             total       used       free     shared    buffers     cached
Mem:       3116748    3029124      87624          0     721500    1548628
Swap:      3145720        800    3144920
To me, it looks like only 800KiB are actually swapped (uptime 11d) - I don't know
how I can see what type of data it is. Is that irrelevant?
I consider it irrelevant as a sysadmin. I'm purely interested in whether
the system has sufficient swap or is swapping too much.
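If you want to see swap activity rather than just occupancy, the counters in /proc/vmstat (or vmstat's si/so columns) are what matter; a quick sketch:

```shell
# Cumulative pages swapped in/out since boot; sample twice for a rate.
grep -E '^pswp(in|out) ' /proc/vmstat
sleep 5
grep -E '^pswp(in|out) ' /proc/vmstat

# Equivalent per-second view:
# vmstat 1 5   (watch the si and so columns)
```

A box that swapped 800KiB out once, days ago, is fine; one where those counters climb steadily is the one in trouble.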
I tried to change the world but they had a no-return policy