Re: Backup from huge Maildirs
At 01:01 AM 9/4/2007 +0200, mlists wrote:
>I'm rsync'ing Maildirs from location1 to location2.
>On day1 all the new mails (which are in the "new" subdir) are transferred
>from loc1 to loc2. On day2 all these mails have been read and they're
>deleted from the backup (because they are not in "new" anymore) and
>re-transferred from loc1 to loc2, as they are now in subdir "cur".
>What I would like is for rsync to be smart and "see" that it's the same
>mail, only it has been moved from "new" to "cur".
I think there are a few things you can try. The "right" way to do it is to
have a maildir program that is backup-aware and does its own backups. That
would take care of the problem of moved files being retransferred.
If all the files in the hierarchy have unique file names and are merely
moved between directories (mail folders), you can flatten the entire
directory tree to take care of rsync's "I think it's a new file" problem.
Then on the destination side you can recreate the directory structure with
a simple script.
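Here is a rough sketch of the flatten-and-rebuild idea in Python. It is not a tested backup tool. It assumes the usual Maildir naming, where a message moved from "new" to "cur" keeps its unique name and gains a ":2,<flags>" suffix, that the flat directory lives outside the Maildir but on the same filesystem (hard links are used), and that file names contain no tabs. The names MANIFEST.tsv, flatten and rebuild are made up for the example.

```python
import os

MANIFEST = "MANIFEST.tsv"  # hypothetical name for the flat-name -> path map


def unique_name(filename):
    """Strip the Maildir ':2,<flags>' suffix so new/ and cur/ names match."""
    return filename.split(":2,", 1)[0]


def flatten(maildir_root, flat_dir):
    """Hard-link every message into flat_dir and record where it came from."""
    os.makedirs(flat_dir, exist_ok=True)
    wanted = {}
    for dirpath, _dirs, files in os.walk(maildir_root):
        for name in files:
            src = os.path.join(dirpath, name)
            wanted[unique_name(name)] = os.path.relpath(src, maildir_root)
    # Drop flat entries whose message no longer exists in the Maildir.
    for stale in set(os.listdir(flat_dir)) - set(wanted) - {MANIFEST}:
        os.remove(os.path.join(flat_dir, stale))
    # Link only messages that are not in the flat directory yet.
    for flat_name, rel in wanted.items():
        target = os.path.join(flat_dir, flat_name)
        if not os.path.exists(target):
            os.link(os.path.join(maildir_root, rel), target)
    # Rewrite the manifest so it reflects the current folder of each message.
    with open(os.path.join(flat_dir, MANIFEST), "w") as fh:
        for flat_name, rel in sorted(wanted.items()):
            fh.write(f"{flat_name}\t{rel}\n")


def rebuild(flat_dir, dest_root):
    """On the backup side, recreate the folder tree from the manifest."""
    with open(os.path.join(flat_dir, MANIFEST)) as fh:
        for line in fh:
            flat_name, rel = line.rstrip("\n").split("\t")
            target = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            if not os.path.exists(target):
                os.link(os.path.join(flat_dir, flat_name), target)
```

The idea would be to run flatten() on the source, rsync the flat directory (manifest included), and run rebuild() into an empty directory on the destination. A message that moves from "new" to "cur" keeps the same flat name, so only the small manifest changes.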
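You can also use rsync's --fuzzy option, which makes it search for similar
files to use as a basis for its diff algorithm. As far as I know it only
looks in the destination file's own directory, so it may not catch a
new-to-cur move by itself.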
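For what it's worth, a minimal sketch of how such an rsync call could be assembled from Python. The function name and the paths are placeholders. --delete-after is used instead of plain --delete because the rsync man page warns that deleting before the transfer can remove the very files --fuzzy would match against.

```python
import shlex


def rsync_command(src, dest, fuzzy=True):
    """Build an rsync argument list for mirroring src into dest."""
    cmd = ["rsync", "--archive", "--delete-after"]
    if fuzzy:
        # Ask rsync to look for a similar existing file as a delta basis.
        cmd.append("--fuzzy")
    # A trailing slash on src copies its contents rather than the directory itself.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


# Hypothetical paths; this only prints the command, it does not run it.
print(shlex.join(rsync_command("/home/user/Maildir", "backuphost:/backup/Maildir")))
```

The list can be handed to subprocess.run(cmd, check=True) once the paths are right.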
I think a better way to do it is the snapshot method mentioned earlier. If
stat()'ing all the files and directories is the bottleneck, then this will
help greatly. The poor man's way to do it is to just dd the entire
partition and pipe it over to the backup location. The image can be written
to another partition or split up into files of a convenient size.
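The "split up into files" part can be sketched in Python as below. This is only an illustration of the chunking, not a replacement for dd. The function name and the numbered-suffix naming are assumptions, and the source can be any binary stream (an opened partition device, or stdin fed by dd).

```python
def split_stream(src, dest_prefix, part_size, block_size=1024 * 1024):
    """Copy a binary stream into numbered part files of at most part_size bytes."""
    parts = []
    index = 0
    while True:
        written = 0
        out = None
        while written < part_size:
            block = src.read(min(block_size, part_size - written))
            if not block:
                break  # end of the source stream
            if out is None:
                # Create a part file only once there is data to put in it.
                path = f"{dest_prefix}.{index:04d}"
                out = open(path, "wb")
                parts.append(path)
            out.write(block)
            written += len(block)
        if out is not None:
            out.close()
        if written < part_size:
            # A short or empty part means the source is exhausted.
            return parts
        index += 1
```

A call might look like split_stream(open("/dev/sdXN", "rb"), "/backup/mail.img", 2 * 1024**3), with the device path being a placeholder. Imaging a partition that is mounted and being written to can give an inconsistent copy, so this is best done on an unmounted partition or on a snapshot.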
Another possibility, if the stat bottleneck is the problem, is to use sqlfs.
I don't know the current state of its development, but from what I've heard
it's wicked fast. Basically what it does is put your filesystem into a MySQL
(or whatever) database, and that lets you take advantage of the database's
superior indexing abilities. It also lets you take snapshots easily.
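To show why an index helps here, a toy sketch follows. It is not sqlfs, its schema, or its API. It uses SQLite from the Python standard library rather than MySQL, and the table and function names are invented. The point is only that "which files changed since the last backup" becomes one indexed query instead of a stat() on every file.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a toy file store: one row per file, indexed by modification time."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, mtime REAL NOT NULL, data BLOB NOT NULL)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS files_mtime ON files (mtime)")
    return db


def put(db, path, mtime, data):
    """Insert or overwrite one file's contents and modification time."""
    db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)", (path, mtime, data))
    db.commit()


def changed_since(db, timestamp):
    """Return the paths modified after timestamp, using the mtime index."""
    rows = db.execute(
        "SELECT path FROM files WHERE mtime > ? ORDER BY path", (timestamp,)
    )
    return [row[0] for row in rows]
```

A backup run would then ask changed_since(db, last_backup_time) and copy only those rows.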