
Re: backup directory/file exclusion pattern list for borgbackup



On Sat, Sep 25, 2021 at 8:04 PM Kushal Kumaran <kushal@locationd.net> wrote:
>
> On Sat, Sep 25 2021 at 06:24:12 PM, Default User <hunguponcontent@gmail.com> wrote:
> > Hello!
> >
> > I want to try using borgbackup to do backups of my (only) user directory:
> > /home/debian-user
> >
> > I just want to do so using Vorta, a GUI for borgbackup.
> >
> > But I just need a good, general list of directory and file type
> > exclusions that I can just cut and paste into the Exclude Patterns
> > window in Vorta.  Something like the list of exclusions that
> > appears by default in the Backintime backup program.
> >
>
> I don't understand what a general list of exclusions would look like.
> Do you have examples of what backintime excludes by default?  My own
> borgbackup runs back up everything on disk; I don't feel the need to
> exclude anything.
>
> > Note 1: borgbackup uses a matching pattern called "Fnmatch" with which
> > I am not familiar, and don't want to learn by trial and error, losing
> > data in the process.  Which is why I am looking for a "drop-in" basic
> > exclude list.
> >
>
> Run "borg help patterns" to see an explanation of how borgbackup deals
> with patterns.  It has this to say about fnmatch:
>
>     This is the default style for --exclude and --exclude-from.  These
>     patterns use a variant of shell pattern syntax, with '*' matching
>     any number of characters, '?' matching any single character, '[...]'
>     matching any single character specified, including ranges, and
>     '[!...]'  matching any character not specified. For the purpose of
>     these patterns, the path separator (backslash for Windows and '/' on
>     other systems) is not treated specially. Wrap meta-characters in
>     brackets for a literal match (i.e. [?] to match the literal
>     character ?). For a path to match a pattern, the full path must
>     match, or it must match from the start of the full path to just
>     before a path separator. Except for the root path, paths will never
>     end in the path separator when matching is attempted.  Thus, if a
>     given pattern ends in a path separator, a '*' is appended before
>     matching is attempted.
>
> > Note 2: I am not intending to use borgbackup to back up the whole
> > system; just /home/debian-user and its subdirectories.  I am using
> > timeshift to back up the rest of the system.  Timeshift uses a huge
> > amount of disk space, but it . . .  works.
> >
>
> I don't know how timeshift stores backups.  borg uses deduplicated
> storage that avoids storing identical data multiple times.  My own borg
> backups result in ~1G of new data every week (and about the same amount
> being deleted from expiring backups).  There is no significant increase
> in repository size week-over-week.  That obviously would not be the same
> for everyone, but if you're bothered by the amount of disk space used
> you can try it out.
>
> > Note 3: I am aware that some use backintime to back up user data, and
> > I have tried it myself.  But it just seems to have some "problems".
> > For example, the built-in "diff" utility does not seem to do anything.
> > It seems old and gives the impression of not being heavily developed.
> > The documentation is "adequate" but mediocre. And what really grinds
> > my gears about backintime, a problem apparently known as far back as
> > 2014:
> >
> > "Warning: A recent security audit revealed several possible attack
> > vectors for EncFs.
> >
> > From https://defuse.ca/audits/encfs.htm:
> >
> > EncFS is probably safe as long as the adversary only gets one copy of
> > the ciphertext and nothing more. EncFS is not safe if the adversary
> > has the opportunity to see two or more snapshots of the ciphertext at
> > different times. EncFS attempts to protect files from malicious
> > modification, but there are serious problems with this feature.
> >
> > This might be a problem with Back In Time snapshots."
> >
> > Gee . . .  think so?
>
> That report talks about issues with encfs design.  There is nothing
> backintime can do to fix those.
>
> borg can encrypt its backup images, and it recommends enabling that.
> So an adversary would not get access to the encfs ciphertext directly.
> They could get access to borg ciphertext instead, which may or may not
> be vulnerable to the same problems.  AFAIK there hasn't been a security
> audit of borgbackup itself.  The page at
> https://borgbackup.readthedocs.io/en/stable/internals/security.html#borgcrypto
> describes the design of borg security.
>
> --
> regards,
> kushal
>


Hi, Kushal.

In Vorta, under the "Sources" tab, there is an input area (window)
into which you can type or paste text, such as:

**/.cache

to denote exclusions, that is, things you do not want to back up.
This is from /home/debian_user/.config/backintime/config:

. . .
profile1.snapshots.exclude.1.value=.gvfs
profile1.snapshots.exclude.2.value=.cache/*
profile1.snapshots.exclude.3.value=.thumbnails*
profile1.snapshots.exclude.4.value=.local/share/[Tt]rash*
profile1.snapshots.exclude.5.value=*.backup*
profile1.snapshots.exclude.6.value=*~
profile1.snapshots.exclude.7.value=.dropbox*
profile1.snapshots.exclude.8.value=/proc/*
profile1.snapshots.exclude.9.value=/sys/*
profile1.snapshots.exclude.10.value=/dev/*
profile1.snapshots.exclude.11.value=/run/*
profile1.snapshots.exclude.12.value=/etc/mtab
profile1.snapshots.exclude.13.value=/var/cache/apt/archives/*.deb
profile1.snapshots.exclude.14.value=lost+found/*
profile1.snapshots.exclude.15.value=/tmp/*
profile1.snapshots.exclude.16.value=/var/tmp/*
profile1.snapshots.exclude.17.value=/var/backups/*
profile1.snapshots.exclude.18.value=.Private
. . .

Of course that is expressed in backintime's own configuration
"language", and would probably need to be translated into borgbackup's
equivalent "language".
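
As a first step toward such a translation, the exclude values can be pulled out of the backintime config so they can be reviewed one by one. This is only a minimal sketch: the sample lines are copied from the excerpt above, the function name extract_excludes is hypothetical, and the key layout is assumed to be exactly "profileN.snapshots.exclude.K.value=PATTERN" as shown there.

```python
import re

# Sample lines copied from the backintime config excerpt above.
CONFIG_TEXT = """\
profile1.snapshots.exclude.1.value=.gvfs
profile1.snapshots.exclude.2.value=.cache/*
profile1.snapshots.exclude.4.value=.local/share/[Tt]rash*
profile1.snapshots.exclude.6.value=*~
"""

# Assumed key layout: profileN.snapshots.exclude.K.value=PATTERN
EXCLUDE_LINE = re.compile(r"^profile\d+\.snapshots\.exclude\.(\d+)\.value=(.*)$")


def extract_excludes(text):
    """Return the exclude values, ordered by their number in the config."""
    found = []
    for line in text.splitlines():
        m = EXCLUDE_LINE.match(line.strip())
        if m:
            found.append((int(m.group(1)), m.group(2)))
    return [value for _, value in sorted(found)]


if __name__ == "__main__":
    # Print one value per line, ready for manual review.
    for value in extract_excludes(CONFIG_TEXT):
        print(value)
```

The values printed are still in backintime's own syntax. Each one would need to be checked against borg's pattern rules before being pasted into Vorta.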

Something like that is what I was sort of looking for. And it is not
just for efficiency. Consider this, from the Arch wiki article on
rsync:

----------

"Run the following command as root to make sure that rsync can access
all system files and preserve the ownership:

# rsync -aAXHv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} / /path/to/backup

By using the -aAX set of options, the files are transferred in archive
mode which ensures that symbolic links, devices, permissions,
ownerships, modification times, ACLs, and extended attributes are
preserved, assuming that the target file system supports the feature.
The option -H preserves hard links, but uses more memory.

The --exclude option causes files that match the given patterns to be
excluded. The directories /dev, /proc, /sys, /tmp, and /run are
included in the above command, but the contents of those directories
are excluded. This is because they are populated on boot, but the
directories themselves are not created. /lost+found is
filesystem-specific. The command above depends on brace expansion
available in both the bash and zsh shells. When using a different
shell, --exclude patterns should be repeated manually. Quoting the
exclude patterns will avoid expansion by the shell, which is
necessary, for example, when backing up over SSH. Ending the excluded
paths with * ensures that the directories themselves are created if
they do not already exist.

Note:

- If you plan on backing up your system somewhere other than /mnt or
  /media, do not forget to add it to the list of exclude patterns to
  avoid an infinite loop.
- If there are any bind mounts in the system, they should be excluded as
  well so that the bind-mounted contents are copied only once.
- If you use a swap file, make sure to exclude it as well.
- Consider whether you want to back up the /home/ directory. If it
  contains your data it might be considerably larger than the system.
  Otherwise consider excluding unimportant sub-directories such as
  /home/*/.thumbnails/*, /home/*/.cache/mozilla/*,
  /home/*/.cache/chromium/*, and /home/*/.local/share/Trash/*, depending
  on software installed on the system.
- If GVFS is installed, /home/*/.gvfs must be excluded to prevent rsync
  errors.
- If Dhcpcd ≥ 9.0.0 is installed, exclude the /var/lib/dhcpcd/*
  directory as it mounts several system directories as sub-directories
  there.

You may want to include additional rsync options, or remove some, such
as the following. See rsync(1) for the full list.

- If you run on a system with very low memory, consider removing the -H
  option; however, it should be no problem on most modern machines.
  There can be many hard links on the file system depending on the
  software used (e.g. if you are using Flatpak). Many hard links reside
  under the /usr/ directory.
- You may want to add rsync's --delete option if you are running this
  multiple times to the same backup directory. In this case make sure
  that the source path does not end with /*, or this option will only
  have effect on the files inside the subdirectories of the source
  directory, but it will have no effect on the files residing directly
  inside the source directory.
- If you use any sparse files, such as virtual disks, Docker images and
  similar, you should add the -S option.
- The --numeric-ids option will disable mapping of user and group names;
  instead, numeric group and user IDs will be transferred. This is useful
  when backing up over SSH or when using a live system to back up a
  different system disk.
- Choosing the --info=progress2 option instead of -v will show the
  overall progress info and transfer speed instead of the list of files
  being transferred.
- To avoid crossing a filesystem boundary when recursing, add the option
  -x/--one-file-system. This will prevent backing up any mount point in
  the hierarchy."

----------

And that isn't even (afaik) Fnmatch!
(BTW, I have read what you referenced as ' Run "borg help patterns" '.
I'm afraid it wasn't very helpful to me.)
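
For what it is worth, the fnmatch rule quoted from "borg help patterns" can be approximated in a few lines of Python, which allows patterns to be tried against sample paths without touching any real data. This is a minimal sketch, not borg's own code: it assumes Python's standard fnmatch module is close enough to borg's variant, the function name would_exclude is hypothetical, and the sample paths are made up for illustration.

```python
import fnmatch
import re


def would_exclude(pattern, path):
    """Approximate the quoted rule: the full path must match, or the
    pattern must match from the start of the path up to just before
    a path separator."""
    # Quoted rule: if the pattern ends in a separator, '*' is appended.
    if pattern.endswith("/"):
        pattern += "*"
    # In Python's fnmatch, '*' also matches '/', in line with the quoted
    # statement that the separator is not treated specially.
    regex = re.compile(fnmatch.translate(pattern))
    # Candidates: the full path plus every prefix ending just before a '/'.
    candidates = [path]
    candidates += [path[:i] for i, ch in enumerate(path) if ch == "/" and i > 0]
    return any(regex.match(c) for c in candidates)


if __name__ == "__main__":
    samples = [
        ("*.o", "/home/user/file.o"),
        ("*.o", "/home/user/file.odt"),
        ("/home/*/junk", "/home/user/subdir/junk"),
        ("/home/*/junk", "/home/user/importantjunk"),
        ("/home/user/cache/", "/home/user/cache"),
        ("/home/user/cache/", "/home/user/cache/x"),
    ]
    for pattern, path in samples:
        print(pattern, path, would_exclude(pattern, path))
```

A sketch like this can only build intuition for how the patterns behave. Whatever it reports should still be confirmed with borg itself before relying on an exclude list.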

Timeshift (system files only) currently takes up about 10 GB backing up
a relatively lean system.

Borg takes up about 4.2 GB, for user data only.

Backintime uses about 4.4 GB to back up the same user data.  It seems
to be just a fancy GUI that appears to use rsync as a backend to
take "snapshots".

I shall take for granted that backintime developers do not code encfs.
Fine.  But after 7 years (at least), why haven't they replaced encfs
with a "safer" encryption scheme, or at least just removed it and
simply not replaced it at all?  IMHO, either option would seem far
better than the status quo.

I'm sure someone is saying, "Well, you don't HAVE TO use the built-in encryption."
Believe me, I don't.  And won't.

As you noted, borg seems to take encryption much more seriously.
Which I think is a good thing, as I consider data integrity and
security to be very important.

