The fnmatch module in Python's standard library implements a pattern
format for paths which is similar to shell patterns. However, “*”
matches any character including path separators. This newly introduced
pattern syntax with the selector “sh” no longer matches the path
separator with “*”. Instead “**/” can be used to match zero or more
directory levels.
the items metadata stream is usually not that big (compared to the file content data) -
it is just file and dir names and other metadata.
if we use too rough granularity there (and big minimum chunk size), we usually will get no deduplication.
The existing option to exclude files and directories, “--exclude”, is
implemented using fnmatch[1]. fnmatch matches the slash (“/”) with “*”
and thus makes it impossible to write patterns where a directory with
a given name should be excluded at a specific depth in the directory
hierarchy, but not anywhere else. Consider this structure:
home/
home/aaa
home/aaa/.thumbnails
home/user
home/user/img
home/user/img/.thumbnails
fnmatch incorrectly excludes “home/user/img/.thumbnails” with a pattern
of “home/*/.thumbnails” when the intention is to exclude “.thumbnails”
in all home directories while retaining directories with the same name
in all other locations.
With this change regular expressions are introduced as an additional
pattern syntax. The syntax is selected using a prefix on “--exclude”'s
value. “re:” is for regular expression and “fm:”, the default, selects
fnmatch. Selecting the syntax is necessary when regular expressions are
desired or when the desired fnmatch pattern starts with two alphanumeric
characters followed by a colon (i.e. “aa:something/*”). The exclusion
described above can be implemented as follows:
--exclude 're:^home/[^/]+/\.thumbnails$'
The “--exclude-from” option permits loading exclusions from a text file
where the same prefixes can now be used, e.g. “re:\.tmp$”.
The documentation has been extended and now not only describes the two
pattern styles, but also the file format supported by “--exclude-from”.
This change has been discussed in issue #43 and in change request #497.
[1] https://docs.python.org/3/library/fnmatch.html
Signed-off-by: Michael Hanselmann <public@hansmi.ch>