
Re: File & folder clean up



B. M. wrote:
> Short Summary:
> How can I find files which parent folders have the same name?
> ...
> Assuming that at least some of these files are in parent folders with
> the same name, do you know any tool which can help in finding them and
> moving them around?

The 'find' program is the standard utility to find files.  GNU find
includes a regex (regular expression) extension.  I think it should do
what you want.

  find . -iregex '.*\(.*\)/\1'

Here is a test case I made for your example.

  $ find
  .
  ./aaa
  ./aaa/file1
  ./aaa/ccc
  ./aaa/bbb
  ./aaa/bbb/bbb
  ./aaa/aaa
  ./aaa/aaa/aaa

Running that find command upon it prints:

  $ find . -iregex '.*\(.*\)/\1'
  ./aaa/bbb/bbb
  ./aaa/aaa
  ./aaa/aaa/aaa
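
If you want to reproduce that test case yourself, here is a rough sketch.
It assumes 'mktemp -d' is available and that everything except file1 is a
directory.  That is a guess on my part, but the regex only looks at the
path, so the file type should not matter.

```shell
#!/bin/sh
# Build the example tree in a scratch directory (assumes mktemp -d exists).
tmp=$(mktemp -d)
mkdir -p "$tmp/aaa/ccc" "$tmp/aaa/bbb/bbb" "$tmp/aaa/aaa/aaa"
touch "$tmp/aaa/file1"

# Run the same find command from the top of the tree.
# Sorting in the C locale just makes the output order predictable.
result=$(cd "$tmp" && find . -iregex '.*\(.*\)/\1' | LC_ALL=C sort)
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$tmp"
```

That should print the same three paths as above, only in sorted order.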

Explanation of the command.  The find command finds files.  Directories
in Unix are simply files.  Special files, but files just the same.  The
-iregex option takes a regular expression that must match the entire
path from beginning to end.  The 'i' in -iregex means ignore case.
I assume you would want to search without case sensitivity.
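
A small sketch of those two points, whole-path matching and case
folding.  The README file name and the scratch directory are only made up
for illustration.

```shell
#!/bin/sh
# Scratch directory holding a single upper-case file name (hypothetical example).
tmp=$(mktemp -d)
touch "$tmp/README"

# The pattern must match the whole path "./README", so a bare name finds nothing.
bare=$(cd "$tmp" && find . -regex 'README')

# -regex is case sensitive, so a lower-case pattern finds nothing either.
sensitive=$(cd "$tmp" && find . -regex '.*/readme')

# -iregex ignores case, so the same lower-case pattern now matches.
insensitive=$(cd "$tmp" && find . -iregex '.*/readme')

printf 'bare=[%s] sensitive=[%s] insensitive=[%s]\n' \
    "$bare" "$sensitive" "$insensitive"

rm -rf "$tmp"
```

Only the -iregex run should come back with ./README.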

The \(...\) part is a regular expression grouping.  The parens must
be quoted as \(...\) to turn on their magic function because GNU find
defaults to Emacs-style regular expressions, which like basic regular
expressions (BRE) treat a bare paren as a literal character.  In
extended regular expression syntax (ERE) the parens would be written
without the backslashes.  You can read all about regular
expressions, the different engines, and how to use them.  The '.'
matches any single character.  The '*' modifies that to match zero or
more of them.  You see '.*' a lot in regular expressions.
Putting it in \(.*\) matches any run of characters and captures it in
a group that can be referenced again later in the expression with a
backreference.  The \1 is a backreference that means whatever was
matched by the first \(...\) grouping.  If there were a \2
that would match the second grouping and so forth.  It feels a little
magical but \(.*\)/\1 matches anything like aaa/aaa or bbbb/bbbb
where the second part is the same as the first part.  The '/' in the
middle matches the directory separator so that the first part and the
second part land on either side of a directory boundary.
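
One caveat.  Nothing in that pattern ties the start of the group to a
'/', so a path such as ./xaaa/aaa would match too, because the group is
free to pick up just the tail of the parent name.  If that is a problem,
a stricter variant is sketched below.  The xaaa directory is a made-up
example, and I have only reasoned this through against find's default
regex syntax, so treat it as a starting point.

```shell
#!/bin/sh
# Scratch tree with one true duplicate and one near miss (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/aaa/aaa" "$tmp/xaaa/aaa"

# Original pattern: the group may match only the tail of the parent name.
loose=$(cd "$tmp" && find . -iregex '.*\(.*\)/\1' | LC_ALL=C sort)

# Stricter pattern: the group must be one whole path component,
# that is, it starts right after a '/' and contains no '/' itself.
strict=$(cd "$tmp" && find . -iregex '.*/\([^/][^/]*\)/\1' | LC_ALL=C sort)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"

rm -rf "$tmp"
```

The loose pattern should list both ./aaa/aaa and ./xaaa/aaa, while the
strict one should list only ./aaa/aaa.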

Hope that helps,
Bob
