Re: [maybe OT] unicode control characters in filenames



On Wednesday 10 August 2011 06:53:58 Darac Marjal wrote:
> On Tue, Aug 09, 2011 at 01:24:46PM -0700, Mike McClain wrote:
> > On Tue, Aug 09, 2011 at 12:42:18PM -0400, Eike Lantzsch wrote:
> > > Hi:
> > > 
> > > For some time I've been looking for a way to remove Unicode control
> > > characters such as U+202A, U+202C and U+200F from filenames.
> > > I found lots of examples of doing this programmatically in python, perl,
> > > even in VB and Java.
> > > I was hoping to do this with bash, find, grep and/or sed, because
> > > I have never written code in python or perl.
> > > Can some kind soul please give me a hint how to proceed?
> 
> If you've found a recipe in perl, I can recommend /usr/bin/rename (part
> of the perl package and, on my system, a link to /usr/bin/prename). The
> syntax is "rename regex filespec", so you can say
> "rename 's/foo/bar/' bar.jpg". Maybe that'll help.
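Since the original question asked for bash rather than Perl, here is a minimal sketch that uses only bash parameter expansion and mv. It assumes the filenames are UTF-8 encoded, handles only the three characters named above, and works on a single directory (hidden files and subdirectories are not visited). `strip_bidi_from_names` is a hypothetical helper name, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip_bidi_from_names DIR
# Removes U+202A, U+202C and U+200F from the names of the entries directly
# inside DIR, using bash parameter expansion and mv only.
# Assumption: names are UTF-8 encoded.

strip_bidi_from_names() {
  local dir=${1:-.}
  # UTF-8 byte sequences of the three characters, written as octal escapes.
  local lre pdf rlm
  lre=$(printf '\342\200\252')   # U+202A LEFT-TO-RIGHT EMBEDDING (E2 80 AA)
  pdf=$(printf '\342\200\254')   # U+202C POP DIRECTIONAL FORMATTING (E2 80 AC)
  rlm=$(printf '\342\200\217')   # U+200F RIGHT-TO-LEFT MARK (E2 80 8F)

  local path old new
  for path in "$dir"/*; do
    [ -e "$path" ] || continue   # glob matched nothing
    old=${path##*/}              # strip the directory part
    new=${old//"$lre"/}          # quoted pattern = literal match
    new=${new//"$pdf"/}
    new=${new//"$rlm"/}
    # Skip unchanged names, names that would become empty,
    # and targets that already exist.
    if [ "$new" != "$old" ] && [ -n "$new" ] && [ ! -e "$dir/$new" ]; then
      mv -- "$path" "$dir/$new"
    fi
  done
}
```

Usage would be `strip_bidi_from_names ~/path-name-of-unicode-files`; changing `mv` to `echo mv` turns it into a dry run, much like `rename -n`.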

Thank you for the suggestion, but as far as I can see, prename is not
UTF-8-aware. Is that true?

I'm currently studying
http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8
and
http://www.perlmonks.org/?node_id=551676
and
http://perldoc.perl.org/perlunicode.html

If I enter

myuser@mysystem:~/path-name-of-unicode-files$ rename -n 's/\x{202A}//' *

I get no output, although \x{202A} is definitely the first character in the filename.
This definitely needs more than a cursory look into perl, which is exactly
what I wanted to avoid.
Maybe I'd better post to a perl mailing list. I'm only afraid that I'll get
nothing but "RTFM!" and "do your own homework!", and they may be right ...

Cheers
Eike

