[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Deleting some regexp/simple expression from lots of files in a secure way



On Friday 14 May 2010 12:52:45 Merciadri Luca wrote:
> "Boyd Stephen Smith Jr." <bss@iguanasuicide.net> writes:
> > On Friday 14 May 2010 12:04:42 Merciadri Luca wrote:
> >> I have many text files (actually .tex files) which contain some
> >> sequence or regexp (it depends on the files) that I would like to
> >> remove. Is there a commandline/GUI for doing this massive edit?
> >
> > (sed -i -e "s/$regexp//" "$file") for a single file.  (GNU sed only.)
> >
> > (find $dir -type f -exec sed -i -e "s/$regexp//" {} \;) for all files in
> > a directory.
> 
> I am using the second command. The problem is that, for one set of
> files (that I have selected, no problem for this), I have to use a
> really simple expression: I need to find all the occurences of
> `\paragraph{}' and replace them with nothing (i.e. with `'). I know
> regexps, but replacing `$regexp' with `\paragraph{}' gives error
> messages. Any idea? Thanks.

First you need a (basic) regular expression (BRE) that matches "\paragraph{}".  
The '\' is a BRE special character, so it needs to be escaped.  Also, the "{}" 
is a bit troublesome with find/-exec, so we will match it using the construct 
"[{][}]".

The definitive documentation for regular expression is the Single UNIX 
Specification, Version 3 -- Base Definitions, Chapter 9.  I don't actually 
like (man 7 regex) for this.

This gives us the regex "\\paragraph[{][}]".  Now, we need to get that regular 
expression to sed.  (find $dir -type f -exec sed -i -e "s/\\paragraph[{][}]//" 
{} \;) won't work, since during Quote Removal, one of the '\'s are dropped and 
neither find nor sed "sees" it.

The shell does a *lot* of processing to the text you type before it reaches 
the command you are invoking.  Single UNIX Specification, Version 3 -- Shell 
and Utilities, Chapter 2 is the core documentation, but some shells are much 
more featureful.

We can either use (find $dir -type f -exec sed -i -e 's/\\paragraph[{][}]//' 
{} \;) OR--my preference--(regex='\\paragraph[{][}]'; find $dir -type f -exec 
sed -i -e "s/$regex//" {} \;) to make sure sed gets that important '\'.

Also, I left it out, but you may want the "g" flag to the "s"ubstitute command 
in sed.  Otherwise, only one occurrence of the regex will get eliminated per 
line.
-- 
Boyd Stephen Smith Jr.           	 ,= ,-_-. =.
bss@iguanasuicide.net            	((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy 	 `-'(. .)`-'
http://iguanasuicide.net/        	     \_/

Attachment: signature.asc
Description: This is a digitally signed message part.


Reply to: