
Re: sed:awk:perl::rock:paper:chainsaw [was Re: Using .XCompose]



Hi,

> I seem to recall that he puts Perl at the top of the 
> heap, and notes that Perl compatible regular expressions (PCRE) are 
> available via libraries in other programming languages.

Thanks for confirming that I didn't make the wrong choice. Programs that claim to use PCRE don't support everything that Perl does.

I wanted to clean a large number of documents (a Wikipedia dump) in order to analyse the Malayalam content. As I was not comfortable with scripting, I looked for a program that could remove the foreign-language text from the files. As I could find none that could do the job, I had to use a Perl script containing the line below (among others):

s/[^\p{Block: Malayalam}\p{Block: Basic_Latin}\p{Block: General_Punctuation}\s]//g; # remove characters outside the specified Unicode blocks
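
For anyone who wants to try that substitution on its own, here is a minimal, self-contained sketch. It is not my actual script: the sub name strip_foreign and the sample string are made up for illustration, and it assumes the input is already decoded into Perl characters (not raw UTF-8 bytes).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the string literals in this file are UTF-8
binmode(STDOUT, ':encoding(UTF-8)');   # encode characters as UTF-8 on output

# strip_foreign is a hypothetical helper name, used only in this sketch.
# It deletes every character that is not in the Malayalam, Basic Latin or
# General Punctuation block and is not whitespace.
sub strip_foreign {
    my ($text) = @_;
    $text =~ s/[^\p{Block: Malayalam}\p{Block: Basic_Latin}\p{Block: General_Punctuation}\s]//g;
    return $text;
}

# Made-up sample: the Malayalam and Latin words survive, while the
# Japanese and Devanagari words are removed.
print strip_foreign("മലയാളം text, 日本語 and हिन्दी\n");
```

When reading real files, the data would also need to be decoded first, for example by opening the file with the ':encoding(UTF-8)' layer. Otherwise the block properties are matched against individual bytes rather than characters.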

For now, Perl's simple substitution command is sufficient for my requirements. Even that one command seems powerful compared to what the other tools offer.

ajith

