
Re: Finding the common textual denominator



Ron Johnson wrote:
> On Sun, 2005-03-06 at 02:16 +0100, Olle Eriksson wrote:
> > Can anyone help me with how to find the common textual denominator of an 
> > array of strings. I have been searching the web and the man pages of 
> > grep, awk etc to no avail.
> > 
> > Given the following list of directory names I want to have a script return 
> > "Eric_Clapton".
> > 
> > Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/
> > Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/
> > Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/
> > Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/
> > Eric_Clapton - Pilgrim/
> 
> You want a generic algorithm?
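If "common denominator" just means the longest prefix that every name starts with, a single pass is enough. Compare the lexicographically smallest and largest names character by character: any prefix those two share is also shared by every name that sorts between them. Below is a minimal Python sketch of that idea. The function name `longest_common_prefix` is only illustrative; Python's standard `os.path.commonprefix` does the same character-wise job.

```python
def longest_common_prefix(names):
    """Return the longest prefix shared by every string in `names`."""
    if not names:
        return ""
    # Any prefix shared by the smallest and largest strings (in sort
    # order) is shared by every string that sorts between them.
    lo, hi = min(names), max(names)
    i = 0
    # hi cannot be a proper prefix of lo (it would then sort first),
    # so indexing hi[i] is safe while i < len(lo).
    while i < len(lo) and lo[i] == hi[i]:
        i += 1
    return lo[:i]
```

On the five directory names quoted above this returns "Eric_Clapton". It returns an empty string as soon as one name starts differently (for example "Clapton, Eric - ..."), which is where splitting into words helps.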

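Assuming it is OK to split the names into words, and you want to avoid O(N^2) pairwise comparisons, count how many lines each word appears in and print the word(s) with the highest count: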

joey@dragon:~>cat foo
foo by Clapton, Eric
Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/
Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/
Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/
Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/
Eric_Clapton - Pilgrim/
joey@dragon:~>perl -e 'while (<>) { my %seen; foreach my $w (split /[^a-zA-Z0-9]/) { next unless length $w; $count{$w}++ unless $seen{$w}; $seen{$w}=1 } }; foreach (keys %count) { $max=$count{$_} if $max < $count{$_} }; foreach (keys %count) { print "$_\n" if $count{$_} == $max }' < foo
Clapton
Eric
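The same counting idea, written out long-hand as a Python sketch. The function name `most_common_words` is made up for illustration. Like the Perl one-liner, it splits on anything that is not an ASCII letter or digit and counts each word at most once per line (the job of `%seen` in the Perl). It then returns every word tied for the highest count. It makes one pass over the input, so the work grows linearly with the amount of text.

```python
import re
from collections import Counter


def most_common_words(lines):
    """Return the words that appear in the largest number of lines."""
    counts = Counter()
    for line in lines:
        # Split on runs of non-alphanumeric characters and drop empty
        # pieces; a set makes repeats within one line count only once.
        counts.update({w for w in re.split(r"[^a-zA-Z0-9]+", line) if w})
    if not counts:
        return []
    best = max(counts.values())
    # Sorted so the result does not depend on hash order.
    return sorted(w for w, c in counts.items() if c == best)
```

Fed the six lines of `foo` above, it returns `["Clapton", "Eric"]`. Note that it reports separate words rather than the "Eric_Clapton" spelling with its underscore.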

-- 
see shy jo
