Ron Johnson wrote:
> On Sun, 2005-03-06 at 02:16 +0100, Olle Eriksson wrote:
> > Can anyone help me with how to find the common textual denominator of an
> > array of strings. I have been searching the web and the man pages of
> > grep, awk etc to no avail.
> >
> > Given the following list of directory names I want to have a script return
> > "Eric_Clapton".
> >
> > Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/
> > Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/
> > Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/
> > Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/
> > Eric_Clapton - Pilgrim/
>
> You want a generic algorithm?
Assuming word splitting is OK and you want to avoid O(N^2) methods, you can
count how many lines each word appears in and print the words with the
highest count:
joey@dragon:~>cat foo
foo by Clapton, Eric
Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/
Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/
Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/
Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/
Eric_Clapton - Pilgrim/
joey@dragon:~>perl -e 'while (<>) { my %seen; foreach my $w (split /[^a-zA-Z0-9]/) { next unless length $w; $count{$w}++ unless $seen{$w}; $seen{$w}=1 } }; foreach (keys %count) { $max=$count{$_} if $max < $count{$_} }; foreach (keys %count) { print "$_\n" if $count{$_} == $max }' < foo
Clapton
Eric
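
For anyone who finds the one-liner hard to read, here is a rough sketch of
the same counting idea in Python. It is not a translation anyone has
blessed: the function name most_common_words and the DIRS list are made up
for illustration, and unlike the Perl version it keeps the words in order
of first appearance so they can be joined back together.

```python
import re
from collections import Counter

# Sample input: the directory names from the original question.
DIRS = [
    "Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/",
    "Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/",
    "Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/",
    "Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/",
    "Eric_Clapton - Pilgrim/",
]


def most_common_words(lines):
    """Return the words found in the largest number of lines.

    Hypothetical helper: words are runs of ASCII letters and digits,
    and the result is ordered by first appearance in the input.
    """
    counts = Counter()
    for line in lines:
        seen = set()  # count each word at most once per line
        for word in re.split(r"[^a-zA-Z0-9]", line):
            if word and word not in seen:
                seen.add(word)
                counts[word] += 1
    if not counts:
        return []
    top = max(counts.values())
    # Counter remembers insertion order, so this keeps first-seen order.
    return [word for word, n in counts.items() if n == top]


if __name__ == "__main__":
    print("_".join(most_common_words(DIRS)))
```

Run on the five directory names alone, this prints Eric_Clapton. Like the
one-liner, it makes a single pass over the input plus a pass over the
distinct words, so it stays away from pairwise string comparisons.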
--
see shy jo