
Re: Finding the common textual denominator



On Sunday 06 March 2005 09:56, Joey Hess wrote:
> Assuming word splitting is ok and you want to avoid O(N^2) methods:
>
> joey@dragon:~>cat foo
> foo by Clapton, Eric
> Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/
> Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/
> Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/
> Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/
> Eric_Clapton - Pilgrim/
> joey@dragon:~>perl -e 'while (<>) { my %seen; foreach my $w (split
> /[^a-zA-Z0-9]/) { next unless length $w; $count{$w}++ unless $seen{$w};
> $seen{$w}=1 } }; foreach (keys %count) { $max=$count{$_} if $max <
> $count{$_} }; foreach (keys %count) { print "$_\n" if $count{$_} == $max }'
> < foo
> Clapton
> Eric

Nicely done :)

You'd need to remove keys that don't appear on every line, though: as each
line is read, drop any key that line doesn't contain.  That should help keep
memory down on large lists too, hopefully.
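
Something like this rough sketch is what I have in mind. It is only one way
to do it, not a drop-in replacement for the one-liner above: it keeps a
running set of words and deletes whatever the current line lacks, so the hash
can only shrink after the first line. The sub name common_words and the
hard-coded sample list are made up for illustration; the word splitting is
the same as in the quoted one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# common_words is a hypothetical helper name, not from the original post.
# It returns the words present on every line, keeping only a running
# intersection in memory.
sub common_words {
    my @lines = @_;
    my %common;        # words seen on every line so far
    my $first = 1;
    for my $line (@lines) {
        # Same word splitting as the quoted one-liner; skip empty fields.
        my %seen = map { ($_ => 1) }
                   grep { length } split /[^a-zA-Z0-9]/, $line;
        if ($first) {
            %common = %seen;
            $first  = 0;
        } else {
            # Drop every key that does not appear on this line.
            delete @common{ grep { !$seen{$_} } keys %common };
        }
    }
    return sort keys %common;
}

# Sample input: three of the directory names from the quoted message.
my @sample = (
    'Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/',
    'Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/',
    'Eric_Clapton - Pilgrim/',
);
print "$_\n" for common_words(@sample);
```

For that sample it prints Clapton and Eric. Note the difference in behaviour:
if no word appears on every line, this prints nothing, whereas the max-count
version still prints the most frequent words.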

-- 
Lee.


