Re: Finding the common textual denominator

To: Debian-User <debian-user@lists.debian.org>
Subject: Re: Finding the common textual denominator
From: Lee Braiden <jel@tundra.ath.cx>
Date: Sun, 6 Mar 2005 10:17:16 +0000
Message-id: <200503061017.16690.jel@tundra.ath.cx>
In-reply-to: <20050306095601.GB30313@kitenet.net>
References: <200503060216.36109.subscriptions@olle-eriksson.com> <1110078641.26863.3.camel@haggis.homelan> <20050306095601.GB30313@kitenet.net>

On Sunday 06 March 2005 09:56, Joey Hess wrote:
> Assuming word splitting is ok and you want to avoid O(N^2) methods:
>
> joey@dragon:~>cat foo
> foo by Clapton, Eric
> Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/
> Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/
> Eric_Clapton_-_Me_and_Mr_Johnson-(PROPER)-CD-2004-TN/
> Eric_Clapton-One_More_Car_One_More_Rider-2CD-2002-RNS/
> Eric_Clapton - Pilgrim/
> joey@dragon:~>perl -e 'while (<>) { my %seen; foreach my $w (split
> /[^a-zA-Z0-9]/) { next unless length $w; $count{$w}++ unless $seen{$w};
> $seen{$w}=1 } }; foreach (keys %count) { $max=$count{$_} if $max <
> $count{$_} }; foreach (keys %count) { print "$_\n" if $count{$_} == $max }'
> < foo
> Clapton
> Eric

Nicely done :)

If the goal is only the words common to every line, though, you'd need to
remove keys as soon as they fail to appear on a particular line.  That would
help keep memory down on large lists too, hopefully.
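
A rough sketch of what I mean, assuming the same word splitting as the
one-liner above.  The sub name common_words and the sample list are made up
for illustration; this is one way to do it, not a drop-in replacement.  It
seeds a hash from the first line, then deletes any key that a later line lacks:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words that occur on every line.
# The candidate hash is seeded from the first non-empty line and can
# only shrink afterwards, so it never holds more than one line's words.
sub common_words {
    my @lines = @_;
    my %common;
    my $seeded = 0;

    for my $line (@lines) {
        # Same splitting rule as the quoted one-liner.
        my %seen;
        $seen{$_} = 1 for grep { length } split /[^a-zA-Z0-9]/, $line;

        # Assumption: lines without any words are ignored rather than
        # treated as "nothing in common".
        next unless %seen;

        if (!$seeded) {
            %common = %seen;
            $seeded = 1;
        } else {
            # Drop every candidate that is missing from this line.
            delete @common{ grep { !$seen{$_} } keys %common };
        }

        # Nothing left in common, so the remaining lines cannot matter.
        last unless %common;
    }

    return sort keys %common;
}

# Sample names taken from the list quoted above.
my @names = (
    "Eric_Clapton-Big_Boss_Man-2CD-Retail-2002-DGN/",
    "Eric_Clapton-Higher_Ground-(CDS)-2003-RNS/",
    "Eric_Clapton - Pilgrim/",
);
print "$_\n" for common_words(@names);
```

The trade-off compared with counting is that a single odd line (a typo, say)
removes a word for good, whereas the counting approach still reports the most
frequent words.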

-- 
Lee.
