Re: wordplay vs. an
Joey Hess <joey@kite.ml.org> wrote:
> I just noticed the "an" program in debian. I maintain wordplay. Both
> programs generate anagrams.
>
> It seems to me, that an is a superior program:
>
As the coder/maintainer, I thank you ;)
> 1. It uses a standard dictionary. (wordplay is required by
> its copyright to be packaged with a special word list
> file, though it can be configured to use a different one.)
>
> 2. It's faster. (2x faster anagramming "debian gnu linux", claims
> of being 10x faster in some cases.)
The larger speed increases for 'an' occur as the number of anagrams
produced increases. For runs taking less than 1 second, wordplay may in
fact be faster, because 'an' takes some time to set up its data structures
(there is some optimisation that could be done here, but this area was
considered far less critical than the recursive searching function). Try
finding a phrase that takes about 5 mins to produce all anagrams with 'an'
(/dev/null would be a good place for the output ;) then try the same
phrase with wordplay. The difference in favour of 'an' should be greater.
Furthermore, the -l option for 'an', which specifies the maximum number of
words allowed in an anagram, is the case most optimised for (it's the
option I find most useful); the corresponding option in wordplay is -d, I
believe. When using these options 'an' should be at least 10x as fast for
phrases with many anagrams. A couple of tests:
$ time /usr/games/an -d /usr/lib/games/wordplay/words721.txt -l 4 'debian gnu linux rules' | wc
7302 29192 175274
real 0m2.695s
user 0m1.250s
sys 0m0.060s
$ time wordplay -sav -d4 'debian gnu linux rules' | wc
7302 29192 167972
real 0m54.787s
user 0m27.390s
sys 0m0.020s
Clearly 'an' shines in this situation (roughly 20x faster in real time
here). Both examples asked for all anagrams of "debian gnu linux rules"
with a maximum of 4 words in them. The -d option for 'an' specified the
same dictionary that wordplay uses, so that things were even there. The
-av options tell wordplay not to cut out repeated words or words with only
vowels in them; this mirrors the 'an' method and makes comparisons easier.
In this case the -av options didn't affect the number of anagrams found by
wordplay, and removing them actually made the program slightly slower. I
think this drastic difference is due to wordplay's data structures not
allowing for some of the smart search termination methods that 'an'
employs.
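For anyone curious what early termination in a recursive anagram search can
look like, here is a minimal Python sketch. It is not the 'an' source; the
function name find_anagrams, the Counter-based letter bookkeeping and the
particular pruning rules are all assumptions made for illustration. The idea
is simply to stop descending a branch as soon as the word limit is used up
or the remaining letters cannot fit into the words still allowed.

```python
from collections import Counter


def find_anagrams(phrase, words, max_words):
    """Yield anagrams of `phrase` as tuples of at most `max_words` words.

    Illustrative only: `words` is assumed to be an iterable of lowercase
    alphabetic strings. Each anagram is produced once, with its words in
    sorted order, and a word may be repeated.
    """
    target = Counter(c for c in phrase.lower() if c.isalpha())

    # Set-up cost: keep only words that can be built from the phrase at all.
    candidates = []
    for w in sorted(set(words)):
        wc = Counter(w)
        if w and all(target[ch] >= n for ch, n in wc.items()):
            candidates.append((w, wc))
    longest = max((len(w) for w, _ in candidates), default=0)

    def search(remaining, left, start, chosen):
        if left == 0:                      # every letter used: an anagram
            yield tuple(chosen)
            return
        slots = max_words - len(chosen)
        # Early termination: no word slots left, or the remaining letters
        # cannot fit even if every remaining slot took the longest word.
        if slots == 0 or left > slots * longest:
            return
        for i in range(start, len(candidates)):
            w, wc = candidates[i]
            if len(w) > left:
                continue
            if all(remaining[ch] >= n for ch, n in wc.items()):
                chosen.append(w)
                # Restart at i (not i + 1) so repeated words are allowed.
                yield from search(remaining - wc, left - len(w), i, chosen)
                chosen.pop()

    yield from search(target, sum(target.values()), 0, [])
```

With a small word limit, the `left > slots * longest` check cuts off most
branches after one or two words, which is the sort of case where a
maximum-words option can pay off.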
>
> 3. It outputs in lower case, whereas wordplay outputs in hard to scan
> uppercase. Ok, this is trivial :-)
What do you expect from a program that was originally coded in Fortran
(according to the README) ;)
>
> 4. It's GNU copyrighted.
>
Hey, and it also takes GNU long options ;)
> 5. In a trial using the same dictionary, an found more anagrams than
> wordplay did. This is the one main reason I find an superior
> -- wordplay doesn't find all anagrams.
The -av options mentioned above make wordplay find closer to the same number
as 'an'; however, in some situations (which I was unable to determine), 'an'
does appear to find more anagrams than wordplay.
>
> However, wordplay does have a couple of advantages:
>
> 1. Its word list file is larger, which is one reason it's slower.
> But this means it finds more anagrams (wordplay found 3094
> anagrams of "debian gnu linux", an only 2681) But see #5
> above.
I thought about distributing 'an' with a dictionary but thought it was better
left to the user (besides, they may be German or French or whatever ;).
>
> 2. It has a few options that an lacks.
>
Like the -av to make it produce most of the anagrams 'an' does ;)
> 3. I spent a few hours writing a man page for the thing, and I hate
> to waste my effort. :-)
Now that one I can't argue with ;)
>
> 4. It has a better name than "an". Programs shouldn't be named after
> common parts of speech, it's confusing. :-)
>
Well, my excuse is that I wrote the thing because I was surprised that the
BSD games distribution didn't have such a program; hence, in keeping with
the cryptic BSD games names, I chose 'an' ;)
> My feeling is that debian has room for two anagrams generators, but there's
> not a lot of difference between them, and an is superior. I'd just as soon
> drop wordplay from the distribution, unless someone has a strong desire to
> keep it. So, if someone would like to keep wordplay in the distribution,
> please speak up.
>
I don't mind either way. It would be good if the upstream maintainer put some
more work into wordplay; the competition may spur me on to further
development (not that I have much time, with less trivial projects on my
hands at the moment, and no, *not* a palindrome generator ;)
P.S. Anyone know much about language parsing and feel like writing a fast
grammar checker for 'an'? I tried using a yacc type parser (not perfect for
parsing natural language), but it was woefully slow, and not that robust for
some types of English structures (I don't speak any other languages),
although it did add the interesting property of having 6 levels of different
recursive functions nested within each other, with none of them lending
themselves too neatly to alternative iterative solutions.
P.P.S. Perhaps I get a little overexcited over 'an'? ;)