
Re: wordplay vs. an



Joey Hess <joey@kite.ml.org> wrote:
> I just noticed the "an" program in debian. I maintain wordplay. Both
> programs generate anagrams.
> 
> It seems to me, that an is a superior program:
> 

As the coder/maintainer, I thank you ;)

> 	1. It uses a standard dictionary. (wordplay is required by 
> 		its copyright to be packaged with a special word list 
> 		file, though it can be configured to use a different one.)
> 
> 	2. It's faster. (2x faster anagramming "debian gnu linux", claims
> 		of being 10x faster in some cases.)

	The larger speed increases for 'an' occur as the number of anagrams
	produced increases.  For generation runs taking less than 1 second,
	wordplay may in fact be faster, because 'an' takes some time to set up
	its data structures (there is some optimisation that could be done
	here, but this area was considered far less critical than the
	recursive searching function).  Try finding a phrase that takes about
	5 mins to produce all anagrams with 'an' (/dev/null would be a good
	place for the output ;) then try the same phrase with wordplay.  The
	difference in favour of 'an' should be greater.  Furthermore, the -l
	option for 'an', which specifies the maximum number of words allowed
	in an anagram, is the case most optimised for (it's the option I find
	most useful); the corresponding option in wordplay is -d, I believe.
	When using these options 'an' should be at least 10x as fast for
	phrases with many anagrams.  A couple of tests:

$ time /usr/games/an -d /usr/lib/games/wordplay/words721.txt -l 4 'debian gnu linux rules' | wc
   7302   29192  175274

real    0m2.695s
user    0m1.250s
sys     0m0.060s
$ time wordplay -sav -d4 'debian gnu linux rules' | wc
   7302   29192  167972

real    0m54.787s
user    0m27.390s
sys     0m0.020s

Clearly 'an' shines in this situation.  Both examples asked for all anagrams
of "debian gnu linux rules" with a maximum of 4 words in them.  The -d
option for 'an' specified the same dictionary that wordplay uses, so that
things were even there.  The -av options tell wordplay not to cut out
repeated words or words with only vowels in them; this mirrors the 'an'
method and makes comparisons easier.  In this case the -av options didn't
affect the number of anagrams found by wordplay, and removing them actually
made the program slightly slower.  I think this drastic difference is due to
wordplay's data structures not allowing for some of the smart search
termination methods that 'an' employs.
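
To make "search termination" concrete, here is a minimal Python sketch of a
recursive anagram search with a word limit.  It is an illustration only and
not the actual code of 'an' or wordplay: the function name find_anagrams,
its parameters, and the two pruning rules shown are assumptions chosen for
the example.  It reports each set of words once (in sorted order) and allows
repeated words.

```python
from collections import Counter


def find_anagrams(phrase, words, max_words):
    """Hypothetical sketch: list anagrams of `phrase` using at most `max_words` words."""
    target = Counter(c for c in phrase.lower() if c.isalpha())

    # Pre-filter: keep only words that can be spelled from the phrase's letters.
    candidates = []
    for w in sorted({w.lower() for w in words if w.isalpha()}):
        wc = Counter(w)
        if all(wc[ch] <= target[ch] for ch in wc):
            candidates.append((w, wc))
    max_len = max((len(w) for w, _ in candidates), default=0)

    results = []

    def search(remaining, remaining_len, start, chosen):
        if remaining_len == 0:
            results.append(tuple(chosen))  # every letter used: one anagram
            return
        slots_left = max_words - len(chosen)
        # Termination: even filling every remaining slot with the longest
        # candidate word could not use up the letters that are left.
        if slots_left * max_len < remaining_len:
            return
        for i in range(start, len(candidates)):
            w, wc = candidates[i]
            if len(w) > remaining_len:
                continue  # word is longer than what is left
            if all(wc[ch] <= remaining[ch] for ch in wc):
                chosen.append(w)
                # Passing i (not i + 1) lets the same word be used again.
                search(remaining - wc, remaining_len - len(w), i, chosen)
                chosen.pop()

    search(target, sum(target.values()), 0, [])
    return results
```

Calling find_anagrams("debian gnu linux rules", word_list, 4) with a word
list loaded from a dictionary file would correspond roughly to the "-l 4"
run above, though the output order and exact counts need not match either
program.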


> 
> 	3. It outputs in lower case, whereas wordplay outputs in hard to scan
> 		uppercase. Ok, this is trivial :-)

	What do you expect from a program that was originally coded in Fortran 
(according to the README) ;)

> 
> 	4. It's GNU copyrighted.
> 
	Hey, and it also takes GNU long options ;)


> 	5. In a trial using the same dictionary, an found more anagrams than
> 		wordplay did. This is the one main reason I find an superior 
> 		-- wordplay doesn't find all anagrams.

	The -av options mentioned above make wordplay find close to the same number 
of anagrams as 'an'; however, in some situations (which I was unable to pin 
down), 'an' does appear to find more anagrams than wordplay.

> 
> However, wordplay does have a couple of advantages:
> 
> 	1. Its word list file is larger, which is one reason it's slower. 
> 		But this means it finds more anagrams (wordplay found 3094 
> 		anagrams of "debian gnu linux", an only 2681) But see #5
> 		above.

	I thought about distributing 'an' with a dictionary but decided it was better 
left to the user (besides, they may be German or French or whatever ;).


> 
> 	2. It has a few options that an lacks.
> 

	Like the -av to make it produce most of the anagrams 'an' does ;)

> 	3. I spent a few hours writing a man page for the thing, and I hate 
> 		to waste my effort. :-)

	Now that one I can't argue with ;)

> 
> 	4. It has a better name than "an". Programs shouldn't be named after
> 		common parts of speech, it's confusing. :-)
> 

	Well, my excuse is that when I wrote the thing it was coz I was surprised that 
the bsd games distribution didn't have such a program, hence in keeping with 
the cryptic bsd games names I chose 'an' ;)


> My feeling is that debian has room for two anagrams generators, but there's
> not a lot of difference between them, and an is superior. I'd just as soon 
> drop wordplay from the distribution, unless someone has a strong desire to 
> keep it. So, if someone would like to keep wordplay in the distribution, 
> please speak up.
> 

I don't mind either way.  It would be good if the upstream maintainer put some 
more work into wordplay; the competition may spur me on to further 
development (not that I have much time, with less trivial projects on my hands 
at the moment, and no, *not* a palindrome generator ;)


P.S. Anyone know much about language parsing and feel like writing a fast 
grammar checker for 'an'?  I tried using a yacc type parser (not perfect for 
parsing natural language), but it was woefully slow, and not that robust for 
some types of English structures (I don't speak any other languages), although 
it did add the interesting property of having 6 levels of different recursive 
functions nested within each other, with none of them lending themselves too 
neatly to alternative iterative solutions.

P.P.S. Perhaps I get a little overexcited over 'an'? ;)

