
Re: Release-critical Bugreport for June 23, 2000



Sami Haahtinen wrote:
> > Pronounceability implies a certain degree of regularity. I could file an
> > RC bug report stating that pwgen always includes vowels in its
> > passwords, but it seems likely it does so by design. I'm not sure that
> > this 'oo' thing isn't also by design.
> 
> After discussing this with Itai Zukerman, I came to sort of agree
> with this. Although this might be a design issue, in Finnish at least it's
> not much easier to pronounce 'oo' than any other two-letter sequence.
> 
> I think the person who reported the bug wasn't English and didn't
> see the point here. (I can see the point, but it makes no difference to me.)
> 
> Maybe this should be an option... It might be good to downgrade this
> to wishlist and add a comment about making it optional.

Unfortunately, I did some more analysis, and it does look pretty bad.

I used a cheesy little Perl program to count how many times each
adjacent pair of letters appeared in words, both in
/usr/share/dict/words and in the output of pwgen.
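For readers who prefer it to a Perl one-liner, here is a rough Python sketch of the same counting step. The function names `count_letter_pairs` and `top_pairs` are made up for illustration, and unlike the one-liner this version also counts the final pair of a last line that has no trailing newline.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_letter_pairs(lines: Iterable[str]) -> Counter:
    """Count every adjacent two-character sequence in the given lines.

    Each line is lowercased and its trailing newline is dropped before
    counting, so no pair ever includes the newline character.
    """
    counts: Counter = Counter()
    for line in lines:
        word = line.rstrip("\n").lower()
        # A word of length L contains L - 1 overlapping pairs.
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += 1
    return counts


def top_pairs(path: str, n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent pairs in a one-word-per-line file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_letter_pairs(f).most_common(n)
```

For example, `for pair, count in top_pairs("/usr/share/dict/words"): print(f"{count}\t{pair}")` prints a table in the same count-then-pair layout as the transcripts.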

joey@gumdrop:~>cat /usr/share/dict/words | perl -ne '$_ = lc $_;
  $len = length $_; for ($x = 0; $x < $len - 2; $x++) { $f{substr($_, $x, 2)}++ };
  END { print map { "$f{$_}\t$_\n" } keys %f }' | sort -rn | head -20
42411   er
33655   in
31669   ti
29754   on
29403   te
28140   al
28121   an
27247   at
26482   ic
25006   en
24168   is
23906   re
23710   ra
23287   le
23204   ri
22363   ro
22044   st
21704   ne
21336   ar
20849   li

joey@gumdrop:~>pwgen 8 1000000 | perl -ne '$_ = lc $_;
  $len = length $_; for ($x = 0; $x < $len - 2; $x++) { $f{substr($_, $x, 2)}++ };
  END { print map { "$f{$_}\t$_\n" } keys %f }' | sort -rn | head -20
490478  oo
180797  th
140042  ho
126362  qu
125847  sh
125724  ng
125699  ch
114552  hi
89148   ha
84968   ee
84269   ae
76404   he
62049   ot
61360   go
55723   it
54609   os
54417   on
50000   is
49978   gi
49900   in

While all the other letter pairs near the top of the pwgen output
appear with about the same frequency in the real-life wordlist,
oo appears about 6 times as often as it should.
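An "N times as often as it should" comparison can be made precise by dividing each pair's count by the total number of pairs in its corpus and then taking the ratio of the two shares. The sketch below assumes two `collections.Counter` objects mapping letter pairs to counts; the function name `overrepresentation` is hypothetical, and since the message does not list the wordlist count for "oo", the test uses toy numbers rather than the real figures.

```python
from collections import Counter
from typing import Dict


def overrepresentation(sample: Counter, reference: Counter) -> Dict[str, float]:
    """Ratio of each pair's relative frequency in `sample` to that in `reference`.

    1.0 means the pair is as common in the sample as in the reference;
    6.0 would mean it shows up six times as often as expected.
    Pairs absent from either corpus are skipped.
    """
    sample_total = sum(sample.values())
    reference_total = sum(reference.values())
    ratios: Dict[str, float] = {}
    for pair, count in sample.items():
        ref_count = reference.get(pair, 0)
        if count and ref_count:
            ratios[pair] = (count / sample_total) / (ref_count / reference_total)
    return ratios
```

Sorting the returned ratios in descending order puts the most over-represented pairs first, which is where a skew like the one for oo would show up.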

-- 
see shy jo


