Hi,
Another possibility: in my experience bogofilter seems to work better when it has seen far more non-spam than spam e-mail. As I recall, your data set was about evenly split between the two.
My own data about Bogofilter (part of a French text available at http://oumph.free.fr/textes/penibles_du_net.html#pourriel , written in June):
* initial training on my mailboxes, covering 21 months:
    40839 ham
     2231 spam (5%)
    -----
    43060 mails (more than 300 MiB)

     17.6 MiB goodlist.db (ham database)
      3.0 MiB spamlist.db (spam database)

* 22 days later:
      559 new spams (19%)
             90 false negatives
            469 detected spam
       49 virus/worm/trojan (1.7%) [*]
     2327 ham (79%)
              0 false positives
           2327 detected ham
     ----
     2935 mails

     19.0 MiB goodlist.db
      3.9 MiB spamlist.db

procmail+bogofilter looks good: good success rate and no (or few) false positives.
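To make "good success rate" concrete, here is a small sketch that only redoes the arithmetic on the 22-day figures above. The variable names are mine, chosen for illustration; nothing here comes from bogofilter itself.

```python
# Counts copied from the 22-day figures above (variable names are illustrative).
spam_total = 559          # new spams received
spam_detected = 469       # correctly flagged as spam
spam_missed = 90          # false negatives (spam delivered as ham)
ham_total = 2327          # legitimate mails received
ham_flagged_as_spam = 0   # false positives

detection_rate = spam_detected / spam_total
false_negative_rate = spam_missed / spam_total
false_positive_rate = ham_flagged_as_spam / ham_total

print(f"spam detected:   {detection_rate:.1%}")
print(f"false negatives: {false_negative_rate:.1%}")
print(f"false positives: {false_positive_rate:.1%}")
```

So roughly 84% of the spam was caught and about 16% slipped through, with no legitimate mail misfiled during those 22 days.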
[*] mainly a rule to detect PE executables in attachments

-- 
Benoît Sibaud