
Re: Removing duplication: Word lists of common words in languages



Ben Finney writes ("Re: Removing duplication: Word lists of common words in languages"):
> Ian Jackson <ijackson@chiark.greenend.org.uk> writes:
> > I had roughly this question in 2013, and found the answer.  Here is
> > probably the best starting point:
> >
> > http://www.chiark.greenend.org.uk/ucgi/~ijackson/git?p=evade-mail-usrlocal.git;a=blob;f=lemma.al-permission.mbox
> 
> Great! That asks for permission to redistribute the corpus under
> free-software terms, and documents the response in the affirmative.
> Vital for an eventual ‘debian/copyright’. Thank you.
> 
> In that exchange, you also mention you're planning to distribute the
> data in a program. Is that online somewhere, and what's the URL?

Yes.  (Depending on your definition of `distribute'.)

http://www.chiark.greenend.org.uk/ucgi/~ijackson/git?p=evade-mail-usrlocal.git

It's a userv-based tool for managing a domain containing
randomly-generated email aliases, on a shared shell account system.  I
run it on chiark.  I have copied the relevant section of chiark's
/info/mail.text below, along with the relevant bit of chiark's
/etc/exim4/exim4.conf.pl.

On chiark I run this directly out of a git working tree in /usr/local,
with symlinks from the relevant bits of /etc, /usr/local/bin, etc.  If
anyone else thinks they might actually want this, I might consider
productising it a bit more.

Of course anyone else is welcome to do so, starting with the git tree
there.  I see that I have forgotten to give it a copyright licence or
indeed any copyright notices.

Please treat it as AGPLv3+.  (This is compatible with the GPLv2+
permission that I requested from Adam Kilgarriff.  CCing Matthew
Vernon as the other copyright holder.)

Ian.



3. Randomly-generated (weakly-pseudonymous) addresses
-----------------------------------------------------

chiark users can have randomly-generated short email addresses
<short-random-string>@fyvzl.net, and randomly-generated readable
email addresses <word>.<word>.<word>@evade.org.uk.
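
As a rough illustration of the two address shapes, here is a minimal
Python sketch.  It is not the code from the evade-mail tree: the tiny
word list, the string length of 8 and the function names are all
assumptions made up for the example.

```python
import secrets
import string

# Hypothetical stand-in word list; a real list of common words would be far larger.
WORDS = ["apple", "river", "stone", "cloud", "paper", "green", "light", "table"]


def random_readable_address(words=WORDS, domain="evade.org.uk"):
    """Return <word>.<word>.<word>@domain from three independently chosen words."""
    local_part = ".".join(secrets.choice(words) for _ in range(3))
    return local_part + "@" + domain


def random_short_address(length=8, domain="fyvzl.net"):
    """Return <short-random-string>@domain; the default length is an assumption."""
    alphabet = string.ascii_lowercase
    local_part = "".join(secrets.choice(alphabet) for _ in range(length))
    return local_part + "@" + domain


if __name__ == "__main__":
    print(random_readable_address())
    print(random_short_address())
```

With a word list of N entries this gives N**3 possible readable
addresses, which is why a large list of common words is useful here.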

This is managed using the "slimy-rot13-mail" and "evade-mail"
utilities.  Run them without arguments for their usage messages.

The "choose" option generates ten random addresses and lets you say
which ones you would like to keep.  Paste the ones you like back in,
to have them allocated to you.  (Of course do NOT publish addresses
you have failed, or forgotten, to allocate!)

If you redirect an alias to yourself@chiark then your .forward file
will apply; your .forward file will see the address you redirect to,
not the @fyvzl or @evade address.

chiark's spamfilters treat fyvzl.net and evade.org.uk the same as
slimy.greenend.org.uk (see /info/spam.text).  These addresses do not
go through SAUCE.

On privacy: these addresses are not trivial to map to a particular
user from outside chiark, although bounces (and any replies you send!)
are likely to reveal the linkage.  It's not easy for another user to
get a complete list of your aliases, but chiark's mail logs are
available to everyone.  And any chiark user can use exim -bt to
discover where a particular alias redirects.

These aliases are recorded in a database.  It is not possible to ever
delete aliases because that would run the risk that another user would
subsequently be allocated an alias previously used by someone else.
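
One way to get this never-reuse property is sketched below in Python
with sqlite3.  This is a guess at the general shape, not the actual
tool: the column names are borrowed from the exim lookup query, while
the primary-key constraint, the idea of blanking the redirect instead
of deleting the row, and the function names are assumptions.

```python
import sqlite3


def open_db():
    """Create an in-memory database with an assumed single-table schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table addrs ("
        " localpart text primary key,"
        " redirect text not null,"
        " user text not null)"
    )
    return db


def allocate(db, localpart, user, redirect):
    """Record a new alias; return False if the local part was ever allocated."""
    try:
        with db:
            db.execute(
                "insert into addrs (localpart, redirect, user) values (?, ?, ?)",
                (localpart, redirect, user),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejects any local part that already has a row.
        return False


def retire(db, localpart, user):
    """Stop delivery but keep the row, so the name stays reserved."""
    with db:
        cur = db.execute(
            "update addrs set redirect = '' where localpart = ? and user = ?",
            (localpart, user),
        )
    return cur.rowcount == 1
```

Because retired rows stay in the table, a later allocate() for the
same local part fails for every user, including the original owner.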

The tools `evade-mail-pregen', `slimy-rot13-mail-pregen' and
`numbered-alias-sheet' can format pregenerated aliases on sheets of
paper for you to carry about and give to people when offline.  Run
them without arguments to see the usage messages.

The usage message for numbered-alias-sheet has some examples.



evade_hard_dir:
  ".aliasdir("/etc/aliases-evade")."
  domains = @evade_domains
  user = mail

evade_db_dir:
  domains = @evade_domains
  driver = redirect
  allow_defer = true
  data = ".'${lookup sqlite {/var/lib/evade-mail/$domain.sqlite3 \
   select redirect from addrs where \
      localpart=\'${quote_sqlite:$local_part}\' \
      and not redirect = \'\' \
      and user not in disabled_users;}}'."
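
To see what the lookup in evade_db_dir selects, the same query can be
tried outside exim.  The following Python sketch runs it against an
in-memory database.  The table layouts are assumptions inferred from
the query alone, and a bound parameter stands in for exim's
${quote_sqlite:...} quoting.

```python
import sqlite3

# Same conditions as the router's query; "?" replaces the quoted $local_part.
QUERY = """
select redirect from addrs where
    localpart = ?
    and not redirect = ''
    and user not in disabled_users
"""


def make_db():
    """Build an in-memory database with an assumed two-table schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        create table addrs (localpart text primary key, redirect text, user text);
        create table disabled_users (user text);
        """
    )
    return db


def lookup(db, local_part):
    """Return the redirect target for local_part, or None if nothing matches."""
    row = db.execute(QUERY, (local_part,)).fetchone()
    return row[0] if row else None
```

An alias with an empty redirect, or one owned by a user listed in
disabled_users, yields no row, so this router produces no redirection
for it.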

