
Re: [tex-live] Strange license of ukhyphen (fwd)



> Werner LEMBERG wrote:
> > What about the following: Get a reliable list of UK English words
> > (probably sorted by frequency), apply the current UK patterns,
> > carefully check the results and regenerate the patterns.
> >
> good idea. curiously, my institution curates
> a 100 million word corpus of British English
> (http://www.natcorp.ox.ac.uk/), marked up
> to the word level; deriving a
> list of words from that would be a rather
> small bit of XML retrieval.
> 
> If I get the list of words, does anyone
> else have the time and energy to make the
> experiment?

I am willing to do the pattern generation part.
But
-- The BNC wordlist (which I have too) is full of non-English words,
   proper names, ...; who will do the cleanup?
-- The most time-consuming step is having the hyphenated
   BNC wordlist checked by somebody who knows the etymology
   of English words -- this is the rule OUP
   use in deciding on (UK) hyphenation points.
US people/publishers use quite different rules
(basically syllable-based).

Send me the cleaned UK wordlist and I'll do the bootstrap phase
(prepare the hyphenated list and list of
candidates for checking [potential exceptions]).
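
To make "apply the patterns to the wordlist" concrete, here is a minimal
Python sketch of how Liang-style patterns are applied to a single word.
It is an illustration only: the function names (parse_pattern,
build_table, hyphenate) are hypothetical, the few toy patterns are the
familiar US English textbook example for the word "hyphenation" and are
NOT taken from ukhyphen, and the minimum fragment lengths (2 letters on
the left, 3 on the right) are assumed values.

```python
import re

# Toy patterns only (assumption): the classic US English illustration,
# not the UK patterns discussed in this thread.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
                "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pat):
    """Split a pattern such as 'hy3ph' into its letters and digit weights."""
    letters = re.sub(r"\d", "", pat)
    weights = [0] * (len(letters) + 1)
    pos = 0  # number of letters seen so far
    for ch in pat:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def build_table(patterns):
    """Map each pattern's letters to its list of inter-letter weights."""
    return dict(parse_pattern(p) for p in patterns)


def hyphenate(word, table, left_min=2, right_min=3):
    """Return the word with '-' at every break the patterns allow."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    # Try every substring; keep the highest weight seen at each position.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for k, w in enumerate(weights):
                    scores[start + k] = max(scores[start + k], w)
    # An odd score before word[j] permits a break there.
    pieces, last = [], 0
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "-".join(pieces)


if __name__ == "__main__":
    print(hyphenate("hyphenation", build_table(TOY_PATTERNS)))
```

In a real bootstrap run one would presumably load the actual pattern
file instead of TOY_PATTERNS, run every word of the cleaned list
through hyphenate(), and pass the output on for manual checking.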

All the best

--ps


