
Re: Accessibility talk at the GNU Hackers Meeting



On 1 September 2013 13:04, MENGUAL Jean-Philippe <mengualjeanphi@free.fr> wrote:
> Hi,
>
> Le 30/08/2013 23:37, Reece Dunn a écrit :
>
>> On 30 August 2013 20:40, Samuel Thibault <samuel.thibault@ens-lyon.org>
>> wrote:
>>>
>>> Hello,
>>>
>>> We had the opportunity to give a talk about accessibility at the GNU
>>> Hackers Meeting this summer, the video is available
>>>
>>>
>>> http://www.irill.org/videos/GNU_Hackers_Meeting_2013/Samuel_Thibault_Jean-Philippe_Mengual-Freedom_0_for_everybody_really.webm
>>
>> Thank you for an interesting presentation.
>>
>> I have an interest in speech synthesizers and am maintaining a branch
>> of espeak that makes it easier to build and work on in Linux (via
>> autotools) [https://github.com/rhdunn/espeak].
>>
>> Are there any features/functionality missing from espeak, along with
>> their priority?
>>
>> If the issue is of pronunciation accuracy, I have a pronunciation
>> dictionary command-line tool as part of my Cainteoir Text-to-Speech
>> engine project [https://github.com/rhdunn/cainteoir-engine] that can
>> be used to create and manage pronunciation dictionaries for languages
>> that can then be used to generate a <lang>_extra file with words that
>> eSpeak mispronounces. NOTE: I only currently support the default
>> British English voice phonemes, but can easily support others
>
> Cool, thanks for the information. In my opinion the main problem with Espeak is
> that the voice is not natural at all. Listening to it, you really get the
> impression that a computer is speaking, not a human. That makes it hard for
> "basic" users to put up with, despite the numerous benefits of Espeak.

Thanks.

Currently, the only ways to make espeak sound more natural are to either:
a) use a klatt variant;
b) use an MBROLA voice.

The klatt2 and klatt4 variants sound the best out of the different
klatt variants, but they still have issues, and I personally prefer the
original espeak voice to these.

The MBROLA voices sound a lot better, but MBROLA has usage
restrictions (it cannot be used for commercial applications) and does
not provide the sources (so, e.g., it is not possible to build it for
the Raspberry Pi).

Personally, I use the mb-de5-en voice, as I think it is the most
natural of all the different voices.

> Moreover, having tested NVDA on Windows a while ago, I've seen that
> Espeak is provided with a lot of variants. I wonder why they're not
> available on Linux. These variants are for the French language, but I guess
> they're translated or have equivalents. And why are they not present on
> standard Linux systems, where I feel that no update has come for Espeak to
> change its quality for years?

I wonder if these are the MBROLA voices?

Also, espeak has support for what it calls variants for voices -- male
(m1 through m7), female (f1 through f5), klatt (klatt, klatt1, klatt2,
klatt3, klatt4), whisper, whisperf and croak. These can be selected by
using the "<lang>+<variant>" voice syntax, for example:

    $ espeak -v fr+klatt bonjour
    $ espeak -v en+klatt2 "hello there"
    $ espeak -v de+whisper "Guten Tag"

These variants are of varying quality and don't tend to sound as good
as the main espeak voice.
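
If you want to compare them yourself, something like the following
rough sketch may help. The helper names voice_name and
compare_variants are my own and are not part of espeak; it only
assumes the "-v <lang>+<variant>" syntax shown above and that espeak
is on your PATH.

```shell
#!/bin/sh
# Hypothetical helpers (not part of espeak) for comparing voice variants.

# Build an espeak voice name: "<lang>" or "<lang>+<variant>".
voice_name() {
    if [ -n "$2" ]; then
        printf '%s+%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Speak the same text with the main voice, then with each listed variant.
# Does nothing if espeak is not installed.
compare_variants() {
    cv_lang=$1
    cv_text=$2
    shift 2
    command -v espeak >/dev/null 2>&1 || return 0
    espeak -v "$(voice_name "$cv_lang")" "$cv_text"
    for cv_variant in "$@"; do
        espeak -v "$(voice_name "$cv_lang" "$cv_variant")" "$cv_text"
    done
}
```

For example, after sourcing the above, running
compare_variants en "hello there" klatt2 klatt4 whisper
plays the main English voice followed by those three variants.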

> Finally, I think Espeak could improve its "prosody", i.e. its intonation
> while reading sentences and rendering punctuation marks. So far we can't hear
> any difference between a question mark, a full stop and an exclamation mark.
> And all that with an English accent, even for other languages.

Yeah. I don't understand the espeak intonation/prosody logic that
well, so I don't know what is going on here or how to improve it.

>> NOTE: I also have a page on assessing the (subjective) quality of a
>> speech synthesizer at http://reecedunn.co.uk/cainteoir/design/quality
>
> Thanks. Very interesting. I think it'd be worth studying the speech
> synthesizer issue in free software.

That is a good idea. It would also be worth testing the commercial TTS
engines (including MBROLA) along with the free software ones to get a
decent comparison.

There would also need to be a more scientific way of comparing them.
My quality page is a start, but this should have a more detailed score
card to be useful (i.e. a guideline on how to assess the different
quality aspects to reduce bias/subjectiveness -- e.g. eSpeak would be
marked down for not being able to identify different punctuation and
Cepstral Allison would be marked down for wildly varying pitch).

Thanks,
- Reece H. Dunn (Cainteoir Technologies)

