
Re: Pico



Jean-Philippe MENGUAL <mengualjeanphi@free.fr> wrote:
> Re,
> 
> OK, here's my first feedback about Pico. I think it's a very good
> opportunity for the future. I only see, after a few minutes, these
> problems:
> 1. I tried to change the rate via orca, even set to 60, it doesn't
> change anything.

It's probably an issue with either Orca or, more likely, Speech-Dispatcher.
SVOX Pico can change its speech rate without any problems, except for the
pauses at clause boundaries and punctuation, which have a fixed duration.
There was some discussion of this earlier on the list.
> 2. The volume, even maximum, is low. I believe a bug is reported about
> this upstream.

As I remember, Speech-Dispatcher handles the output to the audio device, so it
should be able to regulate the volume, perhaps with some coding.
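
To make "some coding" a bit more concrete: whatever sits between the
synthesizer and the audio device could apply a gain to the samples before
playback. Here is a minimal sketch in Python, assuming the audio arrives as
signed 16-bit PCM in native byte order. The name apply_gain is made up for
illustration; it is not part of Speech-Dispatcher or Pico.

```python
import array


def apply_gain(pcm_bytes: bytes, gain: float) -> bytes:
    """Scale signed 16-bit PCM samples by `gain`, clipping at the limits.

    Assumption: samples are native-endian 16-bit integers, so the buffer
    length must be a multiple of two bytes.
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    for i, sample in enumerate(samples):
        scaled = int(round(sample * gain))
        # Clip instead of wrapping around, which would be heard as noise.
        samples[i] = max(-32768, min(32767, scaled))
    return samples.tobytes()
```

A gain above 1.0 makes quiet output louder, but peaks that hit the clipping
limits will be distorted, so the usable gain depends on how much headroom the
synthesizer leaves.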
> 3. I'm not absolutely sure about stability: it worked with orca -t, I
> did ctrl-c, it didn't work anymore, I rebooted the system, so it worked...
> I don't understand why pico didn't work anymore after ctrl-c.

I don't know. Did speech-dispatcher log an error?
> 4. When a number has more than 1 digit (10, 11, 12...), it's not
> pronounced as a number in French; the synthesizer reads the digits one by
> one. Instead of saying 12, it says 1 2.

There's a pre-processor which is responsible for handling numbers, dates,
abbreviations, etc., which I am sure could be improved. It is controlled by
rules in the language files. Unfortunately, the software to build the language
files is currently available only as MS-Windows executables. I think somebody
with serious funding available to pay for it should enter into negotiations
with SVOX to see whether that code can be released in source form.
The tools just generate the binary language files from the source files; there
shouldn't be any proprietary secrets in there. The code for actually building
the hidden Markov models, decision trees, etc., isn't publicly available and
may never be, since the algorithms involved are probably highly proprietary.
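
To illustrate the kind of rule that is needed for item 4, here is a minimal
sketch in Python of a pre-processing step that treats a run of digits as one
number and spells it out in French. It is only an illustration: it is not
Pico's rule format or code, the function names are made up, and it only
covers 0 to 99 (larger numbers are left untouched).

```python
import re

UNITS = ["zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept",
         "huit", "neuf", "dix", "onze", "douze", "treize", "quatorze",
         "quinze", "seize"]
TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante", 6: "soixante"}


def french_number(n: int) -> str:
    """Spell out an integer from 0 to 99 in French (traditional spelling)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 are handled in this sketch")
    if n <= 16:
        return UNITS[n]
    if n < 20:
        return "dix-" + UNITS[n - 10]          # dix-sept, dix-huit, dix-neuf
    if n < 70:
        tens, unit = divmod(n, 10)
        if unit == 0:
            return TENS[tens]
        if unit == 1:
            return TENS[tens] + " et un"       # vingt et un, trente et un, ...
        return TENS[tens] + "-" + UNITS[unit]
    if n < 80:
        if n == 71:
            return "soixante et onze"
        return "soixante-" + french_number(n - 60)   # soixante-dix, ...
    if n == 80:
        return "quatre-vingts"
    return "quatre-vingt-" + french_number(n - 80)   # quatre-vingt-un, ...


def expand_numbers(text: str) -> str:
    """Replace each run of ASCII digits with its French spelling."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        # Numbers outside the supported range are left as they are.
        return french_number(value) if value <= 99 else match.group()
    return re.sub(r"[0-9]+", replace, text)
```

With this, expand_numbers("J'ai 12 pommes") gives "J'ai douze pommes" rather
than reading the two digits separately.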
> 5. Of course, the prosody can be improved. But I find that the quality
> is very interesting. There is less distance to cover between the current
> state and a very, very good product.

It's the product of serious research in speech processing, machine learning
and computational linguistics. I don't have any expertise in any of those
areas, but I did look at the source code. SVOX claim that Pico provides the
best speech quality for a synthesizer of its size and resource usage. I
haven't heard anything that can compete with it on these criteria. For such a
small synthesizer, it's very good.

Of course there are problems and limitations as well.

