
Re: Viable speech recognition tools?



On 05/20/2021 10:25 AM, Aaron wrote:

On 5/19/21 5:48 AM, Richard Owlett wrote:
On 05/16/2021 01:00 PM, Aaron wrote:
On 5/16/21 8:19 AM, Richard Owlett wrote:

[I'm subscribed to the list ;]


I notice PocketSphinx in the Debian repositories.
How suitable is it for dictation by a single speaker?
I realize it is designed to be speaker independent.
TIA

I wouldn't say it is designed to be speaker independent.

When I read the description I may have "seen" what I wanted to see.
I haven't investigated speech recognition since I was using Windows a
decade ago.

I'm assuming it would be trained to my voice and speaking style. I want
continuous speech and as large a vocabulary as possible.

Thank you for getting in touch. I feel like I have a somewhat better
idea of what you are trying to do.

Kaldi, Deepspeech and FlashlightASR all recommend Linux for your
environment. I'm not sure if any of them can run on Windows or OSX.

I run only Debian Linux. Some of my machines run the i386 flavor, but I am moving to AMD64. The majority are Lenovo Thinkpads with legacy BIOS.


Pocketsphinx is definitely not going to work for the purposes of taking
dictation for letters.

The easiest way to get speech recognition is going to be to use an
online service like Google Cloud Speech-to-Text. It has the full power of
the Google search engine behind it as far as the language model is
concerned, and Google handles all the optimizations on their side
automatically. I think there is still a free tier of this service. The
main reasons to avoid it are, of course, privacy and the fact that it
requires an internet connection. I only mention it because it is so much
easier to set up right now and you didn't explicitly state what your
requirements are.

BOTH caveats apply. I see a third potential pitfall. I suspect Google will emulate Microsoft and Canonical in providing only what *THEY* think the user *should* want. Yes, I have strong opinions ;}


Kaldi, Mozilla Deepspeech, and FlashlightASR are all viable options.
They are all free and open source, run locally, and interface well with
Python as far as scripting the training and recognition processes (they
also interface with C++, but I'd at least prototype things in Python
first). Kaldi and FlashlightASR are currently aimed at researchers, so
they are not easy to set up and the documentation is full of
intimidating formulas and technical jargon. Mozilla Deepspeech is
somewhat gentler to work with and seems to have better support, plus it
can be installed with a simple "pip install deepspeech". Mozilla
Deepspeech and FlashlightASR both use KenLM language models by default,
while Kaldi uses a variety of language models.
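For illustration only, here is a rough sketch of what transcribing a recording with Mozilla Deepspeech's Python package might look like. It assumes the 0.9.x API (Model, enableExternalScorer, sampleRate, stt), that numpy is installed alongside deepspeech, and that the pre-trained acoustic model and scorer files have been downloaded separately. The helper names read_pcm16_mono and transcribe are hypothetical. It uses the stock pre-trained model; training or fine-tuning on your own voice is a separate process not shown here.

```python
import wave


def read_pcm16_mono(path, expected_rate=16000):
    """Return the raw 16-bit mono PCM frames of a WAV file.

    Assumption: the model expects mono, 16-bit audio at `expected_rate`,
    so any other format is rejected instead of being passed through.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        return wav.readframes(wav.getnframes())


def transcribe(wav_path, model_path, scorer_path=None):
    """Transcribe one WAV file with a pre-trained Deepspeech model (sketch)."""
    # Imported here so the WAV helper above works without deepspeech installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)                    # acoustic model (.pbmm file)
    if scorer_path:
        model.enableExternalScorer(scorer_path)  # scorer wrapping a KenLM model
    frames = read_pcm16_mono(wav_path, model.sampleRate())
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)
```

Usage would be something like transcribe("dictation.wav", "deepspeech-0.9.3-models.pbmm", "deepspeech-0.9.3-models.scorer"), where all three paths are placeholders. Checking the WAV format up front is a deliberate choice: feeding audio at the wrong sample rate tends to degrade the transcript quietly rather than raise an error.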

The links I found seemed to suggest Deepspeech was aiming at people like me. An important feature is that it is open source. However, I found an article suggesting Mozilla was winding down its development. [ https://venturebeat.com/2021/04/12/mozilla-winds-down-deepspeech-development-announces-grant-program/ ]


I'm currently working on a research project where I am trying to compare
the current state of different speech recognition engines and classify
them according to strengths and weaknesses. If I can be helpful, please
let me know.

Is there a recommended download site which has an associated user network, be it a mailing list or USENET {preferred}? I find web-based fora unusable.

TIA




