Re: Viable speech recognition tools?

To: debian-accessibility@lists.debian.org
Subject: Re: Viable speech recognition tools?
From: Richard Owlett <rowlett@cloud85.net>
Date: Fri, 21 May 2021 05:54:16 -0500
Message-id: <9ee51cd3-a4ea-9070-5bfd-df73aa9e30b7@cloud85.net>
In-reply-to: <1c1aa820-b06d-eb1c-b19b-aee37632a399@dottywood.org>
References: <356b0c33-8516-0fff-36af-a72d5deb2234@cloud85.net> <a81914a4-877d-f907-2203-3d213d5320a2@dottywood.org> <d87276c2-bec2-7e14-f04a-b87fed70bab2@cloud85.net> <1c1aa820-b06d-eb1c-b19b-aee37632a399@dottywood.org>

On 05/20/2021 10:25 AM, Aaron wrote:


On 5/19/21 5:48 AM, Richard Owlett wrote:

On 05/16/2021 01:00 PM, Aaron wrote:

On 5/16/21 8:19 AM, Richard Owlett wrote:


[I'm subscribed to the list ;]

I notice PocketSphinx in the Debian repositories.
How suitable is it for dictation by a single speaker?
I realize it is designed to be speaker independent.
TIA

I wouldn't say it is designed to be speaker independent.


When I read the description I may have "seen" what I wanted to see.
I haven't investigated speech recognition since I was using Windows a
decade ago.

I'm assuming training to my voice and speaking style. I want
continuous speech and as large a vocabulary as possible.

Thank you for getting in touch. I feel like I have a somewhat better
idea of what you are trying to do.

Kaldi, Deepspeech and FlashlightASR all recommend Linux for your
environment. I'm not sure if any of them can run on Windows or OSX.

I run only Debian Linux. I have some machines on the i386 flavor but am
moving to AMD64. The majority are Lenovo ThinkPads with legacy BIOS.


Pocketsphinx is definitely not going to work for the purposes of taking
dictation for letters.

The easiest way to get speech recognition is going to be to use an
online service like Google Cloud Speech-to-Text. This has the full power of the
Google search engine behind it as far as language model, and they handle
all the optimizations on their side automatically. I think there is
still a free version of this service. The main reason to avoid it is, of
course, privacy, and second being that it requires an internet
connection. I only mention it because it is so much easier to get set up
right now and you didn't explicitly state what your requirements are.

BOTH caveats apply. I see a third potential pitfall. I suspect Google
will emulate Microsoft and Canonical in providing only what *THEY* think
the user *should* want. Yes, I have strong opinions ;}


Kaldi, Mozilla Deepspeech, and FlashlightASR are all viable options.
They are all free and open source, run locally, and interface well with
Python as far as scripting the training and recognition processes (they
also interface with C++, but I'd at least prototype stuff in Python
first). Kaldi and FlashlightASR are currently aimed at researchers, so
they are not easy to set up and the documentation is full of
intimidating formulas and technical jargon. Mozilla Deepspeech is
somewhat gentler to work with and seems to have better support, plus it
can be installed with a simple "pip install deepspeech". Mozilla
Deepspeech and FlashlightASR both use KenLM language models by default,
while Kaldi uses a variety of language models.
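
For readers who want to see what scripting recognition from Python looks
like, here is a minimal sketch assuming the Mozilla DeepSpeech 0.9 Python
package. The helper names (read_pcm16_mono, transcribe) and all file
paths are illustrative, not part of the library. The acoustic model and
the KenLM-based scorer must be downloaded separately from the project's
releases, and the audio is assumed to be a 16-bit mono WAV at the
model's sample rate.

```python
import sys
import wave
from array import array


def read_pcm16_mono(path, expected_rate=16000):
    """Return the samples of a 16-bit mono WAV file as an array of int16."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio")
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def transcribe(wav_path, model_path, scorer_path=None):
    """Run one-shot recognition on a WAV file (hypothetical helper)."""
    # Imported lazily so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path is not None:
        # The external scorer is the language model used during decoding.
        model.enableExternalScorer(scorer_path)
    samples = read_pcm16_mono(wav_path, model.sampleRate())
    return model.stt(np.array(samples, dtype=np.int16))
```

A call would look like transcribe("letter.wav", "model.pbmm",
"model.scorer"), where all three file names are placeholders. This is
one-shot transcription of a recorded file, not live dictation; the
library also offers a streaming interface that this sketch does not
cover.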

The links I found seemed to suggest Deepspeech was aiming at people like
me. An important feature is that it is open source. However, I found one
article suggesting Mozilla was winding down its development.
[https://venturebeat.com/2021/04/12/mozilla-winds-down-deepspeech-development-announces-grant-program/]


I'm currently working on a research project where I am trying to compare
the current state of different speech recognition engines and classify
them according to strengths and weaknesses. If I can be helpful, please
let me know.

Is there a recommended download site which has an associated user
network, be it mailing list or USENET {preferred}? I find web-based fora
unusable.

TIA
