
Re: I'm working on an article



On 11/12/25 19:08, Jason J.G. White wrote:

On 12/11/25 10:17, Aaron Chantrill wrote:
I'm working on an article for Linux Magazine. For this article, I'm interested in talking about setting up speech dispatcher with different text to speech engines, like Piper TTS or Coqui TTS. This is based on a question from this mailing list a couple of months ago. I'm hoping to start a series on accessibility issues while deepening my own understanding.

For screen reader users, minimizing audio latency is important. Unfortunately, the neural network-based TTS systems, including Coqui and Piper, have a reputation for producing high latency. This is an important reason why screen reader users tend not to use them.

I don't know whether this is improved if you have appropriate GPU processing for the neural network models. Piper was unusably slow on my machine, but I didn't investigate deeply enough to find out whether it was using the GPU.

Piper, when run as a command line program, is unusably slow because it has to load the full ONNX model on every invocation. My goal is to use Piper's built-in HTTP server, which is the same way the older mimic3-generic.conf module worked. Of course, writing an HTTP front end that holds a model in memory isn't that difficult, so if other TTS programs don't include a web service, it shouldn't be hard to write one. Once the ONNX model is loaded, Piper synthesizes faster than real time (it takes longer to say the output than to generate it) even on a Raspberry Pi 3, so latency and GPU shouldn't be an issue, although running an additional web server as a service does introduce additional complexity.

Thank you, Aaron
