Seeking information about Web Speech API
I recently discovered the Web Speech API
(https://wicg.github.io/speech-api/) which has apparently been
implemented in both Chrome and Firefox. In fact, this appears to have
been the impetus behind the Mozilla DeepSpeech project (later renamed
STT, then spun off into Coqui).
The idea appears to be to let websites use the JSpeech Grammar Format
(JSGF) to design telephone-tree-style interfaces for voice navigation.
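For reference, here is a minimal sketch of how a page wires a JSGF
grammar into the API, modeled on the MDN speech-color-changer demo. The
color list is illustrative, and the browser-only parts are guarded so
the snippet also loads outside a browser; SpeechGrammarList support
varies by browser.

```typescript
// Build a JSGF grammar that matches one word from a fixed set --
// the same pattern the MDN speech-color-changer demo uses.
function buildColorGrammar(colors: string[]): string {
  return `#JSGF V1.0; grammar colors; public <color> = ${colors.join(" | ")};`;
}

const grammar = buildColorGrammar(["red", "green", "blue"]);

// Chrome exposes the prefixed webkitSpeechRecognition constructor;
// the guard makes this a no-op outside a browser.
const SpeechRecognitionCtor =
  (globalThis as any).SpeechRecognition ??
  (globalThis as any).webkitSpeechRecognition;

if (SpeechRecognitionCtor) {
  const recognition = new SpeechRecognitionCtor();
  const GrammarListCtor =
    (globalThis as any).SpeechGrammarList ??
    (globalThis as any).webkitSpeechGrammarList;
  if (GrammarListCtor) {
    const grammarList = new GrammarListCtor();
    grammarList.addFromString(grammar, 1); // weight 1
    recognition.grammars = grammarList;
  }
  recognition.lang = "en-US";
  recognition.onresult = (event: any) => {
    console.log("heard:", event.results[0][0].transcript);
  };
  recognition.onerror = (event: any) => {
    // This is where Firefox currently surfaces "network".
    console.error("Error occurred in recognition:", event.error);
  };
  recognition.start();
}
```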
The current implementation seems completely broken in Firefox. There is
a demo page at
https://mdn.github.io/dom-examples/web-speech-api/speech-color-changer/
which currently works in Chrome, but in Firefox you have to go into
about:config and enable both media.webspeech.recognition.enable and
media.webspeech.recognition.force_enable to get it to run at all, and
even then the only result you get is "Error occurred in recognition:
network". Apparently Firefox was using the Google Cloud STT service, but
that endpoint appears to have been shut down, along with the Mozilla
DeepSpeech test endpoint.
There was a setting in Firefox, media.webspeech.service.endpoint, which
could be used to point the API at a specific back-end server, but it
appears to have been removed, so now that the default endpoint no longer
works the whole API is basically useless.
On Chrome, of course, you just have to trust whatever endpoint they are
using. I highly doubt they are doing STT on-device, so it must be going
to a service somewhere - probably still Google Cloud STT.
Does anyone know anything about the current state of this technology?
I'm working on a library that captures audio in the browser via WebRTC
and sends it over AJAX to a back-end VOSK or Whisper service, but if
anyone is already working on something like this, I'd like to know
before I get too involved.
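For what it's worth, the rough shape I have in mind is below. This is a
sketch under assumptions: the endpoint URL, the { text } response shape,
and the chunk interval are placeholders, not any real VOSK or Whisper
API. Capture uses getUserMedia/MediaRecorder, and the request-building
logic is kept pure and separate from the browser-only parts.

```typescript
// Hypothetical endpoint -- a stand-in for whatever a VOSK or Whisper
// back end would actually expose.
const STT_ENDPOINT = "https://example.com/transcribe";

// Build the fetch() arguments for one audio chunk. Kept pure so it is
// easy to test without a browser or a running back end.
function buildTranscribeRequest(endpoint: string, audio: Blob) {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": audio.type || "application/octet-stream" },
      body: audio,
    },
  };
}

// Browser-only capture path, guarded so the module also loads elsewhere.
async function streamMicToBackend(): Promise<void> {
  const nav = (globalThis as any).navigator;
  if (!nav?.mediaDevices) return;
  const stream = await nav.mediaDevices.getUserMedia({ audio: true });
  const recorder = new (globalThis as any).MediaRecorder(stream);
  recorder.ondataavailable = async (event: any) => {
    const { url, init } = buildTranscribeRequest(STT_ENDPOINT, event.data);
    const response = await fetch(url, init);
    const { text } = await response.json(); // assumed response shape
    console.log("transcript:", text);
  };
  recorder.start(3000); // emit a chunk every 3 seconds
}
```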
Thanks!