Seeking information about Web Speech API



I recently discovered the Web Speech API (https://wicg.github.io/speech-api/) which has apparently been implemented in both Chrome and Firefox. In fact, this appears to have been the impetus behind the Mozilla DeepSpeech project (later renamed STT, then spun off into Coqui).

The idea appears to be to let websites use the JSpeech Grammar Format (JSGF) to design telephone-tree-style interfaces for voice navigation.
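To make the grammar idea concrete, here is a minimal sketch of building a JSGF string for a phone-tree style menu. The helper name buildJsgfGrammar and the menu phrases are made up for illustration; the header and rule layout follow the form the MDN demo has used, and the commented lines at the end assume a browser that exposes a grammar list object the way that demo does.

```typescript
// Hypothetical helper: builds a JSGF grammar string containing one public
// rule whose alternatives are the given phrases.
function buildJsgfGrammar(
  grammarName: string,
  ruleName: string,
  phrases: string[],
): string {
  if (phrases.length === 0) {
    throw new Error("a rule needs at least one phrase");
  }
  // JSGF separates alternatives with "|".
  const alternatives = phrases.map((p) => p.trim()).join(" | ");
  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${alternatives};`;
}

// Illustrative menu options, not taken from any real site.
const menuGrammar = buildJsgfGrammar("menu", "option", [
  "billing",
  "support",
  "sales",
]);

// Assumption: in a browser that exposes the API, the string would be
// attached roughly as in the MDN demo:
//   const list = new webkitSpeechGrammarList();
//   list.addFromString(menuGrammar, 1);
//   recognition.grammars = list;
```

Whether a given recognition back end actually honours the grammar is a separate question; this only shows what the website hands to the browser.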

The current implementation seems completely broken in Firefox. There is a demo page at https://mdn.github.io/dom-examples/web-speech-api/speech-color-changer/ which currently works in Chrome, but in Firefox you have to go into about:config and enable both media.webspeech.recognition.enable and media.webspeech.recognition.force_enable to get it to run at all, and even then the only result you get is "Error occurred in recognition: network". Apparently Firefox was using the Google Cloud STT service, but that endpoint appears to have been shut down, along with the Mozilla DeepSpeech test endpoint.
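For anyone who wants to check their own browser, here is a rough sketch of feature detection plus turning the error code into something readable. The interface and both function names are my own, declared locally so the snippet does not depend on DOM typings; the error codes are ones listed in the spec, and the descriptions are my paraphrase rather than official wording.

```typescript
// Minimal local shape of the parts of the recognition object used here.
interface RecognitionLike {
  lang: string;
  onresult: ((event: unknown) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}
type RecognitionCtor = new () => RecognitionLike;

// Look for the standard constructor name first, then the webkit-prefixed one.
function findRecognitionCtor(
  scope: Record<string, unknown>,
): RecognitionCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognitionCtor) : null;
}

// Map a few spec-defined error codes to a short explanation.
function describeRecognitionError(code: string): string {
  switch (code) {
    case "network":
      return "the browser could not reach its recognition service";
    case "not-allowed":
      return "microphone or recognition permission was denied";
    case "no-speech":
      return "no speech was detected";
    default:
      return `recognition failed: ${code}`;
  }
}

// Usage in a page (left as comments so the sketch also runs outside a browser):
//   const Ctor = findRecognitionCtor(globalThis as unknown as Record<string, unknown>);
//   if (Ctor) {
//     const recognition = new Ctor();
//     recognition.lang = "en-US";
//     recognition.onerror = (e) => console.log(describeRecognitionError(e.error));
//     recognition.start();
//   }
```

In the Firefox case described above, the constructor is found once the prefs are flipped, and the failure only shows up later as the "network" error code.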

There was a setting in Firefox, media.webspeech.service.endpoint, which could be used to point at a specific back-end server, but this appears to have been removed. Now that the default endpoint no longer works, the whole API is basically useless there.

On Chrome, of course, you just have to trust whatever endpoint they are using. I highly doubt they are doing STT on-device, so the audio must be going to a service somewhere, probably still Google Cloud STT.

Does anyone know anything about the current state of this technology? I'm working on a library that uses WebRTC to capture audio in the browser and AJAX to send it to a back-end VOSK or Whisper service for STT, but if anyone is already working on something like this, I'd like to know before I get too involved.

Thanks!


