Google Speech API

Hey,

I feel this is the elephant in the room, so I’m surprised we haven’t mentioned it yet.

Google Speech API is a cloud-based API for speech recognition. Until now I assumed we wanted to stay offline, but if we accept going online, I think we should include this product in our investigation, even though I would not be that happy to use Google for this.

For more info: https://cloud.google.com/speech/
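For the curious, calling it is basically a single HTTPS request. Here’s a minimal Python sketch against the REST endpoint described at that link (the API key and audio file are placeholders, and the exact endpoint version may differ from what’s current):

```python
import base64
import json
import urllib.request

# Placeholder values: supply your own API key and a short 16 kHz
# mono LINEAR16 (raw PCM) recording.
API_KEY = "YOUR_API_KEY"
AUDIO_FILE = "command.raw"

with open(AUDIO_FILE, "rb") as f:
    audio_content = base64.b64encode(f.read()).decode("ascii")

request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    },
    "audio": {"content": audio_content},
}

req = urllib.request.Request(
    "https://speech.googleapis.com/v1/speech:recognize?key=" + API_KEY,
    data=json.dumps(request_body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# Transcripts come back nested under results -> alternatives.
for r in result.get("results", []):
    print(r["alternatives"][0]["transcript"])
```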


Julien

Hi,

I think that has been mentioned several times, but being offline was a competitive advantage.

We could even try the Alexa Voice Service API:
https://developer.amazon.com/public/solutions/alexa/alexa-voice-service

And there are tons of online services; perhaps it’s good to try them out to get an idea of what we can have.

I’d like to see evidence of this.

So far I don’t have strong evidence, because other similar products (Google Now, Siri, etc.) need an internet connection. Or perhaps we could consider that already an advantage: being the first one that works offline.

But I can also enumerate some reasons:

  • Perhaps not in SF, but I can tell you that the internet connection in London can bite you in the arse, with anything from long latency periods to outages to peaks of high demand.
  • … which could lead to situations where you cannot (to take a silly example) open the door with a voice command while you are sitting on the sofa.
  • Or, even if the traffic to the service is encrypted, not letting the company offering that service learn your usage patterns (when you use it, what kind of things you use it for, etc.).

Don’t get me wrong, I’m not saying we should not try online services, but IMO it’s interesting to try both; perhaps the perfect balance is in the middle: a combination of offline commands and online power commands. Who knows, but that’s what we need to try, right? :wink:

Thanks!

We wouldn’t be the first. Voice commands on iPhone were available offline before Siri was announced. The experience is sub-par. Andre may be able to add detail here.

How could we validate the need first? What experiments could we run to test this assumption?

We are talking about an assistant. Commands alone were available loooong ago; do you remember Dragon Talk? I think I tried it in 1996 :slightly_smiling:

[quote="jedireza, post:5, topic:8566"]
How could we validate the need first? What experiments could we run to test this assumption?
[/quote]
I’m definitely not an expert at designing a user experiment, but again, the first thing that pops into my mind: why don’t we just shut down the internet connection while testing out an Amazon Echo with a user?

Again, I don’t know if it will add killer value or insignificant value; we will dive into the user testing process to validate it. And I’m pretty happy to hear from other people how this can be validated or discarded :slight_smile:

Cheers,
F.

If we want to go for offline… then why not Vaani?
Also… PocketSphinx (which Vaani utilizes, or used to at least…) if we only want limited voice commands.
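For reference, limited offline keyword spotting with PocketSphinx is only a few lines using the pocketsphinx-python package. A rough sketch (the keyphrase and threshold are placeholder values to tune per command, and the exact API varies between versions):

```python
from pocketsphinx import LiveSpeech

# Keyword spotting: listen for one fixed phrase from the microphone
# instead of doing full large-vocabulary recognition.
speech = LiveSpeech(
    lm=False,                    # disable the full language model
    keyphrase="open the door",   # placeholder command phrase
    kws_threshold=1e-20,         # lower = stricter matching
)

for phrase in speech:
    print("heard:", phrase)
```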

Hi!

There are various answers to “why not Vaani”, but the strongest is that we want to keep our different trains separate for now. We don’t want to make them interdependent.

Hope this helps :slight_smile:

Long-term, we probably want to use Vaani. But the current strategy is that each train must live (or die) on its own until it has been proved. If we connect Link and Vaani too early, we need both to succeed if we wish to have any kind of impact, which makes the whole more fragile.

Accepting online does not imply accepting closed-source. :slight_smile: So IMHO closed-source is a no-go area for anything Mozilla ships as “part of the product’s functionality”, unless we offer the user multiple options for their speech engine, just like we offer multiple options for their search engine. Hosted speech engines may not currently be competing, swappable services in the same way search engines are, but maybe they will become more like that in the coming years.

A client only for Google Speech API is not what I would call a user agent; even Google Chrome allows setting DuckDuckGo as your search engine.

If we give the user the option to send all live sound that follows the wake word to a closed-source, single-instance, hosted, untrusted speech engine, then we should at least give them 2 or 3 speech engines to choose from, like Firefox does with search engines.

Note I never said we should use Google Speech API, only that it should be investigated.


A notable difference: if you ask Google’s Speech API what the time is, it will return the text string “What is the time?” (which you can then feed into a second HTTP request to Google Search, but the answer you get back is too long to stream out with TTS), whereas if you ask Alexa Voice Service, it will directly return the text string “It’s 8pm”.

IMHO we should implement support for at least 2 hosted voice APIs, and let the user choose at runtime where their voice commands are sent to.
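To make that concrete, here is a rough Python sketch of what a pluggable engine layer could look like (all names are hypothetical, not an actual Link/Vaani API). It also captures the difference above: a transcription engine hands back text we still have to act on, while an assistant engine can hand back a ready-to-speak answer:

```python
from typing import Callable, Dict, NamedTuple, Optional

class VoiceResult(NamedTuple):
    transcript: str               # what the user said, as text
    direct_answer: Optional[str]  # ready-to-speak reply, if the engine is an assistant

# Registry of user-selectable engines, analogous to the browser's
# search engine list. The adapters themselves (wrapping Google Speech
# API, Alexa Voice Service, PocketSphinx, ...) are assumed to exist.
ENGINES: Dict[str, Callable[[bytes], VoiceResult]] = {}

def register_engine(name: str, handler: Callable[[bytes], VoiceResult]) -> None:
    ENGINES[name] = handler

def handle_utterance(audio: bytes, engine_name: str) -> str:
    """Route audio to whichever engine the user picked in settings."""
    result = ENGINES[engine_name](audio)
    if result.direct_answer is not None:
        # Assistant-style engine (e.g. AVS): speak the reply as-is.
        return result.direct_answer
    # Transcription-style engine (e.g. Google Speech API): we still
    # have to map the text onto a command ourselves.
    return run_local_command(result.transcript)

def run_local_command(transcript: str) -> str:
    # Placeholder command mapping.
    if "open the door" in transcript.lower():
        return "Opening the door."
    return "Sorry, I didn't get that."
```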

\o/

Yesterday I was talking to @fabrice on IRC about doing exactly that: the same thing we do with search engines in the browser. Allow the user to choose who handles the voice, since voice is just a control mechanism. That will definitely be a differentiator.

And if we let people choose, perhaps you’d prefer to go with a less fancy version because you care more about your privacy, or go for full and easy recognition, trusting companies like Google or Amazon.

Looks cool! But we should be careful about having these services fight each other:
https://twitter.com/curiousgene/status/733002062552686592