I tried out the pre-installed Chess game on Mac OS X with System Preferences -> Dictation & Speech activated, and recommend you do the same if you have a Mac - it’s a nice way to get a feel for how offline-only voice control can work in practice.
It requires a 440 MB download for offline speech recognition, and if you use that option, it sends no data to Apple (wouldn’t want to be spied on from Cupertino and end up being denied a mortgage because of how often I lose at chess, of course).
I had played the Chess game before using the laptop’s trackpad, and must say that I’ll probably play with voice control from now on, just because it allows you to lean back on the sofa with your laptop on the table while playing chess against the computer.
The speech recognition is good enough, with two complaints:
- too many false positives for the ‘undo’ command. I tested saying ‘how are you?’, and apparently it thinks my pronunciation of ‘are you’ is very close to the sound of ‘undo’. ‘Undo’ even seems to be the default it falls back on when it thinks it is receiving a command but no other phrase matches: it would sometimes spontaneously undo the last move based on background noise when I wasn’t even talking (admittedly, I’m in a noisy place with music playing).
- parsing a command takes up to 6 seconds, which is long - especially if it parses the command wrongly, and you have to say ‘undo’ and wait another 6 seconds.
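The spurious ‘undo’ in the first complaint is the kind of thing a rejection threshold is meant to prevent. Here is a minimal sketch in Python, assuming the recognizer exposes a confidence score per candidate command. The function name, the 0-to-1 score scale and the 0.6 cutoff are all made up for illustration; I don’t know how Chess actually decides.

```python
def pick_command(hypotheses, min_confidence=0.6):
    """Return the most confident command, or None if nothing is convincing.

    hypotheses: list of (command, confidence) pairs, confidence in [0, 1].
    Both the data shape and the cutoff are assumptions for this sketch.
    """
    if not hypotheses:
        return None
    command, confidence = max(hypotheses, key=lambda h: h[1])
    # Ignore weak matches instead of falling back to a default like 'undo'.
    return command if confidence >= min_confidence else None


# Background noise that vaguely resembles 'undo' is ignored...
print(pick_command([("undo", 0.35), ("pawn e2 to e4", 0.20)]))  # None
# ...while a clearly spoken 'undo' still goes through.
print(pick_command([("undo", 0.90), ("pawn e2 to e4", 0.05)]))  # undo
```

The trade-off is that a higher cutoff means fewer accidental undos but more commands you have to repeat.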
Memory consumption is around 500 MB, and CPU usage peaks at 10% while processing a command.
No wake-up word is required, but then again, it’s only listening while the Chess app is focused, so it’s not the same as a voice command device without a wake-up word that’s on 24 hours a day.
Using a headset (of course) significantly improves the recognition accuracy as well as the speed, but even without a headset, and with some patience, I was able to play chess in a room where there is music playing. Then again, I guess the MacBook’s internal microphone is directional, and I was speaking towards the microphone while the music in the room was undirected.
I tried saying ‘pawn e2 to a6’ (an illegal move in chess), and it translates this to ‘pawn e2 to e3’ - as it does ‘rock e2 to e3’ and ‘pawn e2 takes e3’ when there’s a pawn on square e2. Yet when you say ‘elephant e2 to e3’, nothing happens. This suggests it uses not only a vocabulary of chess-related words, but also the rules of chess to deduce what you probably meant.
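To make that guess concrete, here is a rough Python sketch of how such an interpreter could behave. It is only my reconstruction from the outside, not Apple’s implementation: the `interpret` function, the 0.7 similarity cutoff and the hard-coded list of legal moves (a hypothetical position in which the e2 pawn can only go to e3) are all assumptions.

```python
from difflib import get_close_matches

PIECES = ["pawn", "rook", "knight", "bishop", "queen", "king"]

# Hypothetical position: legal moves as (piece, from_square, to_square).
LEGAL_MOVES = [
    ("pawn", "e2", "e3"),
    ("pawn", "d2", "d3"),
    ("pawn", "d2", "d4"),
    ("knight", "g1", "f3"),
]


def interpret(heard, legal_moves=LEGAL_MOVES):
    """Map a heard phrase to a legal move, or None if it makes no sense."""
    words = heard.lower().split()
    if len(words) < 3:
        return None
    # Step 1: the first word must at least resemble a chess piece name.
    if not get_close_matches(words[0], PIECES, n=1, cutoff=0.7):
        return None
    source, target = words[1], words[-1]
    # Step 2: only consider legal moves that start from the named square.
    candidates = [m for m in legal_moves if m[1] == source]
    if not candidates:
        return None
    # Step 3: prefer the exact target square, else fall back to a legal one.
    for move in candidates:
        if move[2] == target:
            return move
    return candidates[0]


print(interpret("pawn e2 to a6"))      # ('pawn', 'e2', 'e3')
print(interpret("rock e2 to e3"))      # ('pawn', 'e2', 'e3')
print(interpret("pawn e2 takes e3"))   # ('pawn', 'e2', 'e3')
print(interpret("elephant e2 to e3"))  # None
```

In this sketch ‘rock’ passes because it is close enough to ‘rook’, after which the piece that is actually on e2 wins; ‘elephant’ resembles no piece name, so the whole phrase is dropped.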
Apart from spontaneous ‘undo’ when I did not say ‘undo’, the most frequent incorrect command recognition was between similar-sounding column-letters, e.g. ‘c3’, ‘d3’, and ‘e3’ - but only if the piece in question was allowed to move to the wrongly interpreted column.
Commands for moves that are illegal in terms of piece movement simply don’t get recognised. It is different if you tell it to move a piece to a square that the piece could reach according to its movement rules, but where the move would put you in check, or if you try to castle after already having moved the king back and forth. In those cases it displays that it understood the command, but then alerts you that the move is illegal in chess.
All in all, based on trying out Mac OS X Chess for about an hour, my conclusion is that offline voice command is good enough for use in practice, at least if:
- the device has at least 1 GB memory
- you have a directional microphone
- you’re willing to wait 5 seconds for the command to be recognized
- you use not only a restricted vocabulary, but also domain knowledge to infer which command the speaker probably intended to give.
Unlike for a chess game, which you close when you’re done playing, for a 24h always-on voice command device, a wake-up word would probably be necessary.