How secure are speech recognition applications?
A couple of months ago I read an article in the Dutch media about Alexa, Amazon’s smart home assistant, sending a recording of a private conversation from someone in the USA to a random person in the owner’s contacts list. I found an English-language article covering the same story in the Guardian.
Apparently what happened was the following: the couple in the house got a warning from an acquaintance, who had received an email containing a recording of a conversation the couple had had about wooden floors. At first they didn’t really believe it, but the acquaintance quickly convinced them by supplying details of the conversation.
Obviously Alexa had interpreted some of the words as a command to send an email with certain content to a certain person. All this without the owners having a clue what was going on!
Although Alexa isn’t supposed to do anything unless someone says “Alexa” in order to wake her up, it is clear that in this case something went awry.
What is the risk with smart technology?
The nature of smart speakers like Amazon’s Echo with Alexa is that they listen to what’s going on in the room 100% of the time. After all, at any point in time someone could say “Alexa”, indicating that the smart speaker needs to take action. Unless this activation command is said, the smart speaker isn’t supposed to record any of the conversation going on in the room. However, we will just have to trust the manufacturers that this is indeed the case.
In the situation mentioned above, there was a cascade of Alexa misunderstanding words from the random conversation taking place in the room. I’m willing to believe that this is a very rare occurrence. Still, it’s also a wake-up call.
Now, I am Dutch and in the Netherlands Alexa is not very popular yet. But we do have iPhones aplenty. And iPhones have their own smart assistant built in. Personally I have an iPhone and I have it set to react to the command “hey Siri”. As soon as my iPhone hears that command, it will activate and ask me what I want.
Although I don’t think my iPhone has ever taken any action without my wanting it or being aware of it, it does regularly activate without my having said the magic words “hey Siri”. This happens especially when I am in the car listening to the radio or a podcast: occasionally the car audio system will switch from my audio to the Siri voice assistant. Apparently some random audio fragment tricked the iPhone into thinking I said “hey Siri”. It’s not difficult to imagine that something might accidentally happen that you did not intend.
Also, whenever I use Siri to dictate something – and I do dictate with Siri – my iPhone itself is not powerful enough to convert the spoken words into text. That’s why you cannot use Siri dictation without an Internet connection. The iPhone sends the audio of the spoken words to Apple’s servers, where it is transcribed, and then the text is sent back to your iPhone.
We will have to trust Apple to take good care of that data. Nevertheless, I pay attention to when and for what I use dictation on my iPhone.
What about Dragon?
It’s a logical question, isn’t it? What about Dragon? Is the speech recognition software listening to everything you say and perhaps even sending your speech data to servers on the other side of the world?
Well, the reason you need a powerful computer in order to run Dragon speech recognition software is that the audio processing takes place on your computer. And, at least on Windows computers, you can choose in the Dragon options not to share usage data with the manufacturer. So in theory you’re good on the security front.
Dragon does have a ‘state’ in which it listens to everything you say. If the microphone is in the ‘asleep’ state, the programme is constantly listening to what you say. After all, you might be saying “wake up” at any moment in time. But as mentioned above, audio processing takes place on your own computer, without any need for an Internet connection to the manufacturer.
As soon as you start using your tablet or cell phone for dictation, however, transcription takes place on a server somewhere else. I have no experience with Dragon Anywhere, since we do not have it in the Netherlands, but I assume that Nuance has servers to facilitate transcription.
Can you prevent the risk?
In this day and age it is very difficult to avoid being monitored by companies. The entire matter of online security has become so complicated that a regular person has no clue what happens to their data.
Most of us willingly store our information on Google Drive or OneDrive and send our emails via Gmail.
Should we be worried about our dictation being transcribed on a server somewhere in another part of the world? Is there any extra risk attached?
I think everyone has to figure that out for themselves.
And with regard to smart speakers like the Echo (Alexa), I think it is justified to be cautious without becoming paranoid.
As for Dragon… Personally, I have set my Dragon options to not share any information with the manufacturer. If you want to check how it is set in your Dragon profile, read on…
How to change the Dragon option for sharing data
Check out this screenshot of the relevant option, or watch the one-minute video below.
Check out more practical videos in the LearnSpeechRecognition Academy