Recently, Brad asked a question in the comments of one of my blog posts.
“How does Dragon deal with homophones?”
That’s a great question, and in fact Dragon’s speech recognition sometimes has difficulty choosing the correct word.
What are homophones?
Homophones are words that have different spellings and different meanings, but that sound exactly alike.
Let me list some examples:
- iron – ion
- two – too – to
- jeans – genes
- suite – sweet
And then there are the near-homophones, for example:
- refuse – refuge
- device – devise
- called – cold
Why can homophones be problematic for Dragon?
When you are dictating into your microphone, Dragon writes the words in your document.
In order to write down the correct words, Dragon relies a lot on the pronunciation of the words.
If words have exactly the same pronunciation, like ‘jeans’ and ‘genes’, it becomes much more difficult for Dragon to find the correct words to write down.
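To make the ambiguity concrete, here is a toy sketch in Python. It is not how Dragon is implemented (its internals are not public); the pronunciation keys are rough ARPAbet-style strings I wrote by hand, and the names `PRONUNCIATIONS` and `candidates_by_sound` are made up for this illustration. It only shows that one sound can map to several spellings.

```python
from collections import defaultdict

# Hypothetical, hand-written pronunciation keys (illustration only).
PRONUNCIATIONS = {
    "jeans": "JH IY N Z",
    "genes": "JH IY N Z",
    "suite": "S W IY T",
    "sweet": "S W IY T",
    "blue": "B L UW",
}

def candidates_by_sound(pronunciations):
    """Group spellings by their pronunciation key."""
    by_sound = defaultdict(list)
    for word, sound in pronunciations.items():
        by_sound[sound].append(word)
    return dict(by_sound)

by_sound = candidates_by_sound(PRONUNCIATIONS)
print(by_sound["JH IY N Z"])  # two spellings for one sound
print(by_sound["B L UW"])     # only one spelling, so no ambiguity
```

With pronunciation alone, the lookup for ‘jeans’ returns two candidates, and nothing in the sound tells the program which one to write down.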
Dragon doesn’t understand the meaning of words
For people, it usually is not a problem to correctly understand words that sound the same. We immediately come to the correct conclusion in our brains. That is because we understand the meaning of the words that are spoken.
Nonsense meanings are immediately discarded by our brain. We usually are not even aware of it. Nobody will think twice about what it means if someone says “I want an ice cream too”. That is because “I want an ice cream TWO” really doesn’t make much sense.
Nevertheless, similar-sounding words occasionally cause misunderstandings in communication between people. But not nearly as often as they do in communication between humans and dragons.
Still, Dragon often gets it right!
Fortunately, Dragon doesn’t rely only on pronunciation to make the correct decisions. It also looks at the context, the words that surround the problematic homonym/homophone.
For example, Dragon is unlikely to write down “he was wearing blue genes”.
Why? Well, Dragon has analysed millions of pages of English text. And it probably never came across the word combination “blue genes”. The same sounding “blue jeans” on the other hand, likely was a word combination that did occur in the texts that Dragon has analysed.
So even though Dragon does not understand the meaning of the words “genes” and “jeans”, it should still assign a higher likelihood to the word combination “blue jeans” than to “blue genes”.
And probability and likelihood are what Dragon uses to decide what to write down. It is all about statistics. Any sounds that enter the microphone are analysed by Dragon, and the program then calculates the most probable result.
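The idea of picking the likelier word combination can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not Dragon's actual algorithm: the counts in `BIGRAM_COUNTS` are invented stand-ins for what an analysis of a large body of text might yield, and `pick_homophone` is a hypothetical helper name.

```python
# Made-up word-pair counts standing in for a large body of analysed text.
BIGRAM_COUNTS = {
    ("blue", "jeans"): 950,
    ("blue", "genes"): 0,
    ("human", "genes"): 400,
    ("human", "jeans"): 1,
}

def pick_homophone(previous_word, candidates, counts=BIGRAM_COUNTS):
    """Return the candidate most often seen after previous_word."""
    # Add one to every count so unseen pairs keep a small, non-zero score.
    scores = {c: counts.get((previous_word, c), 0) + 1 for c in candidates}
    total = sum(scores.values())
    # Normalise the scores so they sum to 1 across the candidates.
    probabilities = {c: s / total for c, s in scores.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities

best, probabilities = pick_homophone("blue", ["jeans", "genes"])
print(best, probabilities)
```

Here “jeans” wins after “blue” and “genes” wins after “human”, purely because of how often the pairs occur in the counts. No meaning is involved at any point.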
Artificial Intelligence in Dragon speech recognition
Nowadays, Dragon uses ‘deep learning’ technology, which is a type of artificial intelligence. However, I have not found any indication that this means that speech recognition software actively uses the meaning of words in order to come to its decisions on what to write down.
The artificial intelligence that is built into the latest versions of Dragon mainly seems to improve recognition of the end user’s speech patterns. It also does a quick analysis of the audio environment when you start to dictate.
If you want to learn more about AI in the latest versions of Dragon Professional Individual and Dragon Professional Group, you can read this article that Nuance itself published on the topic.
Dictation example mini video
Watch the Dragon in action!