Dictation: Getting the Most out of Speech Recognition

The speech recognizer uses a combination of the sounds you make when speaking (acoustics) and the likely spoken sequences of words (language) to figure out what you are saying. These two sources of information work together to produce a transcript. As you speak, the recognizer aligns the sounds it hears to the words you are likely to say. Where the sounds disagree with the likely word choices, preference may be given to either the sounds or the language, depending on the specific situation. For best results, follow these tips when dictating:

1. Speak in complete sentences with typical structure.

In some ways, speech recognition tries to predict the next word in your sentence based on the words that surround it. This can be difficult, since the word sequence you dictate may be different from all the word sequences the recognizer has seen before. Therefore, speaking in complete sentences using expected speech patterns will help to produce the best results.

Important, word order is.

The recognizer is looking for a familiar sentence structure and words that are likely to be included in that structure. If you structure a sentence normally (for example, "Her ear was giving her pain."), you are more likely to get an accurate result than if you use an oddly structured sentence (like "Ear was felt to be giving pain"). Uncommon sequences of words are less likely to be returned correctly because the recognizer must rely primarily on the sounds to produce the transcript. Likewise, complete sentences work better than sentence fragments, and speaking slowly with long pauses in the middle of sentences (or worse, pauses in the middle of individual words) makes it harder for the recognizer to decide what you're saying. For best results, use normal sentence structure and word order.
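To make the idea of a "familiar" word sequence concrete, here is a minimal sketch of a toy word-pair (bigram) model. It is an illustration only and is not how the nVoq recognizer is implemented: the example sentences, the function names, and the add-one smoothing are all assumptions chosen to keep the sketch small.

```python
import math
from collections import Counter

# Made-up example sentences; a real recognizer learns from far more text.
EXAMPLE_SENTENCES = [
    "her ear was giving her pain",
    "his ear was giving him pain",
    "her knee was giving her trouble",
    "the ear was red",
]

def train(sentences):
    """Count adjacent word pairs, using <s> and </s> as sentence boundaries."""
    context_counts, pair_counts, vocabulary = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocabulary.update(words)
        context_counts.update(words[:-1])          # every word that has a successor
        pair_counts.update(zip(words, words[1:]))  # every adjacent word pair
    return context_counts, pair_counts, len(vocabulary)

def average_log_prob(sentence, context_counts, pair_counts, vocab_size):
    """Average log probability per word pair, with add-one smoothing."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    pairs = list(zip(words, words[1:]))
    total = sum(
        math.log((pair_counts[pair] + 1) / (context_counts[pair[0]] + vocab_size))
        for pair in pairs
    )
    return total / len(pairs)

model = train(EXAMPLE_SENTENCES)
familiar = average_log_prob("Her ear was giving her pain", *model)
unusual = average_log_prob("Ear was felt to be giving pain", *model)

print(f"familiar order: {familiar:.2f}")
print(f"unusual order:  {unusual:.2f}")
```

In this toy model the normally structured sentence receives the higher (less negative) score because most of its word pairs have been seen before. The oddly structured sentence gets little support from the language side, which is the situation where a recognizer has to lean on the sounds alone.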

2. Speak your words clearly and say what you mean.

The recognizer tries to map the sounds in your recording to individual sounds in the English language based on many examples it has heard in the past. When you dictate, one of the simplest and most effective things you can do is speak your words clearly. Accurate pronunciation will greatly improve transcript accuracy.

 Sometimes the recognizer will return exactly what you said...even if it wasn't what you meant to say.

At its most basic, the recognizer takes the sounds it hears you say and returns words that match. Unfortunately, it's common for people to stumble over words or mispronounce them. It's also typical for word endings, like "ed" or "s," and certain letters, like "r," "t," and "h," to be dropped from speech and not pronounced at all. Speech is often rushed, and words run together. If you dictate a word that sounds exactly like another word (even if it's not what you meant to say), the recognizer might return the word you actually said, even if it doesn't seem to make sense in the context of the sentence. An administrator with access to the Administrator console can use Review & Correct to listen to the speech that generated an unexpected word. While the recognizer will make an occasional mistake, listening to the recording often sheds light on what was actually said and why the specific word was returned.
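The trade-off between sounds and context can be sketched with a small example. This is a simplified illustration, not the recognizer's actual scoring: the `Candidate` class, the `pick_word` function, the weighting, and every number below are made up. It assumes each candidate word has an acoustic score (how well it matches the audio) and a language score (how well it fits the sentence), and that the two are simply added together.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    word: str
    acoustic_log_prob: float  # how well the word matches the audio (made-up value)
    language_log_prob: float  # how likely the word is in this sentence (made-up value)

def pick_word(candidates, language_weight=1.0):
    """Return the candidate with the highest combined log score."""
    return max(
        candidates,
        key=lambda c: c.acoustic_log_prob + language_weight * c.language_log_prob,
    )

# Intended sentence: "She walked to the clinic yesterday."
# Case 1: the "ed" ending was clearly dropped, so the audio strongly matches "walk".
clearly_said_walk = [
    Candidate("walk", acoustic_log_prob=-1.0, language_log_prob=-4.0),
    Candidate("walked", acoustic_log_prob=-6.0, language_log_prob=-1.5),
]

# Case 2: the ending was mumbled, so the audio fits both words about equally.
mumbled_ending = [
    Candidate("walk", acoustic_log_prob=-2.0, language_log_prob=-4.0),
    Candidate("walked", acoustic_log_prob=-2.5, language_log_prob=-1.5),
]

print(pick_word(clearly_said_walk).word)
print(pick_word(mumbled_ending).word)
```

In the first case the sound evidence is strong enough to outweigh the context, so the sketch returns the word that was actually spoken. In the second case the audio is ambiguous and the context decides.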

3. Generate high-quality recordings.

Even if you pronounce your words clearly in complete, well-structured sentences, your dictation must still be recorded and sent to the recognizer as an audio file, and the quality of that audio file also plays a role in dictation accuracy. Dictation will work best with an audio device that has been tested and approved by nVoq. These devices have noise-canceling capabilities and should produce a clear recording.

 Poor audio can lead to poor dictation results.

If you're recording with a non-approved microphone, or have the microphone too close to your mouth, you are more likely to get pops and hisses in your recording when you say words that include p's and s's. Energy from these sounds can cause a spike in the recording, which in turn can cause errors in your transcript. Background noise can also affect accuracy. An occasional or unexpected noise in an otherwise quiet recording may be mistaken for a word. Excessive background noise, like loud music playing, may also affect results. For best results, position the microphone appropriately and don't dictate with competing sounds in the background.
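If you have access to the raw audio, one rough way to look for spikes is to measure how much of the recording sits at or near the loudest level the file can store. The sketch below is an assumption-laden illustration and not an nVoq tool: it assumes the samples have already been decoded to numbers between -1.0 and 1.0, and the function names, the 0.99 threshold, and the synthetic test tones are hypothetical.

```python
import math

def clipped_fraction(samples, threshold=0.99):
    """Fraction of samples whose magnitude reaches the threshold (near full scale)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) >= threshold) / len(samples)

def sine_tone(gain, n=1600, freq_hz=200, rate_hz=16000):
    """Synthetic tone, hard-limited to [-1, 1] the way an overdriven recording is."""
    return [
        max(-1.0, min(1.0, gain * math.sin(2 * math.pi * freq_hz * i / rate_hz)))
        for i in range(n)
    ]

comfortable = sine_tone(gain=0.5)  # stands in for a well-positioned microphone
overdriven = sine_tone(gain=3.0)   # stands in for a microphone that is too close

print(f"comfortable level: {clipped_fraction(comfortable):.0%} of samples near full scale")
print(f"overdriven level:  {clipped_fraction(overdriven):.0%} of samples near full scale")
```

A recording in which a noticeable share of samples sits at full scale has probably been distorted by spikes, and moving the microphone farther from your mouth is a reasonable first thing to try.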