ISV Training: Introduction to Substitutions

Substitutions are applied by the dictation server as a final step before the transcript is returned to the user.

You can use substitutions to customize how speech is returned in a dictation transcript, such as an abbreviation, an acronym, or a word with special capitalization. They can also be used to correct problems in dictation transcripts. For example, if every time you dictate the words "in the interim" you see the words "in the ER room" in your transcript, you could use a substitution to fix that.

The spoken form of a substitution is what you say in your dictation to elicit the substitution. Therefore, it needs to be a word or phrase that the dictation engine correctly transcribes when the user says it. If the word or phrase is not transcribed consistently for the user when it is dictated, the substitution won't work consistently either. (If that happens, you can try adding the word or phrase, as it is commonly used, to sentence modeling.)
The written form of a substitution is what you want to see in your transcript when you say the spoken form.

When a group is configured to retain dictation data, both the original transcript and the transcript with substitutions can be viewed in the administrator console on the Review & Correct page.

Substitution Types

The dictation engine supports "literal" substitutions as well as "regular expression" substitutions. Regular expressions (RegEx) search for patterns in a transcription and standardize how text is returned, for example to format dates, telephone numbers, or monetary amounts.

There are three types of substitutions:

Literal Substitutions
RegEx Substitutions
RegEx Plus Substitutions

Literal Substitutions

A literal substitution replaces a literal sequence of text characters (or "string") with another literal text string.

Literal substitutions are full word matches. This allows the word "cat" to be substituted with "FELINE" while preventing the word "catalog" from being substituted with "FELINEalog".
A literal substitution is applied exactly as it appears in the Spoken Form and Written Form fields.
The spoken form is the word or phrase as it IS CURRENTLY RETURNED by the dictation engine. This is not case sensitive. The written form is how you WANT IT TO BE TRANSCRIBED in your transcript.
Use literal substitutions to replace words in a transcript with an abbreviation, an acronym, capitalization, or one or more words that consistently return incorrectly.

Example:
A literal substitution to replace "nVoq Voice" with "nVoq Voice™"

Spoken Form:
nVoq Voice

Written Form:
nVoq Voice™
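
The full-word, case-insensitive behavior described above can be approximated in Java as follows. This is only a sketch: the class and method names are hypothetical, and how the dictation server actually implements literal substitutions is not stated in this document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any nVoq API.
class LiteralSubstitutionSketch {

    // Replaces whole-word, case-insensitive occurrences of spokenForm with writtenForm.
    static String apply(String transcript, String spokenForm, String writtenForm) {
        // Pattern.quote treats the spoken form as literal text, not as a regex.
        // The lookarounds reject matches that sit inside a longer word (assumed
        // here to be how "full word match" behaves).
        Pattern pattern = Pattern.compile(
                "(?<!\\w)" + Pattern.quote(spokenForm) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps any '$' or '\' in the written form literal.
        return pattern.matcher(transcript)
                .replaceAll(Matcher.quoteReplacement(writtenForm));
    }
}
```

With this sketch, apply("The cat sat on the catalog", "cat", "FELINE") returns "The FELINE sat on the catalog": the standalone word is replaced and "catalog" is left alone.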

RegEx Substitutions

A Regular Expression or "RegEx" substitution defines text matching a particular sequence of characters (or "string") and replaces it with a RegEx "backreference", a literal text string, or both.

RegEx substitutions are NOT full word matches unless the expressions are explicitly defined with boundary markers.
A RegEx is a search pattern that is used to identify character patterns in a transcript and modify how they are returned, such as how telephone numbers or dates display.
The spoken form of a regular expression substitution is a search pattern which, by default, is case insensitive. (Additional rules must be specified for the spoken form to be case sensitive.) The written form could be literal text or RegEx backreferences.
RegEx substitutions are implemented with the java.util.regex.Pattern class.
The JavaDoc for this class can be accessed at https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
See the section titled Summary of regular-expression constructs.
nVoq Administrator performs minimal error checking for RegEx syntax.
It is strongly recommended that you test your substitutions extensively to ensure they perform as expected.
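
One way to test a pattern before adding it is to run it through java.util.regex.Pattern directly. The sketch below assumes the server compiles the spoken form with the CASE_INSENSITIVE flag and applies the written form as a Java replacement string; both are assumptions, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for illustration; not part of any nVoq API.
class RegExMatchingSketch {

    // Applies a regex spoken form, case-insensitive by default as the text describes.
    static String apply(String transcript, String spokenForm, String writtenForm) {
        return Pattern.compile(spokenForm, Pattern.CASE_INSENSITIVE)
                .matcher(transcript)
                .replaceAll(writtenForm);
    }
}
```

Applied to the text "cat catalog" with the written form FELINE, the spoken form cat yields "FELINE FELINEalog", while \bcat\b yields "FELINE catalog". In Java, the inline flag (?-i) switches case-insensitive matching off: applied to "the er and the ER" with the written form "emergency room", the spoken form (?-i)\bER\b yields "the er and the emergency room". Whether this inline flag is the "additional rule" the server expects is an assumption.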

Example:
A RegEx substitution to convert the phrase "next line" to a single carriage return

Spoken Form:
(\A|\n|[.:?!]|[ ])((next|necks) line)[ ](k)

Written Form:
$1 K
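
To illustrate how backreferences such as $1 work in a written form, here is a minimal Java sketch that formats a ten-digit telephone number. The pattern, the replacement, and the class name are invented for this illustration and are not taken from nVoq documentation.

```java
import java.util.regex.Pattern;

// Hypothetical example for illustration; not part of any nVoq API.
class RegExBackreferenceSketch {

    // Spoken form: three capturing groups of digits, optionally separated by a space.
    // The \b boundary markers prevent matches inside longer digit runs.
    private static final Pattern SPOKEN_FORM = Pattern.compile(
            "\\b(\\d{3})[ ]?(\\d{3})[ ]?(\\d{4})\\b",
            Pattern.CASE_INSENSITIVE);

    // Written form: $1, $2 and $3 are backreferences to the captured groups.
    private static final String WRITTEN_FORM = "($1) $2-$3";

    static String formatPhoneNumbers(String transcript) {
        return SPOKEN_FORM.matcher(transcript).replaceAll(WRITTEN_FORM);
    }
}
```

For example, formatPhoneNumbers("call 303 555 0142 today") returns "call (303) 555-0142 today".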

RegEx Plus Substitutions

A RegEx Plus substitution uses JavaScript to reference variables.

RegEx Plus substitutions are NOT full word matches unless the expressions are explicitly defined with boundary markers.
This type of substitution uses a RegEx search pattern for the spoken form. When a match is found, the JavaScript programmatically determines the text to substitute.
The spoken form of a RegEx Plus substitution is a regular expression. The written form is JavaScript.
RegEx Plus substitutions may not work as expected in STABLETEXT (Direct Text) mode. (More on that later.)
The JavaScript written form can reference the variables defined below.
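
The idea of computing the replacement text for each match can be sketched in plain Java, with a function standing in for the JavaScript written form. Everything here (the class, the method names, and the spoken-form pattern) is hypothetical; it shows the general technique, not the server's implementation.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example for illustration; not part of any nVoq API.
class ComputedReplacementSketch {

    // Finds each match of spokenForm and asks writtenForm to compute its replacement.
    static String apply(String transcript, String spokenForm,
                        Function<MatchResult, String> writtenForm) {
        Matcher matcher = Pattern.compile(spokenForm, Pattern.CASE_INSENSITIVE)
                .matcher(transcript);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = writtenForm.apply(matcher.toMatchResult());
            // quoteReplacement keeps '$' and '\' in the computed text literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    // Invented spoken form: group 1 is the spoken punctuation word,
    // group 2 is the first letter of the following word.
    static String spokenPeriods(String transcript) {
        return apply(transcript, "[ ](full stop|dot)[ ]+([a-z])",
                match -> ". " + match.group(2).toUpperCase());
    }
}
```

For example, spokenPeriods("the wound is healing full stop follow up in two weeks") returns "the wound is healing. Follow up in two weeks".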

In the context of RegEx Plus substitutions, browser elements and functions (e.g., window.* variables) and web page elements (e.g., document.* variables) are undefined. In other words, you cannot use a RegEx Plus substitution to launch a web page. Use a Shortcut instead.

Example:
A RegEx Plus substitution to replace "dot", "full stop", "PD", "pleaded", etc. with a period ( . ).

Written Form:
"\. " + match.group(3) + match.group(4).toUpperCase()

Supported Variables for Real-time and Batch Processing

User-defined variables are supported for dictations submitted with nVoq.API batch processing.

Variable: nvoq.fullText
Definition: The entire transcript.
JavaScript Type: String

Variable: nvoq.profile
Definition: The profile used to generate the transcript
JavaScript Type: String

Variable: match.group(N)
Definition: The contents of the Nth capturing group in the RegEx (if any)
JavaScript Type: String

Variable: match.start(N)
Definition: The start index of the text captured by the Nth capturing group in the RegEx (if any)
JavaScript Type: Number

Variable: match.end(N)
Definition: The end index of the text captured by the Nth capturing group in the RegEx (if any)
JavaScript Type: Number

Variable: match.groupCount()
Definition: The number of capturing groups declared in the RegEx
JavaScript Type: Number
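
The match.* variables above resemble the methods of java.util.regex.Matcher. Assuming they behave the same way (an assumption, not something this document confirms), the following Java sketch shows what each one reports for a sample match. The class name, pattern, and input text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example for illustration; not part of any nVoq API.
class MatchVariablesSketch {

    // Describes the first match of regex in text using the Matcher methods
    // that the match.* variables appear to mirror.
    static String describeFirstMatch(String regex, String text) {
        Matcher match = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        if (!match.find()) {
            return "no match";
        }
        StringBuilder out = new StringBuilder("groupCount=" + match.groupCount());
        for (int n = 1; n <= match.groupCount(); n++) {
            out.append("; group(").append(n).append(")=").append(match.group(n))
               .append(" start=").append(match.start(n))
               .append(" end=").append(match.end(n));
        }
        return out.toString();
    }
}
```

For example, describeFirstMatch("(\\d+) (mg|ml)", "take 20 mg daily") returns "groupCount=2; group(1)=20 start=5 end=7; group(2)=mg start=8 end=10". In Java, indexes count from zero and the end index is one past the last captured character.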

Order in which Substitutions are Applied to a Transcript

Literal substitutions are applied to a dictation transcript first, in order from the longest matching string to the shortest. (If two strings are the same length, they are compared based on the Unicode value of each character in the strings.) RegEx and RegEx Plus substitutions are applied next, in an indeterminate order. A given string in a transcript can only be replaced once by any single substitution. However, the substituted result in the transcript is subject to subsequent substitutions.
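
The ordering rule for literal substitutions can be sketched in Java as follows. The sketch assumes that ties between strings of the same length are broken in ascending Unicode order (the direction is not stated here) and that each substitution runs against the result of the previous one. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example for illustration; not part of any nVoq API.
class SubstitutionOrderSketch {

    // Orders spoken forms longest first; ties are compared character by
    // character on their Unicode (UTF-16) values via String.compareTo.
    static List<String> order(Collection<String> spokenForms) {
        List<String> sorted = new ArrayList<>(spokenForms);
        sorted.sort((a, b) -> a.length() != b.length()
                ? Integer.compare(b.length(), a.length())
                : a.compareTo(b));
        return sorted;
    }

    // Applies each literal substitution in order to the running result, so
    // text produced by one substitution is visible to the ones after it.
    static String applyAll(String transcript, Map<String, String> literals) {
        String result = transcript;
        for (String spoken : order(literals.keySet())) {
            Pattern pattern = Pattern.compile(
                    "(?<!\\w)" + Pattern.quote(spoken) + "(?!\\w)",
                    Pattern.CASE_INSENSITIVE);
            result = pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(literals.get(spoken)));
        }
        return result;
    }
}
```

Given the substitutions "cat" to "FELINE" and "cat food" to "kibble", applyAll("cat food for the cat", ...) returns "kibble for the FELINE", because the longer spoken form is applied before the shorter one can consume part of it.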

Substitutions behave differently when STABLETEXT is enabled versus HYPOTHESISTEXT because of fundamental differences in the way STABLETEXT works.

Using Substitutions in HYPOTHESISTEXT Mode vs. STABLETEXT Mode

Because substitutions are applied differently in STABLETEXT and HYPOTHESISTEXT, it is possible to write substitutions that work fine in HYPOTHESISTEXT mode, but that don't work as desired in STABLETEXT mode.

HYPOTHESISTEXT - Substitutions are applied at the end, when the dictation is complete. In HYPOTHESISTEXT mode, substitutions have access to the entire transcript and are applied to the entire text of the transcript. Because of this, they can do things like modify the beginning of the transcript based on something that might appear at the end.

STABLETEXT - Substitutions are applied to the current section of text being processed and transcribed when there is a pause in the dictation. The fundamental difference here is that in STABLETEXT (Direct Text) mode, substitutions only see one chunk of text at a time, and they cannot see what comes before or after it. Post-processing RegEx substitutions don’t have access to ALL of the text in the dictation. So something like modifying the beginning of a transcript based on the end of the transcript is impossible. This means that substitutions are more powerful in HYPOTHESISTEXT mode and can do things that are not possible in STABLETEXT mode.
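
The practical effect of this difference can be sketched in Java by applying the same substitution either to the whole transcript or to one chunk at a time. How the server actually splits a dictation into chunks and joins the results is not described here, so the chunk boundaries and the single-space join below are assumptions, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical example for illustration; not part of any nVoq API.
class TextModeSketch {

    private static final Pattern SPOKEN_FORM =
            Pattern.compile("\\bblood pressure\\b", Pattern.CASE_INSENSITIVE);
    private static final String WRITTEN_FORM = "BP";

    // HYPOTHESISTEXT-style: the substitution sees the entire transcript at once.
    static String applyToWholeTranscript(List<String> chunks) {
        return SPOKEN_FORM.matcher(String.join(" ", chunks)).replaceAll(WRITTEN_FORM);
    }

    // STABLETEXT-style: the substitution sees each chunk in isolation,
    // so a phrase split across two chunks is never matched.
    static String applyPerChunk(List<String> chunks) {
        List<String> processed = new ArrayList<>();
        for (String chunk : chunks) {
            processed.add(SPOKEN_FORM.matcher(chunk).replaceAll(WRITTEN_FORM));
        }
        return String.join(" ", processed);
    }
}
```

With the chunks "patient has high blood" and "pressure today", applyToWholeTranscript returns "patient has high BP today", while applyPerChunk returns "patient has high blood pressure today" because the phrase straddles the chunk boundary.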

Optional Hands-On Exercise: Add a Substitution in nVoq Administrator

In this exercise you're going to add a substitution for the Zamboni customers at the division level.

In the Organization menu at the top of the page select YOUR Zamboni Clinic DIVISION-level organization.
Go to the Substitutions page.
Click the new icon (plus symbol) in the blue bar at the top of the page.
From the Substitution Type menu, select Literal.
In the Spoken field, type Denver health
In the Written field, type Denver Health Medical Center
Click the Save button.

Substitution APIs

Please see the following resources for substitutions using our API: