Directed Dialog vs. SLM

Grammar/Language Models

Finite State Grammars

Finite state grammars (FSGs) fully specify what the system can recognize, both specific words and the orders in which the words can appear. It is possible to automatically create some parts of FSGs (e.g., a list of movie titles), but in many cases designers create the grammars. FSGs are the typical grammar technology used in directed dialog applications.
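
To make this concrete, the following Python sketch models a hypothetical FSG as a simple word-transition table. The menu vocabulary, state names, and transitions are illustrative assumptions only, not any particular recognizer's grammar format.

    # A minimal FSG sketch: each state maps an allowed word to a next state,
    # and "END" marks an accepting state. Both the words and their order are
    # fully specified in advance.
    FSG = {
        "START": {"checking": "END", "savings": "END", "money": "MONEY"},
        "MONEY": {"market": "END"},
    }

    def accepts(utterance: str) -> bool:
        """Return True only for listed words spoken in a listed order."""
        state = "START"
        for word in utterance.lower().split():
            transitions = FSG.get(state, {})
            if word not in transitions:
                return False  # out-of-grammar word or word order
            state = transitions[word]
        return state == "END"

    print(accepts("money market"))  # True: fully specified by the grammar
    print(accepts("market money"))  # False: order not in the grammar
    print(accepts("mutual fund"))   # False: words not in the grammar

Any utterance that uses an unlisted word, or lists words in an unlisted order, is simply out of grammar, which is one reason directed prompts that mirror the grammar's options work well.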


Statistical Language Models

Statistical language models (SLMs) also require full specification of the words they can recognize, but they specify word order statistically rather than exhaustively, based on analyses of the frequency of occurrence of individual words, pairs of words (bigrams), and, to some extent, triplets of words (trigrams) (Jelinek, 1997). Natural language understanding applications that include SLMs typically also include components that perform statistical action classification, statistical parsing, and dialog management (Pieraccini, 2012).
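
As a rough illustration of the statistics involved, the following Python sketch estimates bigram probabilities from a tiny invented corpus of caller utterances. The corpus and boundary markers are assumptions for demonstration only; real SLMs are trained on large transcript collections and combine unigram, bigram, and trigram estimates with smoothing.

    from collections import Counter

    # A tiny invented corpus of caller utterances; real SLM training corpora
    # contain thousands to millions of transcribed calls.
    corpus = [
        "i want my balance",
        "i want to pay my bill",
        "pay my bill please",
    ]

    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        words = ["<s>"] + line.split() + ["</s>"]  # utterance boundary markers
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def bigram_prob(prev: str, word: str) -> float:
        """Maximum-likelihood estimate of P(word | prev); 0.0 if unseen."""
        return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

    print(bigram_prob("my", "bill"))  # 2/3: "my" precedes "bill" in 2 of 3 cases
    print(bigram_prob("bill", "my"))  # 0.0: this word order never occurs

Because unseen word orders receive low (here, zero) probability rather than outright rejection, an SLM-based recognizer can favor likely word sequences without enumerating every permissible sentence in advance.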

Thus, SLMs allow for more flexibility in interpreting what callers say, especially in response to open-ended prompts such as “How may I help you?” This flexibility, however, can come at the price of generally higher development and maintenance costs than those of FSGs (Balentine, 2010; Pieraccini, 2010).

Prompting Styles

There are two major prompting styles: directed dialog (e.g., “Please select checking, savings, or money market.”) and open-ended (e.g., “How may I help you?” or “What would you like to do?”).

Because effective directed prompting makes it very clear what the application can understand (Zoltan-Ford, 1991), the associated grammars (typically FSGs) can be quite simple, making them easier to code and maintain (Boyce, 2008).

Taken to an extreme, however, a highly directed dialog with an overly simple grammar can become too restrictive and non-conversational, although with careful design, directed dialogs can be both pleasant and effective for many tasks.

Some applications, especially those with complicated menus that callers must navigate to reach the menu terminals (the points in the task flow where a caller leaves the menu and is routed either to a call center skill group or to a self-service application), may benefit from more open-ended prompting supported by statistical language models (Byrne, 2003; Polkosky, 2005a, 2005b).
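
As a highly simplified picture of the statistical action classification mentioned above, the following Python sketch routes open-ended caller responses to destinations by comparing smoothed per-route word frequencies. The training utterances, route names, and scoring rule are invented for illustration; production classifiers are trained on much larger labeled datasets and use more robust models.

    from collections import Counter, defaultdict

    # Invented labeled responses to "How may I help you?"; real systems train
    # on large sets of transcribed, hand-labeled calls.
    training = [
        ("i lost my card", "CARD_SERVICES"),
        ("report a stolen card", "CARD_SERVICES"),
        ("what is my balance", "ACCOUNT_INFO"),
        ("check my account balance", "ACCOUNT_INFO"),
    ]

    word_counts = defaultdict(Counter)  # per-route word frequencies
    route_totals = Counter()            # per-route token totals
    for utterance, route in training:
        words = utterance.split()
        word_counts[route].update(words)
        route_totals[route] += len(words)

    def classify(utterance: str) -> str:
        """Pick the route whose word statistics best match the utterance."""
        def score(route: str) -> float:
            # Crude score: sum of add-one-smoothed within-route frequencies.
            return sum(
                (word_counts[route][w] + 1) / (route_totals[route] + 1)
                for w in utterance.lower().split()
            )
        return max(route_totals, key=score)

    print(classify("my card was stolen"))  # CARD_SERVICES
    print(classify("balance please"))      # ACCOUNT_INFO

The point of the sketch is that the caller need not say any single pre-scripted phrase; the classifier maps a range of wordings onto a small set of routing destinations.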

For specific guidance on the design of both kinds of prompts, see Chapter 5.

References

Balentine, B. (2010). Next-generation IVR avoids first-generation user interface mistakes. In W. Meisel (Ed.), Speech in the user interface: Lessons from experience (pp. 71–74). Victoria, Canada: TMA Associates.

Boyce, S. J. (2008). User interface design for natural language systems: From research to reality. In D. Gardner-Bonneau & H. E. Blanchard (Eds.), Human factors and voice interactive systems (2nd ed.) (pp. 43–80). New York, NY: Springer.

Byrne, B. (2003). “Conversational” isn’t always what you think it is. Speech Technology, 8(4), 16–19.

Jelinek, F. (1997). Statistical methods for speech recognition. Cambridge, MA: MIT Press.

Pieraccini, R. (2010). Continuous automated speech tuning and the return of statistical grammars. In W. Meisel (Ed.), Speech in the user interface: Lessons from experience (pp. 255–259). Victoria, Canada: TMA Associates.

Pieraccini, R. (2012). The voice in the machine: Building computers that understand speech. Cambridge, MA: MIT Press.

Polkosky, M. D. (2005a). Toward a social-cognitive psychology of speech technology: Affective responses to speech-based e-service. Unpublished doctoral dissertation, University of South Florida.

Polkosky, M. D. (2005b). What is speech usability, anyway? Speech Technology, 10(9), 22–25.

Zoltan-Ford, E. (1991). How to get people to say and type what computers can understand. International Journal of Man-Machine Studies, 34, 527–547.