How We Got Here and When

From Standards…
AVIxD (the Association for Voice Interaction Design) has contemplated IVR (interactive voice response) and VUI (voice user interface) design standards since the organization’s inception. Within the VUI community, there has been skepticism as to whether standards were even possible. The favorite answer of a VUID (VUI designer) has always been “it depends.”

In 2011, a group of VUIDs discussing the topic again came to a realization. What was holding us back was thinking of standards as black and white. In his recent book, “The Voice in the Machine”, Pieraccini (2012) pointed out that although the recent history of voice systems includes a proliferation of industrial standards such as VoiceXML, SRGS, SSML, CCXML and MRCP, it is unlikely that there will soon be rigidly enforced standards for VUI design in the form of a call-flow description language.

“Although a call-flow description language would seem the perfect candidate for yet another standard, despite a number of attempts, no such standard has yet appeared, at least none as of this writing. And there's no industry-wide interest in creating one: a standard for an end product at the top of the speech industry food chain would fill no gap between levels of the chain. By contrast, a middle-of-the-chain standard like MRCP fills a gap between the VoiceXML browsers built by some companies and the speech recognition and text-to-speech engines built by others. Without it, vendors would have a hard time integrating different speech recognizers and text-to-speech engines into their platforms to offer their customers choice and flexibility. And companies that built speech recognition and text-to-speech engines would have a hard time selling their products to the many different platform vendors. … Thus, there might not be a standard until the call flow becomes an intermediate representation between two levels of the speech industry and no longer a top-of-the-food-chain end product. That could happen if the flow of interaction of a dialog machine ever becomes an ingredient of a higher-level reasoning machine. But that's not the case yet” (Pieraccini, 2012, pp. 255-256).

To Guidelines…
By changing the focus from standards to guidelines, the idea becomes more universally acceptable. Additionally, assigning a relative importance to each guideline and presenting the circumstances under which the recommendation changes gives the idea of guidelines even more credence.

Various people within the community have written books on VUI design (e.g., Balentine & Morgan, 2001; Cohen, Giangola, & Balogh, 2004; Lewis, 2011), but there hasn’t been an industry-wide effort to define guidelines. A working group within AVIxD was established to do just this, and the result is this document.

Audience
The document is aimed at novice and experienced designers alike. It can be used to learn about design and help make decisions, or to validate decisions that have already been made. One of the challenges VUIDs regularly face is that their skills are undervalued. Many people think that because they speak a language, they can write IVR call flows. The skill of the designer lies in understanding what “it depends” on and which solution works in which case: evaluating the given situation and then coming up with the right solution. Therefore, one of the goals of this document is to share not just the recommendations, but the “why” behind them. There are nearly 300 topics, which in and of itself should highlight the amount of knowledge that goes into designing a successful IVR.

Research
Wherever possible, the authors have shared research on the topics. Unfortunately, for many areas in VUI design, rigorous research that’s applicable across systems is in short supply. As a result, this document will be a living and growing one, changing as research is done. In addition, certain recommendations may change over time as the technology changes.

Authors
And who are the authors? Why should you believe what you read here? Every member of the committee has been doing design for over ten years. Many have written books or articles or white papers. All have presented at conferences. They come from a multitude of organizations and have worked in countless domains. If this group concurs, it's a safe bet that it's a good idea. And where we haven't concurred, we have put forth all sides. This is almost always in cases where observations have differed, and we don't exactly know why, so we have presented as much information as possible to help you make the decision that's right for you.

See the contributors page for bio information on each author.

References
Balentine, B., & Morgan, D. P. (2001). How to build a speech recognition application: A style guide for telephony dialogues (2nd ed.). San Ramon, CA: EIG Press.

Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice user interface design. Boston, MA: Addison-Wesley.

Lewis, J. R. (2011). Practical speech user interface design. Boca Raton, FL: CRC Press, Taylor & Francis Group.

Pieraccini, R. (2012). The voice in the machine: Building computers that understand speech. Cambridge, MA: MIT Press.
