One of the most difficult pieces of information to capture in an IVR is a person's name. The problem of recognizing spoken names is similar to the problem of accurately producing spoken names (see TTS). “The first [problem] is the large set of names involved in many applications, ranging from a few thousand names to over a million in some cases. The second is the lack of standardized pronunciations for many names; each can have multiple valid pronunciations, which further increases the difficulty of the recognition task” (Davidson, McInnes, & Jack, 2004, p. 56).
When collecting a caller's name, use Speak and Spell
Davidson et al. (2004) had 95 participants book flights for themselves and for a hypothetical friend. The name grammar included 11,926 British surnames, and every name spoken in the experiment was in the grammar. The primary experimental variable was the method for collecting the name, with three conditions: Speak Only (“Please say your surname.”); One Stage Speak and Spell (“Please say then spell your surname.”); and Two Stage Speak and Spell (“Please say your surname,” then, after collecting the spoken surname, “How do you spell that?”). The grammar allowed callers to say “double” (as in “double-B”) and, in the One Stage condition, to use linking words such as “spelled” or “that's” (“Smith spelled S M I T H”). The table below shows their results for mean attitude rating, percentage of successful task completions, and percentage of participants preferring each method (for all variables, higher scores are better).
For efficiency, prefer one-step Speak and Spell, but if there are recognition problems, consider using two steps
The data showed a clear advantage for Speak and Spell over Speak Only, but no statistically significant difference between the two Speak and Spell methods (although the One Stage version had a slight advantage in completion rate and preference).
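A One Stage response has to be split into the spoken surname and its spelling, with “double” expanded into two letters, as the grammar in Davidson et al. (2004) allowed. The Python sketch below shows that post-processing step, assuming the recognizer returns the utterance as plain text with each spelled letter as a separate token. The function names and the `LINKING_WORDS` set are hypothetical, and a production system would more likely express these rules in the recognition grammar itself. Comparing the two halves also gives a cheap check that what the caller said matches what they spelled.

```python
# Hypothetical linking words, modeled on the examples in the text.
LINKING_WORDS = {"spelled", "spelt", "that's"}


def _is_letter(tok):
    """True for a single alphabetic token such as "S"."""
    return len(tok) == 1 and tok.isalpha()


def _starts_spelling(tok):
    """True for a token that can begin a spelling: a letter or "double"."""
    return _is_letter(tok) or tok.lower() == "double"


def expand_letters(tokens):
    """Join spelled tokens into one string, expanding "double X" to "XX"."""
    letters = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() == "double":
            if i + 1 >= len(tokens) or not _is_letter(tokens[i + 1]):
                raise ValueError('"double" must be followed by a single letter')
            letters.append(tokens[i + 1].upper() * 2)
            i += 2
        elif _is_letter(tok):
            letters.append(tok.upper())
            i += 1
        else:
            raise ValueError(f"unexpected token in spelling: {tok!r}")
    return "".join(letters)


def parse_speak_and_spell(utterance):
    """Split a One Stage utterance into (spoken name, spelling, consistent?)."""
    # Treat "double-B" the same as "double B".
    tokens = utterance.replace("-", " ").split()
    spoken = []
    i = 0
    # The spoken surname is everything before a linking word or the first letter.
    while (i < len(tokens) and not _starts_spelling(tokens[i])
           and tokens[i].lower() not in LINKING_WORDS):
        spoken.append(tokens[i])
        i += 1
    # Skip linking words such as "spelled" or "that's".
    while i < len(tokens) and tokens[i].lower() in LINKING_WORDS:
        i += 1
    name = " ".join(spoken)
    spelled = expand_letters(tokens[i:])
    # A mismatch between the spoken and spelled forms could trigger confirmation.
    return name, spelled, name.upper() == spelled


print(parse_speak_and_spell("Smith spelled S M I T H"))
# ('Smith', 'SMITH', True)
```

In the Two Stage condition the spoken surname and the spelling arrive in separate turns, so only `expand_letters` would be applied to the second response.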
Collect first and last names in separate dialog steps
In a usability evaluation reported by Damper and Gladstone (2007), callers found it difficult to spell their first and last names in the same step (“William W I L L I A M Jones J O N E S”), so it is probably wise to collect first and last names in separate dialog steps.
When verifying a name, limit the number of names in the name verification grammar
Sometimes you don't need to capture a name; you just need to verify it. For name verification, you can improve recognition by limiting the number of names in the grammar. For example, if an authentication step precedes name verification, the grammar can be built dynamically from the name that authentication returns. If necessary, the grammar can also contain other “filler” names so that random utterances are not falsely accepted as the target name. Some systems in operation, however, use a grammar whose only item is the target name, with the acceptance confidence threshold tuned to achieve an acceptable level of accuracy.
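The dynamic verification grammar described above can be sketched as follows, assuming the IVR platform accepts W3C SRGS 1.0 XML grammars and reports the recognized text together with a confidence score between 0 and 1. `build_verification_grammar`, `verify_name`, and the 0.6 default threshold are hypothetical; the threshold in particular would have to be tuned for each deployment. Passing no fillers yields the single-item grammar some systems use; passing fillers gives off-target utterances something else to match, so a filler result counts as a failed verification.

```python
from xml.sax.saxutils import escape

# Hypothetical acceptance threshold; it must be tuned for each deployment.
DEFAULT_THRESHOLD = 0.6


def build_verification_grammar(target, fillers=(), lang="en-GB"):
    """Return an SRGS 1.0 XML grammar with the target name plus optional fillers."""
    # Keep the target first and drop case-insensitive duplicates of any name.
    names = [target]
    seen = {target.lower()}
    for name in fillers:
        if name.lower() not in seen:
            names.append(name)
            seen.add(name.lower())
    items = "\n".join(f"      <item>{escape(n)}</item>" for n in names)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" '
        f'xml:lang="{lang}" root="name">\n'
        '  <rule id="name" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


def verify_name(target, hypothesis, confidence, threshold=DEFAULT_THRESHOLD):
    """Accept only if the recognizer returned the target name with enough confidence."""
    # A filler match, or a low-confidence target match, fails verification.
    return hypothesis.lower() == target.lower() and confidence >= threshold


# Target name from a (hypothetical) authentication lookup, plus two fillers.
grammar = build_verification_grammar("O'Brien", fillers=["Smith", "Jones"])
print(verify_name("O'Brien", "O'Brien", 0.82))  # True
print(verify_name("O'Brien", "Smith", 0.95))    # False
```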
Damper, R. I., & Gladstone, K. (2007). Experiences of usability evaluation of the IMAGINE speech-based interaction system. International Journal of Speech Technology, 9, 41–50.
Davidson, N., McInnes, F., & Jack, M. A. (2004). Usability of dialogue design strategies for automated surname capture. Speech Communication, 43, 55–70.