Lists

What is a list?
The distinction between a list and a menu is subtle and somewhat open to interpretation. Both present a set of choices. The biggest difference is simply length: at some point, a menu becomes a list because of the number of options. At the very least, the design has to change once there are more choices than there are single DTMF keys available for a DTMF backup.

Other factors can help distinguish the two, particularly when the number of options isn't large. With a menu we usually think of actions, whereas a list is a group of similar things. A list may also contain items that are determined dynamically, which means the prompt has to be built by concatenating recordings rather than played as a single pre-recorded prompt.

As an example, suppose a caller is rebalancing funds in their 401K. (Never mind that nobody in their right mind would do this over the phone instead of the web; such systems have indeed been built.) The list of available funds could be long, could vary over time, and could vary by plan. The prompt cannot be pre-recorded as a single message.

Another wrinkle with dynamically generated lists is that if the items change often, there may not be well-tuned grammars for each one. Options may also be lengthy or difficult for callers to say. Although the system can be built to load the list items into a grammar on the fly, that grammar probably won't have good synonym coverage or custom pronunciations, and it won't have been put through the rigors of making sure the items are dissimilar enough to avoid misrecognitions. In this case, asking the caller to say the name doesn't leave us with warm fuzzies about their chances of success.
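A crude way to approximate some of that vetting when a grammar is built on the fly is to flag pairs of item names that look too alike before loading them. The sketch below uses Python's standard `difflib.SequenceMatcher` on the written names; that is an assumption standing in for a real acoustic or phonetic comparison, and the function name `confusable_pairs`, the 0.75 threshold, and any hotel names passed to it are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(items: list[str], threshold: float = 0.75) -> list[tuple[str, str, float]]:
    """Return pairs of item names that look too similar to trust in one grammar.

    Spelling similarity is only a rough proxy: names that sound alike but are
    spelled differently will slip through.
    """
    flagged = []
    for a, b in combinations(items, 2):
        # Lowercase first so capitalization differences don't hide a near match.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, ratio))
    # Most similar pairs first, so the worst offenders get reviewed first.
    flagged.sort(key=lambda pair: pair[2], reverse=True)
    return flagged
```

A flagged pair is a hint to offer a different selection method (DTMF or “that one”), not proof that recognition will fail.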

Here's another example. In an application to find hotel information, suppose the caller says the hotel is in New York, and when prompted for the hotel itself, says “The Marriott.” Although Marriott has indeed given each of its hotels a distinct name, travelers are often unfamiliar with those names. The application has to somehow figure out which Marriott, and the best solution may be to present a list to the caller.

Before going any further, let's be really explicit. This structure is not fun to design or use. It's not a first choice. But when it's unavoidable for any of the reasons explored above, this is how to make the best of it.

Building a list
Let's return to the hotel example. For any given city/brand combination, there could be zero, one, a few, or a dozen or more hotels to choose from. Focusing on the case where there are a lot, how should they be presented?

You start off by warning the caller. Ideally you tell them exactly how many choices you're about to present. “I found 17 possible matches.” At worst, let them know there are a lot. “I found quite a few matches.”
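The warning itself is easy to generate. This sketch assumes a hypothetical `list_intro` helper that receives the match count from the search, or `None` when the count isn't known in advance, and it reuses the two prompt wordings above.

```python
from typing import Optional


def list_intro(count: Optional[int]) -> str:
    """Return the warning played before a list of matches.

    `count` is the number of matches, or None when the exact number is not
    known yet. Intended for lists, so a known count must be at least 2.
    """
    if count is None:
        # Fallback: at least warn the caller that a lot is coming.
        return "I found quite a few matches."
    if count < 2:
        raise ValueError("zero or one match should not be presented as a list")
    return f"I found {count} possible matches."
```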

Next, think about how you're going to order the list. For the hotels, maybe you have data elsewhere on how many bookings or rooms each one has, which would let you move the most likely choices to the top of the list. For the 401K funds, you might have information on dollars invested or number of participants. The ranking data may be available dynamically, or you might rank the items statically, in which case the ranking has to be revisited regularly.
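One way to combine the two kinds of ranking is to sort by live data where it exists and fall back to a static rank otherwise. In this sketch the `Choice` fields (`bookings`, `static_rank`) are hypothetical stand-ins for whatever ranking data you actually have.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Choice:
    name: str
    bookings: Optional[int] = None  # live ranking data, if available
    static_rank: int = 999          # offline rank; lower plays earlier


def order_choices(choices: list[Choice]) -> list[Choice]:
    """Put the most likely choices first.

    Items with live data come first, highest bookings first; the rest follow
    in static-rank order. Names break ties so the order is stable across calls.
    """
    def key(c: Choice):
        if c.bookings is not None:
            return (0, -c.bookings, c.name)
        return (1, c.static_rank, c.name)

    return sorted(choices, key=key)
```

Placing every item that has live data ahead of every item without it is itself an assumption; if the static ranks are trustworthy, merging the two scales may work better.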

For the actual recordings and concatenation, the pauses between items need to be significant, longer than in a pre-recorded prompt. The right duration depends on the input mode, whether callers select with “that one,” how long each item is (meaning how long it takes the caller to process what was said), how distinct the items are, and how clear they are. A good starting point is 750 ms, but expect to tune it to avoid turn-taking problems.
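If your platform plays prompts through SSML, the concatenation and pauses can be expressed with the standard `<audio>` and `<break>` elements. This sketch assumes each item has a recording URL plus fallback text for text-to-speech; the URLs, the `build_list_prompt` name, and the 750 ms default (taken from the starting point above) are illustrative, and in practice the pause value would come from tuning.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_list_prompt(items: list[tuple[str, str]], pause_ms: int = 750) -> str:
    """Join recorded list items into a single SSML prompt.

    Each item is (audio_url, fallback_text); the fallback text is spoken by
    TTS if the recording cannot be fetched. A <break> separates every pair
    of adjacent items so the caller has time to process each one.
    """
    pieces = []
    for i, (src, text) in enumerate(items):
        if i > 0:
            pieces.append(f'<break time="{pause_ms}ms"/>')
        # quoteattr adds the surrounding quotes; escape protects &, <, >.
        pieces.append(f"<audio src={quoteattr(src)}>{escape(text)}</audio>")
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        + "".join(pieces)
        + "</speak>"
    )
```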

Prompting the caller
Tell the caller exactly how to choose. Should they repeat the name of the item? Press a DTMF key mapped to it? Say a key phrase like “that one”? Each might be appropriate in different situations. For the hotels, names could be very tricky: the official distinctive names are somewhat clunky and phonetically similar. DTMF would require two digits in many cases. And “that one” has its own challenges.

If you use DTMF and the list has more than nine items, you have to decide whether to instruct callers to enter a leading zero for single-digit choices or to wait until the incomplete timeout is reached. Regardless of whether you tell them that the Marriott Marquis is “one” or “zero one,” make sure your grammar accepts both.
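In an SRGS grammar, accepting both forms just means listing both key sequences for each single-digit position. This sketch generates an SRGS DTMF grammar with SISR tags that return the chosen position; the helper names `accepted_keys` and `dtmf_grammar` are hypothetical, and whether callers are told to use the leading zero or to wait out the timeout is left to the prompt and the platform's timeout settings.

```python
SRGS_NS = "http://www.w3.org/2001/06/grammar"


def accepted_keys(position: int, list_length: int) -> list[str]:
    """Key sequences (space-separated, as SRGS DTMF tokens) that select `position`.

    When the list needs two digits, single-digit positions accept both the
    bare digit and the zero-padded form, e.g. "1" and "0 1".
    """
    if not 1 <= position <= list_length:
        raise ValueError("position out of range")
    digits = str(position)
    forms = [" ".join(digits)]  # "12" -> "1 2"
    if list_length > 9 and position <= 9:
        forms.append("0 " + digits)
    return forms


def dtmf_grammar(list_length: int) -> str:
    """Build an SRGS DTMF grammar whose semantic result is the chosen position."""
    alternatives = []
    for pos in range(1, list_length + 1):
        for keys in accepted_keys(pos, list_length):
            alternatives.append(f"<item>{keys}<tag>out = {pos};</tag></item>")
    return (
        f'<grammar xmlns="{SRGS_NS}" version="1.0" mode="dtmf" '
        'root="choice" tag-format="semantics/1.0">'
        '<rule id="choice" scope="public"><one-of>'
        + "".join(alternatives)
        + "</one-of></rule></grammar>"
    )
```

Because “1” is complete on its own but is also the start of “1 0” through “1 7” in a 17-item list, the platform still has to wait for a timeout before accepting a bare “1”; the grammar only guarantees that both forms are valid.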

In addition to telling callers how to choose, explicitly tell them they can interrupt, especially if you're prompting for speech. Elsewhere we've said not to do this at the beginning of a call. In general, most people know they can interrupt, and for those who don't or aren't comfortable doing it, the system normally doesn't talk long enough at any one point for that to be a problem. A 17-item list is an exception, so go ahead and remind them. You might have something like this:

System: There are 12 funds you can invest in. Enter the number of the one you want.

Navigation
With long lists, we want to make life easy for the caller. We want them to be able to navigate within the list, including backing up to listen again. Moving within the list is very tricky, though. It's difficult to explain the navigation commands without causing cognitive overload, and difficult to present them as just-in-time instruction. But for a really long list, we may be unwilling to make callers wait until the very end to start over. Then there's the question of how far to back up. A single item? Which item are we even on? For example, suppose the system reads the third item and then recognizes “back up.” By then it has already moved on to the fourth item, so the caller probably wants to go back to the second one. The timing is very tricky. Lists of items in chronological order (such as recent bank transactions) seem best suited to “back up” commands, along with “next”-style commands for skipping ahead.
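The reasoning in that example (the system has moved on to item four, but the caller means item two) can be written as a small function. This is only a sketch: it assumes the platform can report which item was playing when the command was recognized and how far into it playback had got, and the function name and the 1500 ms grace window are hypothetical values to be tuned.

```python
def back_up_target(playing: int, elapsed_ms: int, grace_ms: int = 1500) -> int:
    """Pick the 1-based item to replay after the caller says "back up".

    `playing` is the item whose audio was under way when the command was
    recognized (a command during the pause after an item counts as the next
    item at 0 ms). If that item had only just started, the caller was almost
    certainly reacting to the previous one, so back up from there.
    """
    heard = playing - 1 if elapsed_ms < grace_ms else playing
    # Never back up past the first item.
    return max(1, heard - 1)
```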

All in all, you should avoid trying to build in a back-up feature unless data, testing, and usage show that it's really needed.

Another approach for lists is to give chunks of choices. Say there are 12 matches. Present them in groups of four.

This is harder to build, but may be more effective. It will be slower, but perceived efficiency could be better because the cognitive load stays under control.
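Splitting the list into groups is mostly bookkeeping. This sketch, using a hypothetical `present_in_groups` helper, groups names four at a time and keeps the numbering continuous across groups, so the fifth item is still “5” for DTMF. Restarting the numbers in each group is an equally reasonable design choice; it keeps every key press to one digit but makes a number's meaning depend on which group is playing.

```python
def present_in_groups(names: list[str], size: int = 4) -> list[tuple[int, int, list[str]]]:
    """Split names into groups of at most `size`.

    Returns (first_number, last_number, group) for each group, with 1-based
    numbers that continue from one group to the next.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    groups = []
    for start in range(0, len(names), size):
        group = names[start:start + size]
        groups.append((start + 1, start + len(group), group))
    return groups
```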

A “that one” technique is attractive because it eliminates the multi-digit DTMF problem and sidesteps grammar and tuning issues. But it has its own challenges. The biggest is the same one described above for backing up: by the time the system recognizes “that one,” it's already on the next item. Timing is everything. Build the system so that you have reasonable confidence you've identified the right item, whether by inserting some kind of stop that keeps the system from advancing, by detecting whether the next item's prompt has barely begun or has nearly finished playing, or by whatever else your architecture allows. Then accept that you'll still get it wrong sometimes. Confirm the choice, and if the caller says no, confirm the previous item in the list. The vast majority of the time, that will do the trick.
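The timing guess plus the confirm-then-previous fallback can be sketched as follows. It assumes the platform reports which item was playing when “that one” was recognized and how far into it playback had got, and it takes a hypothetical `confirm` callback that plays a yes/no confirmation for an item and returns the caller's answer; the 1500 ms grace window is a placeholder to tune.

```python
from typing import Callable, Optional


def resolve_that_one(
    playing: int,
    elapsed_ms: int,
    confirm: Callable[[int], bool],
    grace_ms: int = 1500,
) -> Optional[int]:
    """Work out which 1-based item the caller meant by "that one".

    If the item that was playing had barely started, assume the caller was
    reacting to the previous item. Confirm that guess; if the caller says
    no, confirm the item before it. Returns None if both are rejected.
    """
    best = playing - 1 if playing > 1 and elapsed_ms < grace_ms else playing
    for candidate in (best, best - 1):
        if candidate >= 1 and confirm(candidate):
            return candidate
    return None
```

What to do after two rejections (replay the list, offer DTMF) is outside this sketch and belongs to the rest of the dialog design.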

Testing and tuning matter even more for lists than for other parts of an application. Lists are just plain tricky, and getting them right takes considerable design and development effort.