Action

Use action verbs
This is especially true when writing menu prompts. The best menus often use action verbs to describe each option. This eliminates the need to expand on what the menu option represents, which can lead to lengthy prompts. Consider this prompt: “You can say, ‘Make a payment’, ‘Change my address’, or ‘Switch Accounts’.” Each option speaks for itself, and action verbs make this easier. Without action verbs, the prompt becomes lengthy: “To make a payment on your account, say, ‘Payments’. If you’d like to change your address, say, ‘Address’, or to work with another account, say ‘Accounts’.”
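
As a rough illustration of how short an action-verb menu stays, the sketch below assembles the prompt from a list of option phrases. The function name build_menu_prompt is hypothetical and not part of any IVR platform; it only joins the phrases the way the example prompt above does.

```python
def build_menu_prompt(options):
    """Join action-verb menu options into one short spoken prompt.

    Hypothetical helper for illustration; each option is assumed to be
    a self-explanatory action phrase such as 'Make a payment'.
    """
    if not options:
        raise ValueError("a menu needs at least one option")
    quoted = [f"'{option}'" for option in options]
    if len(quoted) == 1:
        spoken = quoted[0]
    elif len(quoted) == 2:
        spoken = f"{quoted[0]} or {quoted[1]}"
    else:
        # Serial comma before the final "or", as in the example prompt.
        spoken = ", ".join(quoted[:-1]) + f", or {quoted[-1]}"
    return f"You can say, {spoken}."


print(build_menu_prompt(["Make a payment", "Change my address", "Switch Accounts"]))
```

Because each option already says what it does, the prompt needs no explanatory clause per option.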

Generally use active voice rather than passive, but there are some important exceptions
In English, sentences in passive voice (1) start with the object of the verb rather than the subject and (2) have some form of the verb “be” (or in some cases, “get”) before the main verb. If the subject of the verb appears at all, it does so in a “by” prepositional phrase after the verb. Compare:

  • You made a payment (active voice)
  • A payment was made by you (passive voice with subject in prepositional phrase)
  • A payment was made (passive voice with no subject)

Why should we generally prefer active voice? Look around and you'll notice that the more scholarly and formal a publication is, the more loaded with passive voice it tends to be. Passive voice can lead to wordier, weaker writing—passive sentences rewritten as active can be as much as 40% shorter. Passive voice is vague, especially when the subject of the sentence doesn’t appear at all—so the writing sounds evasive, avoiding responsibility. Overuse of passive voice can cause readers to lose interest. Readers prefer documentation with reduced use of passive voice (Lewis, 2006). Furthermore, there is scientific evidence that supports the commonly given advice against the use of passive voice. Research in psycholinguistics and human factors has consistently shown that it is harder for people to extract the meaning from a passive sentence relative to its active counterpart (Broadbent, 1977; Ferreira, 2003; Garrett, 1990; Miller, 1962)—possibly taking 25% longer to understand a sentence expressed in passive voice (Bailey, 1989). Listeners appear to use different brain pathways when processing active and passive sentences—patients with Broca’s aphasia (due to a specific type of damage to the left hemisphere) can accurately interpret active sentences but cannot accurately interpret passive (Zurif, 1990; Berndt, Mitchum, Burton, & Haendiges, 2004). Thus, for instructional or informational messages (or technical writing in general), the best way to convey the information is with active voice.

There are, however, some exceptions to this rule, especially when designing conversational and/or customer service dialogs. Conversation does not take place in a vacuum. There must be at least two participants, and as soon as you have two entities talking to one another, you have a social situation. Suppose two people are working together to solve a puzzle. The way they speak to one another will differ if they are parent and child, siblings, close friends, distant acquaintances, co-workers, or worker and manager. The various ways in which we address one another reflect our social relationships. Slang and jargon can establish who is in and who is out of different social groups (Fromkin, Rodman, & Hyams, 1998). Another aspect of social consideration in conversation is the directness of a request. Consider the following:

  • Pour me a cup of coffee.
  • Please pour me a cup of coffee.
  • Would you please pour me a cup of coffee?
  • Is there any more coffee?

All are requests for a cup of coffee, but they differ in directness, and consequently in the politeness of the request. The appropriate form for the request depends partly on how much the requester wants the coffee, and more on the social relationship between the participants. Ask rudely from a position of little social power and you risk direct refusal with accompanying loss of face (Clark, 1996), and the loss of coffee. Ask too indirectly and you risk misinterpretation of the request and loss of coffee, but avoid loss of face.

Getting this aspect of tone correct plays an important role in creating a satisfactory interaction between a customer and a service provider (Polkosky, 2006). Less direct (more deferential) requests from the service provider imply greater choice on the part of the customer, which increases the customer’s satisfaction with the service (Yagil, 2001). Customers might not be able to articulate why they perceive or fail to perceive an appropriate level of respect from a service provider, but they have the very human capacity to detect an inappropriate tone and have a corresponding negative emotional reaction. Common politeness markers include phrases such as “could you” and “would you mind” (Ervin-Tripp, 1993). Even though the syntactic form these expressions take is that of a yes/no question (“Would you please pour me a cup of coffee?”; “Is there any more coffee?”), the clear implication is that these are requests. Only someone who is refusing to cooperate in the conversation, possibly as an indication of anger or an attempt at humor, would respond with a simple “yes” or “no” rather than just smiling and pouring the coffee. Passive voice provides another way to reduce the directness of requests.

Some examples of appropriate use of passive voice are:

  • Focus: To put the focus on the object of the sentence – That car was parked by John.
  • Continuity (end-focus principle): To achieve a smooth connection between the end of one sentence and the beginning of the next, especially in dialog (Cohen, Giangola, & Balogh, 2004) – A: Did John park that car? – B: No, this car was parked by John. – But note that you can often achieve end focus more efficiently without passive voice by using ellipsis – A: Did John park that car? – B: No, this one.
  • Scientific writing: To avoid the use of personal pronouns (“I,” “we”) in scientific or other formal writing. Note that this practice has been changing, especially in human factors and psychology – The expected effect was not found.
  • Common construction: Although Mrs. Smith did the work, we would normally say – John Smith was born on January 5, 1984.
  • Obscure responsibility: To deliberately avoid indicating the responsible party – Some coffee should be brewed. or, more famously, Mistakes were made.
  • Appropriate tone for service provider: To politely inform a customer of something they must do – Check-in must be completed 30 minutes before the flight.

As shown above, there are two general situations in which interaction designers should consider passive over active voice: structural and social. The rationale for focus, continuity, and traditional scientific writing is structural, because each emphasizes the object of the verb rather than the subject. In contrast, the rationale for obscuring responsibility is more social. In addition to providing a way to make indirect (polite) requests, passive voice allows speakers to bring up potentially touchy topics in a relatively polite way, without explicitly identifying the responsible party. Voice interaction designers should avoid indiscriminate use of passive voice but should not fear to use it when it is the more appropriate choice.

Instructions

Write instructions in the affirmative
It's never a good idea to prompt the caller for what they don’t want. Confirmation is the best example of this. Consider this prompt: “Would you like to make a same-day payment?” Caller says “No.” The wrong thing to do is confirm with, “You don’t want to make a same-day payment. Is that right?” This is confusing to the caller. Would a “Yes” response mean “I do want to make a same-day payment” or “the system got it right, I don’t want to make a same-day payment”? Keep in mind that most of the time it’s best not to confirm yes/no responses; this case serves strictly as an example.

Another example to consider is collecting information from the caller. Let’s assume the system needs a birth date. The initial prompt is “What’s your birth date?” Caller provides the date in a way the system doesn’t understand, so in a retry prompt, the system instructs the caller to give the date a certain way. “Say your birth date like this, May 27, 1980.” The system would never tell the caller what not to say: “Say your birth date, but please not like this, 0-5-2-7-8-0.”
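
The retry prompt described above can be sketched as a small helper that always models the format the system wants rather than the one it doesn't. The names MONTHS and birth_date_retry_prompt are hypothetical, and the default example date is simply the one used in the text.

```python
from datetime import date

# Month names are spelled out so the spoken example does not depend on locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def birth_date_retry_prompt(example=date(1980, 5, 27)):
    """Build an affirmative retry prompt that demonstrates the wanted format.

    Hypothetical helper for illustration; it never tells the caller
    what not to say.
    """
    spoken_example = f"{MONTHS[example.month - 1]} {example.day}, {example.year}"
    return f"Say your birth date like this, {spoken_example}."


print(birth_date_retry_prompt())
```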

Oftentimes what not to do is simply extraneous information. In this prompt: “Give us just the last four digits of your social, not your whole social security number,” the second half of the sentence adds nothing and can confuse callers about how many digits they're supposed to give, especially if their attention is divided.

It's spoken language, not written

Write like people talk
Think screenplay or script, not paper or publication. Remember that your IVR prompts will be heard rather than read, so if a prompt sounds natural when spoken, its construction is appropriate for a telephone application. This is the case even when the written form seems too casual. It is much more important for the application to be comprehensible and provide instructions which are easy to follow, even if the wording is not what would be appropriate for written content.

Because of this, make every effort to present your design to the client orally, at least in part. Record sample conversations, or even review live, with you actually reading the prompts the way you expect them to be recorded. It is generally a bad idea to give the client the flow or prompt listing without some kind of oral presentation first. The recorded conversations are especially helpful in showing how the conversational style flows nicely.

Let go of formal grammar rules
When writing speech applications, you may need to reconsider your preconceptions regarding “good” and “bad” grammar. Phrases should be grammatical in the sense that they sound natural and not marked to any native speaker of the target language population. Adhering strictly to the rules for written grammar is not recommended, as you will end up with an awkward-sounding application.

You can expect to use grammatical constructions which you wouldn't use when writing content which will be read. Remember that written and spoken standards of grammaticality may be very different, and that “grammar rules” learned in school may not be appropriate for a casual and conversational prompt which is intended to be heard. Something that looks odd when written, such as a pronoun and its antecedent not agreeing in number (e.g., “a customer may take their product into the store”), sounds fine to most native speakers of English. In fact, the ultra-correct (prescriptively grammatical) version of that sentence might sound a bit odd (and even offensive) to some listeners: “a customer may take his product into the store”. Of course, if the grammar is noticeably non-standard, meaning that it would sound uneducated to most native speakers of English (an extreme example would be the use of “ain’t”), it should be avoided.

Use research to back you up

Sometimes rules aren't rules at all, and you can use research and documentation to help you make your case. The Oxford Dictionaries calls ending a sentence with a preposition Grammar Myth #1. Many people believe it's ungrammatical, but in many cases it's impossible to avoid without sounding extremely stilted, and for good reason: it's not an actual rule. An example:

  • What type of transaction would you like to search for? Say 'Charges' or 'Credits'.
  • For what type of transaction would you like to search? Say 'Charges' or 'Credits'.

The latter sounds ridiculous and obviously violates our rule of “write like people talk.”

Use contractions

Contractions are commonplace in spoken dialog. They are sometimes avoided in technical documents, proposals, etc. because they read as unprofessional. The truth is that contractions are conversational. A good prompt is conversational; thus, using contractions makes prompts sound more natural. Consider these examples:

  • I am having problems finding that account
  • I cannot find that account
  • I would like to transfer you

Compare those to the contraction counterparts:

  • I’m having problems finding that account
  • I can’t find that account
  • I’d like to transfer you

The latter set is much more natural and conversational.
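
One way to catch uncontracted phrases before prompts go to recording is a simple substitution pass over the prompt text. This is a minimal sketch under stated assumptions: the CONTRACTIONS table is hypothetical and covers only the three examples above, and a real prompt review would still need a human ear, since a contraction is not always the right choice.

```python
import re

# Hypothetical lookup table covering only the examples in this section.
CONTRACTIONS = {
    "I am": "I'm",
    "I cannot": "I can't",
    "I would": "I'd",
}


def contract(prompt):
    """Replace each listed full form with its contraction, whole words only."""
    for full, short in CONTRACTIONS.items():
        prompt = re.sub(rf"\b{re.escape(full)}\b", short, prompt)
    return prompt


for text in (
    "I am having problems finding that account",
    "I cannot find that account",
    "I would like to transfer you",
):
    print(contract(text))
```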

How to order information

Put new information last
English in general uses the end-focus principle, meaning putting the new information last, regardless of interface. That said, speech interfaces are temporal in nature, meaning there is no persistence as with a web page. Once the prompt is heard, the caller has to remember what was spoken and react. Putting new information last is key to helping the caller recall what she just heard. Consider these prompts:

  • Press 1 to hear recent transactions.
  • Say 'account history' to hear recent credits and charges.

As a caller, one knows why she is calling. In these examples, it's to hear recent transactions. That bit of information is not new. What is new is what the caller needs to press or say in order to get what she wants. It's best to flip these prompts around, putting the new information last:

  • To hear recent transactions, press 1.
  • To hear recent credits and charges, say “Account History.”
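
The flipped ordering above can be enforced in a prompt template so that the goal always comes first and the key press always comes last. The function touch_tone_menu is a hypothetical helper for illustration, not part of any IVR toolkit, and it assumes the goals are numbered in the order given.

```python
def touch_tone_menu(goals):
    """Build a touch-tone menu with the new information (the key) last.

    Hypothetical helper: each goal is a verb phrase such as
    'hear recent transactions'; keys are assigned 1, 2, 3, ... in order.
    """
    return " ".join(
        f"To {goal}, press {key}." for key, goal in enumerate(goals, start=1)
    )


print(touch_tone_menu(["hear recent transactions", "make a payment"]))
```

Keeping the ordering in the template means individual prompt writers cannot accidentally put the key press first.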

Another example of putting new information last is when playing informational messages, like FAQs. Consider this prompt:

  • Please complete Form 123ABC to renew your passport if you're 14 years old or younger. Please complete Form 456DEF to renew your passport if you're older than 14.

The “new” information is which form to complete for passport renewal. By the time the caller hears the age requirements, she has forgotten which form to complete. It's better to say:

  • If you're 14 years old or younger, please complete Form 123ABC to renew your passport. Or, if you're older than 14, please complete Form 456DEF for passport renewal.

References

Bailey, R. W. (1989). Human performance engineering: Using human factors/ergonomics to achieve computer system usability. Englewood Cliffs, NJ: Prentice-Hall.

Berndt, R. S., Mitchum, C., Burton, M., & Haendiges, A. (2004). Comprehension of reversible sentences in aphasia: The effects of verb meaning. Cognitive Neuropsychology, 21, 229–245.

Broadbent, D. E. (1977). Language and ergonomics. Applied Ergonomics, 8, 15–18.

Clark, H. H. (1996). Using language. Cambridge, UK: Cambridge University Press.

Cohen, M. H., Giangola, J. P., & Balogh, J. (2004). Voice user interface design. Boston, MA: Addison-Wesley.

Ervin-Tripp, S. (1993). Conversational discourse. In J. B. Gleason & N. B. Ratner (Eds.), Psycholinguistics (pp. 238–270). Fort Worth, TX: Harcourt Brace Jovanovich.

Ferreira, F. (2003). The misinterpretation of noncanonical sentences. Cognitive Psychology, 47, 164–203.

Fromkin, V., Rodman, R., & Hyams, N. (1998). An introduction to language (6th ed.). Fort Worth, TX: Harcourt Brace Jovanovich.

Garrett, M. F. (1990). Sentence processing. In D. N. Osherson & H. Lasnik (Eds.), Language: An invitation to cognitive science (pp. 133–176). Cambridge, MA: MIT Press.

Lewis, J. R. (2006). Effectiveness of various automated readability measures for the competitive evaluation of user documentation. In Proceedings of the Human Factors and Ergonomics Society 50th annual meeting (pp. 624–628). Santa Monica, CA: Human Factors and Ergonomics Society.

Miller, G. A. (1962). Some psychological studies of grammar. American Psychologist, 17, 748–762.

Polkosky, M. D. (2006). Respect: It’s not what you say, it’s how you say it. Speech Technology, 11(5), 16–21.

Yagil, D. (2001). Ingratiation and assertiveness in the service provider-customer dyad. Journal of Service Research, 3(4), 345–353.

Zurif, E. B. (1990). Language and the brain. In D. N. Osherson & H. Lasnik (Eds.), Language: An invitation to cognitive science (pp. 177–198). Cambridge, MA: MIT Press.