Email

Email addresses are notoriously difficult to collect because they may contain a virtually limitless series of letters, numbers, and punctuation (dash, dot, etc.). For this reason, speech is the only option for collecting an email address. Many solutions involve agent-assisted collection, where an agent listens and transcribes behind the scenes, in a sort of simulated automated speech recognition. Typically, email addresses are collected in two steps that reflect their structure:

Step 1) Collect the username, i.e., the part that precedes @ (the “at” sign). This is often some combination of the user's name and punctuation, but may be nearly any combination of alphanumeric characters.

Step 2) Collect the domain name, i.e., the part that follows the @ character.
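As a minimal sketch of the structure those two steps rely on, the following Python splits a complete address into its username and domain parts. The function name split_email is hypothetical and used only for illustration; it is not part of any speech platform.

```python
def split_email(address: str) -> tuple[str, str]:
    """Split an address into (username, domain) at the last '@' sign."""
    username, sep, domain = address.rpartition("@")
    # Reject input that has no '@' or that is missing either part.
    if not sep or not username or not domain:
        raise ValueError(f"not a username@domain address: {address!r}")
    return username, domain
```

A dialog that collects the two parts in separate turns would simply join them with "@" once both have been confirmed.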

Username capture is especially complex, as it involves an unconstrained string of alphanumeric characters. People may pronounce part of it as a word followed by the names of the keyboard characters (e.g., “Jane underscore oh five one”), spell it (e.g., “J-A-N-E underscore oh five one”), or spell it using some version of the radio alphabet (e.g., “J as in Juliet…” or “J as in John…”). This is a very difficult context to cover with an accurate speech grammar, and recognition is likely to be error-prone. (See related sections on Voice Spelling and Alphanumeric Input.)

While domain names are simpler in that they are much more predictable, recognizing them can be complex because new domains are continually added. If you are designing an email collection, you will need to keep the grammar that lists existing domains up to date as new domain names are created. The more that is known about the email address via backend data, the easier the collection will be. For example, if your email collection works from a limited set or known list of users, such as a corporate directory, recognition is likely to improve: the recognition task covers only a limited set of usernames and domains, so you can create specific grammars based on a data feed.

Note that callers may still need help knowing whether they can or cannot use a radio alphabet. Examples within the initial prompt are helpful here, for instance “First I need the user name, which is the part of your email address that comes before the 'at' sign. Spell just the user name, like this: J A N E 1 9 7 5.”
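To illustrate why the spoken forms of a username described above (a word plus character names, plain spelling, or radio-alphabet spelling) are hard to cover, here is a rough Python sketch that maps a recognized transcript back to a username string. Everything in it is an assumption for illustration: the function name normalize_username, the token tables, and the premise that the recognizer returns a plain text transcript in which “oh” always means zero. A production grammar would typically do this mapping inside the recognizer rather than afterwards.

```python
# Hypothetical lookup tables; a real application would need a fuller set.
DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9"}
SYMBOLS = {"underscore": "_", "dot": ".", "dash": "-", "hyphen": "-"}


def normalize_username(transcript: str) -> str:
    """Turn a spoken-form transcript into a username string."""
    # Treat "J-A-N-E" the same as "J A N E".
    tokens = transcript.lower().replace("-", " ").split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        is_letter = len(tok) == 1 and tok.isalpha()
        # Radio-alphabet form "j as in juliet": keep the letter, skip the hint.
        if is_letter and tokens[i + 1:i + 3] == ["as", "in"] and i + 3 < len(tokens):
            out.append(tok)
            i += 4
            continue
        # Digit words and symbol names are mapped; anything else is kept as
        # spoken (single letters, literal digits, or a whole word like "jane").
        out.append(DIGITS.get(tok) or SYMBOLS.get(tok) or tok)
        i += 1
    return "".join(out)
```

Under these assumptions, all three spoken variants from the text reduce to the same string, jane_051. The sketch accepts any hint word after “as in”, which is one way to tolerate callers who improvise their own radio alphabet.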

Note that grammars can be constructed such that the caller may say the username and domain in one turn, but this is a much more challenging recognition context and is not recommended for most email capture using automated speech recognition. Grammars can be formulated to allow the caller to say and spell their username, but again, this increases the complexity of the grammar. Regardless of the approach taken, the application will need to provide clear instructions so that callers understand how to provide their username and domain.

When confirming an email address, it is recommended that common domain names be pre-recorded. Often, text-to-speech is used for domains for which recordings do not exist. When confirming the username portion of the email address, be careful to model the behavior expected of the caller. For example, if your grammar does not recognize the radio alphabet, do not use it when confirming back the username. Also, this is a confirmation context where callers are very likely to attempt one-step correction (e.g., “no, it's five NINE five”). To limit the possibility of responses containing one-step correction, giving explicit instructions to say “yes” or “no” at the end of the confirmation is recommended (e.g., “Did you say 'J-A-N-E'? Yes or no?”).

Because the number of possible email addresses is effectively unlimited, a grammar designed to recognize an email address is large and complex. The size and complexity of these grammars reduce the performance of the recognizer, which, in turn, may compromise the caller experience. When creating an email collection flow, the designer should include confirmation and consider providing sophisticated back-off strategies.
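The confirmation guidance above can be sketched in Python as follows. This is only an illustration under stated assumptions: the set RECORDED_DOMAINS, the helper names, and the ("recording" / "tts") labels are all hypothetical, and the sketch assumes a grammar that accepts plain spelling but not the radio alphabet, so the prompt spells the username the same way.

```python
# Hypothetical set of domains for which pre-recorded audio exists.
RECORDED_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com"}

# Spoken names for punctuation, matching what the caller is expected to say.
SYMBOL_NAMES = {"_": "underscore", ".": "dot", "-": "dash"}


def confirm_username_prompt(username: str) -> str:
    """Spell the username back and close with an explicit yes/no instruction."""
    spoken = " ".join(SYMBOL_NAMES.get(ch, ch.upper()) for ch in username)
    return f"Did you say {spoken}? Yes or no?"


def domain_playback(domain: str) -> tuple[str, str]:
    """Choose a recording when one exists, otherwise fall back to TTS."""
    key = domain.lower()
    source = "recording" if key in RECORDED_DOMAINS else "tts"
    return source, key
```

Ending the prompt with “Yes or no?” is the part that discourages one-step corrections; the spelled form mirrors the input style the grammar is assumed to support.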

As of 2015, the newest version of the Nuance email module uses caller responses to self-tune and improve its email grammar.