Usability Testing (UT) is the process of watching and listening to "real" people use an application in likely or realistic scenarios. In contrast to Usability Assessments (inspection methods such as heuristic or expert evaluation), UT reduces subjective opinion (both experts and client management) by focusing on how people actually behave when interacting with an application. Usability Testing offers methodological controls that allow us to compare different groups of users or test competing design alternatives in a rigorous but feasible way.
  
Usability Testing of speech applications follows the same general philosophy and methods as UT for other applications, but some differences exist. One key difference is the inability to use the [[https://en.wikipedia.org/wiki/Think_aloud_protocol|Think-Aloud]] (TA) methodology (also see [[references#lewis2012|Lewis, 2012]]). You can't ask participants to say what they're thinking as they use a speech recognition system. Instead, the test facilitator can interview the participant immediately following each interaction to obtain their reactions.
  
The success of UT depends primarily on three factors: 1) how well the test participants represent the background knowledge, attitudes, and situations of the people who will use the live system; 2) how well the test scenarios simulate realistic situations and provide participants with believable reasons for making calls; and 3) the degree to which the system being used replicates the behavior of the production application.
  
Usability tests of speech applications may be conducted at many different stages throughout the design and development of the application. Testing with a fully functional application generally must occur after development is complete, and provides the most grounded, realistic data, but it may be delivered too late in the project to be maximally useful. Testing with a less functional, less realistic application can often happen earlier, but because users are interacting with a system that is not identical to the production application, the data are not as robust. One specific early UT method used for speech applications is known as "Wizard of Oz" (WOZ) testing, which can be conducted before the real system is completed. WOZ testing is particularly valuable when there are questions about how the target audience will interact (e.g., [[references#sadowskil2001|Sadowski & Lewis, 2001]]), but has some limitations relative to testing with a working prototype or deployed system, notably weakness in detecting problems with recognition, audio quality, and turn-taking or other timing issues ([[references#sadowski2001|Sadowski, 2001]]).
  
A typical usability test for a single user population requires two days of testing, with six participants each day taking up to an hour each. This is not a hard-and-fast rule -- sessions may be longer or shorter as required, and distributed over more days, especially if there are multiple distinct user groups who must be included in the test. There are statistical methods for estimating and validating sample sizes for these types of formative usability studies -- for a review, see Chapter 7 of [[references#sauro2012|Sauro and Lewis]] (2012) or [[references#lewis2012|Lewis]] (2012, pp. 1292-1297).
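The sample-size methods cited above are commonly based on the cumulative binomial model of problem discovery: a problem that affects a proportion p of users will be observed at least once among n participants with probability 1 - (1 - p)^n. As a minimal sketch (the function name, the 85% discovery goal, and the planning value p = 0.31 are illustrative assumptions, not prescriptions from this article):

```python
import math

def sample_size_for_discovery(p, goal=0.85):
    """Smallest n such that a problem occurring with per-participant
    probability p is observed at least once with probability >= goal,
    using the cumulative binomial model P(seen) = 1 - (1 - p)**n."""
    return math.ceil(math.log(1 - goal) / math.log(1 - p))

# For a problem affecting roughly 1 in 3 users (p = 0.31, a planning
# value often used in the problem-discovery literature):
print(sample_size_for_discovery(0.31))  # prints 6
```

Note that the result for p = 0.31 is consistent with the common practice of running about six participants per user group per day.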
  
There are several costs involved.
The basic deliverable from UT is a written list of specific recommendations based upon observations made during testing. Typically there will be recommendations for changes to the design of the application, and for tuning of recognition grammars. There may also be broader recommendations for changes to client procedures for serving customers so that the total customer experience of the client company is a positive and profitable one.
  
Secondary deliverables can include quantitative usability metrics such as task completion times, task completion rates, and satisfaction metrics ([[references#sauro2012|Sauro & Lewis, 2012]]). Chapter 6 of [[references#sauro2012|Sauro and Lewis]] (2012) provides comprehensive guidance on determining how many participants to evaluate in this type of formative usability test (also see [[references#lewis2012|Lewis, 2012]]). For a published example of a usability evaluation of a speech recognition IVR, see [[references#lewis2008|Lewis]] (2008).
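Because formative tests use small samples, a raw completion rate such as 10/12 can be misleading without an interval estimate. One method that behaves well at small sample sizes is the adjusted Wald (Agresti-Coull) confidence interval; a minimal sketch follows (the function name and the 10-of-12 figures are illustrative, not from this article):

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a task completion rate
    using the adjusted Wald (Agresti-Coull) method: add z**2/2
    successes and z**2 trials before computing the Wald interval."""
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Example: 10 of 12 participants completed the task
low, high = adjusted_wald_ci(10, 12)
print(f"completion rate 83%, 95% CI roughly {low:.0%} to {high:.0%}")
```

Reporting the interval alongside the point estimate makes clear how much uncertainty a twelve-participant study carries.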
  
==== References ====