Year: 2009 | Volume: 26 | Issue: 6 | Page: 402-406
Technical Review: Current Issues of Usability Testing
Majed Alshamari, Pam Mayhew
School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
Date of Web Publication: 21-Nov-2009
Abstract
System usability can be measured through various methods. One of the more important and widely employed techniques is 'usability testing', in which tasks, the number of users, evaluators and other factors are the main elements. This paper reviews usability testing together with current issues that can influence usability testing results, both negatively and positively. It also reviews web usability testing, and considers the future of usability testing so that improvements may be made.
Keywords: Evaluator, Number of users, Tasks, Web usability, Usability testing, Usability in the future.
How to cite this article: Alshamari M, Mayhew P. Technical Review: Current Issues of Usability Testing. IETE Tech Rev 2009;26:402-6
1. Introduction
It is clear that usability is one of the most important success factors in system quality, in particular for websites. Usability testing requires a number of users to perform a set of pre-identified tasks, usually in a usability laboratory. During this testing, evaluators observe how the users interact with the system and identify its usability issues. This paper reviews usability testing and the related issues that influence its effectiveness. It explores the impact of the evaluator's role, the number of users, tasks, the usability problem report, the test environment, usability testing measurement tools and other factors. This review is based on the current literature on usability testing and its elements. It concludes by reviewing web usability testing and its future.
2. Usability Overview
Usability is a multi-dimensional concept with a number of definitions. One of the earliest definitions of usability was proposed by Miller, who argued that usability is 'the ease of use'. The definition was further developed by the International Organization for Standardization (ISO), which defines usability as 'the extent to which the product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use'. In addition, there are usability attributes, collated from a number of sources in [Table 1] below. Most of these definitions emphasize efficiency, effectiveness and user satisfaction; the variations between them depend on system characteristics and attributes.
2.1 Usability Evaluation Methods
Usability evaluations should ensure that the software being assessed exhibits qualities such as effectiveness, efficiency and user satisfaction, and several methods can be used to examine these. Some usability evaluation methods require real users and others do not. [Table 2] below presents these methods.
2.2 Current Issues of Usability Testing
There are various factors affecting usability testing and its results, such as usability measures, evaluator's role, number of users, tasks, usability problem report, test environment and other factors. These factors are shown in [Figure 1].
2.2.1 Evaluator's Role
The term 'evaluator effect' refers to the variation in the usability issues identified by different evaluators analysing the same user test; it is a limitation that should be reported and minimized as far as possible by the evaluators. Hertzum and Jacobsen explain it as the apparent differences among evaluators in terms of the number of usability problems found or in the assessment of those problems. The evaluator's role is a critical issue in usability testing, and several studies have shown that problem detection varies noticeably between evaluators. Hertzum and Jacobsen described the evaluator's role as a potential threat when they conducted a study in which four evaluators individually analysed videotaped sessions. Surprisingly, only 20% of the 93 detected problems were detected by all evaluators. They argued that the main reason behind these differences was the evaluators' interpretations, and claimed that evaluators appeared to seek out and confirm problems they had already discovered. Evaluators have also been criticized for a lack of methodical analysis immediately after the test, while the results remain fresh. Nørgaard and Hornbæk suggested three strategies for reducing the evaluator effect: the first is conducting a detailed data analysis; the second is discussing with other evaluators the specific problems about which an evaluator is unsure; and the third is having the data analysed by different evaluators.
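Hertzum and Jacobsen quantify the evaluator effect with an 'any-two agreement' measure: the average overlap between the problem sets reported by every pair of evaluators. The following is a minimal Python sketch of that calculation; the four evaluator problem sets are hypothetical data for illustration only.

    from itertools import combinations

    def any_two_agreement(problem_sets):
        # Average Jaccard overlap |Pi & Pj| / |Pi | Pj| over all
        # pairs of evaluators' problem sets.
        pairs = list(combinations(problem_sets, 2))
        return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

    # Hypothetical problem sets found by four evaluators.
    evaluators = [
        {"P1", "P2", "P3", "P7"},
        {"P2", "P3", "P4"},
        {"P1", "P2", "P5"},
        {"P2", "P3", "P6", "P7"},
    ]
    print(f"Any-two agreement: {any_two_agreement(evaluators):.2f}")

A low score indicates a strong evaluator effect: the evaluators largely found different problems.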
2.2.2 Number of Users
The number of users has been discussed in a number of studies. Nielsen has suggested that five users are enough to discover 85% of usability problems. Turner and others confirm that the first five users can detect most of the usability problems and that each additional user is unlikely to uncover new ones. On the other hand, Lindgaard and Chattratichart found one study in which only 35% of all usability problems had been detected with five users, and reported another that had discovered only 55% of the usability problems, again with five users. A comparative study was carried out by Rolf Molich, in which nine teams evaluated the Hotmail website; the top team found only 75% of all the usability problems found by all the teams together.
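The 85% claim rests on a cumulative problem-discovery model, found(n) = N(1 - (1 - L)^n), where L is the probability that a single user exposes a given problem; Nielsen reports an average L of 31% across his projects. The Python sketch below reproduces the curve under that assumption; L varies from project to project, which is one root of the disagreement described above.

    # Cumulative problem-discovery model behind the "five users" claim.
    # L = 0.31 is the average single-user detection rate Nielsen reports;
    # it is an assumption here and differs between projects.
    L = 0.31
    for n in range(1, 11):
        found = 1 - (1 - L) ** n
        print(f"{n:2d} users: {found:.0%} of problems found")
    # With L = 0.31, five users reach roughly 85%; a lower L, as in the
    # studies above, pushes the whole curve down sharply.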
The five-user assumption has been criticized for not taking individual differences into consideration. In this regard, usability tests should classify users in terms of their level of experience with systems. Previous studies, such as that by Nielsen and others, have shown that a single user will not come across all problems in a user interface. It can therefore be concluded that if a website has different types of users, it is vital to consider user numbers and user characteristics seriously.
2.2.3 Tasks
Usability testing tasks simply describe what users do to achieve a goal, yet they are an important issue and can heavily influence a usability evaluation. Wilson describes selecting tasks as a critical activity in usability testing. In one instance, choosing the wrong set of tasks led to hundreds of complaints because the tasks had been chosen to test only the appearance of the website. Wilson also highlighted a different failure of task selection, in which a usability group designed tasks around real user observations. No critical usability problems were detected during the usability tests. Then the software was installed, and shortly afterwards customers found that it was not usable due to performance problems: the usability testing tasks had been based on 50,000 rows of data, but many customers were trying to use 10 million rows in their own databases.
In this regard, Lindgaard and Chattratichart suggested moving the focus from the users to the tasks. They found a significant correlation between the number of user tasks performed by each usability team and the new problems found, and suggested that task design, the number of tasks and task coverage should all be researched more because of their role in usability tests. Hertzum and Jacobsen claimed that there is no guidance for selecting tasks, and that this can influence the evaluator's role, in terms of problem detection, and therefore the usability problem results. In this regard, a study showed that different types of task can seriously influence usability testing: different task designs can reveal different types of problems; for example, problem-solving tasks were able to uncover major usability problems, whereas structured tasks seemed to reveal minor and superficial ones.
There are a number of criteria for selecting and designing tasks, such as task frequency and criticality; the former refers to tasks that are performed regularly by users, while the latter refers to the impact of tasks on the success of system activity. Other factors also affect task selection: Task generality, first impressions, tasks involving new features, edge-case tasks and tasks that the client/product team is worried about. Task generality refers to tasks that should be common; they help to generalize the usability findings once the usability test is finished. First impressions refers to tasks that can measure user feelings at the first moment of use. Tasks involving new features help in measuring the impact of any new features in a system. Edge-case tasks can indicate usability issues with large databases and other system usability aspects.
2.2.4 Usability Problem Report
Conducting a usability test should generate a usability problem report that effectively helps designers and developers to make decisions regarding the redesign stage. Hornbæk and Frøkjær mentioned that producing a bare list of usability problems may not effectively help practical systems development. They also asserted that problem descriptions should be brief and should describe how to deal with and treat the problems found.
2.2.5 Test Environment
Usability testing normally takes place in a controlled laboratory. Wolf and others criticized conducting HCI experiments in a laboratory for two main reasons. The first is that little design guidance is offered, owing to the cost implications of laboratory work; Tullis and others confirmed this point. The second is that conducting tests in a laboratory casts doubt over the generalizability of the experimental results. They claimed that the laboratory experiment is only one of several methods for collecting empirical data related to usability. In practice, users surf and perform tasks under everyday circumstances such as workplace conditions, children's noise and other natural factors. There is a significant lack of research treating the influence of the test environment upon usability testing, although Wichansky did suggest that usability testing should be conducted in more natural places.
2.2.6 Usability Measures and Prioritizing Problems
2.2.6.1 Usability measures
Prior to conducting a usability test, testers should be aware of what to test and measure. The ISO standard identifies three major usability measures, efficiency, effectiveness and user satisfaction, but this model has been criticized as too abstract. Meanwhile, McCall's model broke usability down into three criteria: Operability, training and effectiveness. Nevertheless, there remains the difficulty of applying usability standards to a system and deciding exactly how to measure the usability of a particular application.
Hornbæk reviewed current practice in how usability is measured and how problems relate to usability measurements. His review covered 180 studies published in core HCI journals and proceedings. He found that some studies are weakened by difficulties in choosing how to measure a system's usability, which elements should be measured and which methods are most appropriate. He described the usability measures individually. Effectiveness can be measured through binary task completion, accuracy, completeness, quality of outcome and other factors. Efficiency can be measured through input rates, task completion time, mental effort, learning time, frequency of use and other factors. He also summarized satisfaction measures, including standard questionnaires, preferences, satisfaction with the interface and others. However, the review was based on a broad conception of usability, and it neither described nor suggested what to measure for specific systems such as web-based systems, although it did conclude that there is a need for more valid and complete usability measures. Sauro and Kindlund proposed a method for standardizing usability metrics into a single usability metric. Their work proposes a quantitative model for usability [Figure 2], in which the usability aspects are correlated and equally weighted.
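As an illustration of this kind of standardization, the following Python sketch z-standardizes four commonly used raw metrics and averages them with equal weights, in the spirit of Sauro and Kindlund's single score. The data, the variable names and the simple equal-weight average are assumptions for demonstration; the published method involves further detail.

    import statistics

    def z_scores(values):
        # Standardize raw values to z-scores (mean 0, unit variance).
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return [(v - mean) / sd for v in values]

    # Hypothetical per-user raw metrics for six users (illustrative data).
    completion   = [1, 1, 0, 1, 1, 1]                 # binary task completion
    time_secs    = [65, 80, 140, 70, 90, 75]          # task time (lower is better)
    errors       = [0, 1, 3, 0, 1, 1]                 # error counts (lower is better)
    satisfaction = [4.5, 4.0, 2.0, 4.0, 3.5, 4.0]     # post-task rating

    # Negate "lower is better" metrics so all z-scores point the same way.
    standardized = [
        z_scores(completion),
        z_scores([-t for t in time_secs]),
        z_scores([-e for e in errors]),
        z_scores(satisfaction),
    ]

    # Equal-weight average across the four standardized metrics, per user.
    for user, metrics in enumerate(zip(*standardized), 1):
        print(f"User {user}: single usability score {statistics.mean(metrics):+.2f}")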
Nielsen recommended using a simplified usability measure, the success rate, which he defined as the percentage of tasks that users complete successfully; it divides task completion into three groups: Success, partial and fail. Choosing which usability measures to use is difficult, especially as recent studies have offered more than 54 usability measures. This affirms the importance of studying the relationships between usability measures, as well as what to measure for Internet websites and how; recent research tends to combine usability measures.
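A minimal sketch of the success-rate calculation follows, assuming the common convention of half credit for partial completion (one option Nielsen discusses); the task outcomes are hypothetical.

    # Success rate over three outcome groups. The 50% credit for partial
    # completion is an assumed convention; the outcome data are made up.
    outcomes = ["success", "partial", "fail", "success",
                "partial", "success", "fail", "success"]

    credit = {"success": 1.0, "partial": 0.5, "fail": 0.0}
    success_rate = sum(credit[o] for o in outcomes) / len(outcomes)
    print(f"Success rate: {success_rate:.1%}")  # 62.5% for this sample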
2.2.6.2 Prioritizing problems
Prior to judging a problem, a definition of 'usability problem' is needed. Any issue that prevents or thwarts users from completing a task can be defined as a usability problem; examples include a hidden log-in link, visual noise, a dead end or an invisible button. A severity assessment then takes place after collating all the data needed for analysis. Three factors play vital roles in prioritizing usability problems and judging their severity: Impact, which refers to the amount of trouble a problem causes for users; persistence, which indicates how many times a problem is encountered by users; and frequency, which means the number of users who face the problem.
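One illustrative way to turn the three factors into a ranking is sketched below in Python. The scales, the equal weighting and the example problems are assumptions for demonstration, not a published formula.

    # Illustrative severity score combining impact, persistence and
    # frequency. Scales and equal weighting are assumptions only.
    problems = [
        # (description, impact 1-5, persistence 1-5, frequency = share of users)
        ("Hidden log-in link", 4, 3, 0.7),
        ("Visual noise on home page", 2, 5, 0.9),
        ("Dead end in checkout", 5, 2, 0.3),
    ]

    def severity(impact, persistence, frequency):
        # Normalize each factor to 0-1, then average with equal weights.
        return (impact / 5 + persistence / 5 + frequency) / 3

    # Rank problems from most to least severe.
    for desc, i, p, f in sorted(problems, key=lambda x: severity(*x[1:]), reverse=True):
        print(f"{severity(i, p, f):.2f}  {desc}")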
2.2.7 Other Usability Testing Issues
Hornbæk and Stage discussed four challenges in improving the interaction between usability evaluation and the design stage. The first challenge relates to whether the software under test is a prototype or a running system; they stated that prototypes may misrepresent a real system's functionality. The second is that insufficient effort is usually allocated to describing or choosing tasks. The third concerns the form of usability problem reports. The fourth is that usability problem reports neither recommend nor suggest problem priority or severity.
3. Web Usability Testing
Di Lucca and Fasolino reported that testing web usability appears to be more difficult than testing traditional systems, for two main reasons: first, web users are located all over the world yet access the system concurrently; and second, different types of hardware and software are used to access the web. The usability of web-based systems has a great impact on these users on a daily basis. Users are unlikely to revisit a website if they encounter difficulties in using it, particularly where alternative websites are available. Fifty percent of potential sales were reported lost because of poor usability of web-based systems, and difficulty of use was the reason given for the failure of 40% of shopping transactions. Therefore, the usability of web-based systems is critical in determining their success. Many organizations have now realized the importance of website usability, having previously ignored it because they lacked objective criteria for it. Levi and Conrad pointed out a fundamental challenge: how to recognize a website's limitations prior to release, which could reduce maintenance costs.
Dicks raised four main limitations of usability testing: the first is that testing is always an artificial situation lacking realistic circumstances; the second is the inability of test results to verify that a system works; the third is that participants do not fully represent the targeted website audiences; and the fourth is that testing is not always the best technique to apply.
4. Usability Testing in the Future
Wichansky concluded that 'quick and clean' usability testing methods are needed, and that such methods should offer more valid and reliable data. He suggested that usability testing should be conducted in more naturalistic environments, such as simulated homes or classrooms, and that usability problem reports and testing methods tailored for industry should receive more research, as should the testing of mobile phones and handheld devices. In this regard, there is a growing demand for conducting usability tests in a short time, with few resources and on a low budget. Inspection methods are therefore often excluded, as hiring experts makes them costly. Discount Usability Engineering is often more desirable because it needs less money and time, and because it is based on three techniques: Scenarios, simplified thinking-aloud and heuristic evaluation.
It is clear that usability tests suffer from a number of drawbacks, as discussed above. Work on improving usability test conditions, such as tasks, the number of users, the test environment and the evaluator's role, is important and can contribute effectively to the area of usability evaluation methods.
5. Conclusions
This paper reveals how these factors can influence usability testing results. Although the evaluator's role, tasks, the number of users and usability measures have been researched, more research is still needed to improve usability testing results. More research is also needed to investigate the role of tasks in dynamic websites, where the literature offers a number of interesting results. The number of users remains controversial, with the literature supporting both sides of the argument over the magic number of 'five users'. The usability testing environment seems not to have been examined enough and should receive more research.
Authors
Majed Alshamari was born in Saudi Arabia. He received B.Sc. and M.Sc. degrees in Information Systems from King Faisal University in Saudi Arabia and the University of East Anglia in the UK, respectively. He has served as an instructor at King Faisal University, Saudi Arabia. He has authored a number of conference papers and posters; his recent publications concentrate on the usability testing process and its issues, with the aim of improving its efficiency. He is a member of several professional organizations, including ACM, IEEE, BCS, UPA and SCS. He is now studying for a PhD in the field of usability testing.
Pam Mayhew was born in the UK. She received her PhD in Information Systems from the University of East Anglia, UK. Her current work is focused on three areas: The usability of web interfaces, the successful introduction of IT, and global software outsourcing.
Building on work already carried out and published in the areas of e-readiness, e-government and website accessibility, her current work in this area is focusing on the usability testing process itself. The contention is that these tests typically overstate the actual quality of web-based interfaces.
One of her long-standing interests has been in assuring the successful introduction of IT into businesses. This builds on work carried out previously in the areas of stakeholder evaluation, critical success factors, senior management influences and quality management practices within IT organizations. The intention is to provide guidance on the best methods with which to achieve IT's desired benefits. The last of her current areas of research is global software outsourcing. Whilst a very topical and emotive subject, it lacks detailed study. The main focus has been on the UK and India, but she is attempting, with some success, to broaden it to include China.
References
1. X. Faulkner. Usability Engineering, Macmillan Press Ltd, 2000.
2. A. Seffah, M. Donyaee, R. Kline, and H. Padda. "Usability measurement and metrics: A consolidated model". Software Quality Journal, Vol. 14, pp. 159-78, 2006.
3. G. Lindgaard, and J. Chattratichart. "Usability Testing: What Have We Overlooked?", CHI 2007 Proceedings, San Jose, CA, USA: ACM Press, pp. 1415-24, 2007.
4. P. Zaphiris, and S. Kurniawan. Human Computer Interaction Research in Web Design and Evaluation, Idea Group Publishing, 2006.
5. M. Alshamari, and P. Mayhew. "Task Design: Its Impact on Usability Testing", The Third International Conference on Internet and Web Applications and Services, IEEE, Athens, Greece, pp. 583-9, 2008.
6. A. Vermeeren, I. Kesteren, and M. Bekker. "Managing the 'Evaluator Effect' in User Testing", INTERACT, IOS Press, pp. 647-54, 2003.
7. M. Hertzum, and N.E. Jacobsen. "The Evaluator Effect: A Chilling Fact About Usability Evaluation Methods". International Journal of Human-Computer Interaction, Vol. 15, pp. 183-204, 2003.
8. M. Nørgaard, and K. Hornbæk. "What do usability evaluators do in practice?: An explorative study of think-aloud testing", Symposium on Designing Interactive Systems, pp. 209-18, 2006.
9. W. Barendregt, and M. Bekker. "Managing the evaluator effect in the analysis of video data of children's computer games", Proceedings of the BCS-HCI, People and Computers XVIII - Design for Life, Leeds, UK, 2004.
10. J. Nielsen. Why You Only Need to Test with 5 Users. Available from: http://www.useit.com, 2000.
11. C. Turner, J. Nielsen, and J. Lewis. "Determining Usability Test Sample Size". International Encyclopedia of Ergonomics and Human Factors, Vol. 3, pp. 3084-8, 2006.
12. L. Faulkner. "Beyond the five-user assumption: Benefits of increased sample sizes in usability testing", Behavior Research Methods, Instruments, and Computers, Psychonomic Society Publications, pp. 379-83, 2003.
13. R. Molich, M. Ede, K. Kaasgaard, and B. Karyukin. "Comparative Usability Evaluation". Behaviour and Information Technology, Vol. 23, pp. 65-74, 2004.
14. A. Woolrych, and G. Cockton. "Why and when five test users aren't enough", IHM-HCI 2001 Conference, Toulouse, France, pp. 105-8, 2001.
15. J. Nielsen, M. Hertzum, and B. John. "The evaluator effect in usability studies: Problem detection and severity judgements", HFES, pp. 1336-40, 1998.
16. C. Wilson. "Taking usability practitioners to task". Interactions, Vol. 14, pp. 48-9, 2007.
17. K. Hornbæk, and E. Frøkjær. "Comparing Usability Problems and Redesign Proposals as Input to Practical Systems Development", CHI 2005, 2005.
18. C. Wolf, J. Carroll, T. Landauer, B. John, and J. Whiteside. "The role of laboratory experiments in HCI: help, hindrance, or ho-hum?", ACM SIGCHI Bulletin, ACM, New York, NY, USA, pp. 265-8, 1989.
19. T. Tullis, S. Fleischman, M. McNulty, C. Cianchette, and M. Bergel. "An Empirical Comparison of Lab and Remote Usability Testing of Web Sites", Usability Professionals Association Conference, 2002.
20. A. Wichansky. "Usability testing in 2000 and beyond". Ergonomics, Vol. 43, pp. 998-1006, 2000.
21. K. Hornbæk. "Current practice in measuring usability: Challenges to usability studies and research". International Journal of Human-Computer Studies, Vol. 64, pp. 79-102, 2006.
22. J. Sauro, and E. Kindlund. "A Method to Standardize Usability Metrics Into a Single Score", Conference on Human Factors in Computing Systems, Portland, Oregon, USA, pp. 401-9, 2005.
23. J. Nielsen. Success Rate: The Simplest Usability Metric. Available from: http://www.useit.com, 2001.
24. K. Hornbæk, and J. Stage. "The Interplay Between Usability Evaluation and User Interaction Design". International Journal of Human-Computer Interaction, Vol. 21, pp. 117-23, 2006.
25. J. Artim. Usability Problem Severity Ratings. Available from: http://primaryview.org, 2003.
26. M. Hertzum. "Problem Prioritization in Usability Evaluation: From Severity Assessments Toward Impact on Design". International Journal of Human-Computer Interaction, Vol. 21, pp. 125-46, 2006.
27. G. Di Lucca, and A. Fasolino. "Testing Web-based applications: The state of the art and future trends". Information and Software Technology, Vol. 48, pp. 1172-86, 2006.
28. C. Osterbauer, M. Köhle, T. Grechenig, and M. Tscheligi. "Web Usability Testing: A case study of usability testing of chosen sites (banks, daily newspapers, insurances)", The Sixth Australian World Wide Web Conference, 2000.
29. R. Sherry, and Y. Chen. "The assessment of usability of electronic shopping: A heuristic evaluation". International Journal of Information Management, Vol. 25, pp. 516-32, 2005.
30. M. Levi, and F. Conrad. "Usability testing of World Wide Web sites: A CHI 97 workshop". SIGCHI Bulletin, Vol. 29, pp. 40-3, 1997.
31. R. Stanley Dicks. "Mis-usability: On the uses and misuses of usability testing", ACM Special Interest Group for Design of Communication, ACM Press, Toronto, Ontario, Canada, pp. 26-30, 2002.
32. A. Anandhan, S. Dhandapani, H. Reza, and K. Namasivayam. "Web usability testing - CARE methodology", The Third Conference on Information Technology: New Generations (ITNG'06), IEEE, pp. 450-500.