Saturday, October 21, 2006

7/26/06

USABILITY design newsletter July 2006

We should all be concerned with usability. Recently, David's and Pawel's reviews have shown us that there is a very significant reason to be concerned about this.

This is good reading even if you are not directly involved in usability. These are good principles to follow.

Cheers

Timothy

Timothy J O'Neil-Dunne

Managing Partner - T2Impact Ltd

Global Travel eBusiness

Tel (US) +1 425 836 4770

Mobile (US) +1 425 785 4457

Mobile (International) +44 7770 33 81 75

Fax +1 815 377 1583

www.t2impact.com

-----Original Message-----

From: Human Factors International [mailto:hfimail@mail1.humanfactors.com]

Sent: Thursday, July 27, 2006 12:04 PM

To: timothyo@t2ni.com

Subject: User Interface Design Newsletter - July, 2006

User Interface Design Update Newsletter - July, 2006

_________________________________________________

Each month HFI reviews the most useful developments in UI research from major conferences and publications.

View in HTML - http://www.humanfactors.com/downloads/jul06.asp

__________________________________________________

In this issue:

Susan Weinschenk, Ph.D., CUA, Chief of Technical Staff at HFI, looks at new trends in usability testing.

The Pragmatic Ergonomist, HFI's CEO Dr. Eric Schaffer, gives practical advice.

__________________________________________________

IS USABILITY TESTING AS WE KNOW IT ABOUT TO RADICALLY CHANGE?

Usability testing is a tried and true methodology in our industry. Periodically it comes under fire from within and outside the usability community, but has always stood the test of time. Although there is some variation in usability testing protocols from tester to tester, or firm to firm, the basic concepts -- think-aloud techniques during the test, usability engineer observing, logging data, interpreting user actions -- remain the same.

Is this all about to change? Recent research on usability techniques is yielding some interesting results, and may point us in different directions in the future. Do we stay the course of tradition, or do we embrace growth and change?

(Please note that the comments in this newsletter are about actual research, not just trying out different methodologies.)

QUESTION #1: DOES THE USABILITY ENGINEER EVEN NEED TO BE THERE?

For several years there has been debate in the usability community about automated testing vs. having a usability engineer present and running the test. West and Lehman conducted a study in which they compared automated testing with traditional usability engineer-led testing. Are the results and the data generated by automated testing the same as if a usability engineer ran the test?

Here's what they found: There were a few differences in the data coming from users in the automated vs. traditional testing. For example, task times were longer in the automated test because the reading of the task was included in the overall task time, whereas in the traditional test the usability engineer didn't start the clock until after the participant had read the instructions for the task. But much of the data was the same for the automated test and the in-person test, both quantitatively and qualitatively. Failure rates were very similar, and both methods elicited plenty of participant comments.

Before you get too excited or too upset (depending on whether you are a fan of automated testing or not), there is one key difference that the researchers didn't think was very important, but I disagree. In the in-person condition, the usability engineer found on average 13 additional usability problems that were not identified in the automated condition.

You can get valid information from automated testing. You can use it for major benchmarking measures, but don't expect to find all the critical usability issues.

One vote for tradition.

QUESTION #2: DOES IT MATTER IF YOU TEST WITH LO-FIDELITY OR HI-FIDELITY PROTOTYPES, OR IS THAT EVEN THE RIGHT QUESTION?

For many years, low-fidelity prototypes -- paper sketches, for example -- were considered the preferred alternative for testing, since they were easy and fast to create. It was believed that users would not assume you were "done with design" and would therefore be more likely to give feedback. Recently, high-fidelity prototypes have taken over, as they allow more realistic depictions of today's complicated, colorful and richly interactive screens.

In a research study by McCurdy et al., the authors argue that we have both the question and the answers wrong. What should you be using? "Mixed-fidelity" prototypes. Characterizing prototypes as low fidelity or high fidelity doesn't capture the range of differences a prototype can have. The authors suggest that it is more useful to think in terms of five dimensions:

- level of visual refinement

- breadth of functionality

- depth of functionality

- richness of interactivity

- richness of data model

You decide, based on the purpose of a particular usability test, whether to use a low, medium or high level for each dimension. Their study suggests that carefully choosing along these dimensions yields data that is closer to "real" performance data, while keeping the advantages of a lower-fidelity test (easier to create and change than a final product).
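To make the framework concrete, here is a minimal sketch (in Python, not from the newsletter) of how a team might record the fidelity profile chosen for a particular test. The five dimension names follow McCurdy et al.'s list above; the FidelityProfile class and the example values are purely illustrative.

    from dataclasses import dataclass
    from typing import Literal

    # Each dimension is set independently to low, medium or high.
    Level = Literal["low", "medium", "high"]

    @dataclass
    class FidelityProfile:
        """Hypothetical record of the five mixed-fidelity dimensions
        (after McCurdy et al.)."""
        visual_refinement: Level
        breadth_of_functionality: Level
        depth_of_functionality: Level
        richness_of_interactivity: Level
        richness_of_data_model: Level

    # Example: test one deep workflow with realistic data, but keep the
    # visuals rough so participants don't assume the design is finished.
    checkout_test = FidelityProfile(
        visual_refinement="low",
        breadth_of_functionality="low",
        depth_of_functionality="high",
        richness_of_interactivity="medium",
        richness_of_data_model="high",
    )

The point of the framework is that these five settings are independent choices driven by the purpose of the test, rather than a single low/high switch.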

One vote for change and growth.

QUESTION #3: SHOULD YOU TEST ONE DESIGN OR MANY?

Although some usability tests involve testing multiple designs, most test a single design and look for usability problems and issues in that one design, which is then iterated. Is there an advantage to testing alternative designs at the same time?

Tohidi et al. studied whether the quantity and type of comments you receive during a usability test change if you show more than one design. If you show three alternative designs, for example, do you get different feedback, or more feedback, than if you test one?

The data they collected contained interesting results and implications. When only one prototype was shown, it received higher ratings and more positive comments. People were being "nicer" about evaluating the single design. When users saw three alternative designs during the same test, they gave more critical feedback. They weren't so "nice." The authors refer to a previous analysis by Wiklund postulating that when participants view more than one prototype, it sends a clear message that the designers have not yet made up their minds about which design to use. Since a commitment hasn't been made, the researchers are seen as more neutral, and the participant doesn't have to worry as much about disappointing the researcher with a negative reaction. This in turn allows the participant to be more critical.

Interestingly, the researchers were also testing a hypothesis that showing users multiple design solutions would help them engage in participatory design. This proved not to be true. In both the one-design and the three-design conditions, users did not come up with redesign suggestions. (Well, we know that users are not usually designers... this finding is not surprising.)

A small but interesting finding in this study was that participants who reviewed only one design made comments, but did not totally "reject" the design. However, some of the participants in the multiple design condition did reject the entire design, saying things such as, "I would not buy this one."

One vote for change and growth.

QUESTION #4: USABILITY TESTING = THE THINK-ALOUD TECHNIQUE?

One of the hallmarks of an in-person usability test is the think-aloud technique. Can you imagine a usability test in which the user is not thinking aloud? Well, think again. In a study by Guan et al., the researchers challenge our assumptions. They look at a technique called Retrospective Think Aloud (RTA). The usual usability testing protocol is Concurrent Think Aloud (CTA). There has been some criticism that CTA does not simulate normal tasks: in "real" life, users are not annotating each action by thinking aloud while they do their tasks. Lately there has been some interest in using RTA instead of CTA. With RTA, users do the tasks silently and then talk about what they did afterwards. In this study the researchers compared RTA with eye-tracking data to determine the validity of the RTA technique.

They found that people's recounting of their task performance matched the sequence of what they attended to according to the eye-tracking data. And it didn't matter whether the task was simple or complex.

However, they also found that the participants left out a lot of information. The sequence of what they said they did and why they did it matched the sequence in the eye tracking, but a great deal was omitted. The researchers attribute this to the participants summarizing their actions, but I wonder if this may actually be hinting at a new frontier of usability testing instead -- see Question #5.
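As a rough illustration of the kind of sequence comparison described above, the sketch below (in Python) checks whether a participant's retrospective account -- coded as an ordered list of screen areas mentioned -- appears in the same order in the eye-tracking fixation record, and how much of that record was never mentioned. The coding scheme, function names and sample data are assumptions for illustration, not the actual method used by Guan et al.

    def is_ordered_subsequence(reported, fixated):
        """True if every reported step appears in the fixation record
        in the same relative order (steps may be omitted, not reordered)."""
        remaining = iter(fixated)
        return all(step in remaining for step in reported)

    def omission_rate(reported, fixated):
        """Fraction of distinct fixated areas never mentioned in the report."""
        missed = set(fixated) - set(reported)
        return len(missed) / len(set(fixated))

    # Hypothetical coded data: areas of interest in order of fixation/mention.
    fixated = ["search box", "results list", "filter panel", "results list",
               "product page", "price", "add-to-cart button"]
    reported = ["search box", "results list", "product page", "add-to-cart button"]

    print(is_ordered_subsequence(reported, fixated))   # True: order matches
    print(round(omission_rate(reported, fixated), 2))  # 0.33: details omitted

In this toy example the report preserves the order of attention but omits a third of the areas actually fixated, which mirrors the pattern the study describes.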

One vote for traditional.

QUESTION #5: COMING ATTRACTION?

Both CTA and RTA assume that having users monitor their own actions and reactions results in valid data. But in a fascinating book called Strangers to Ourselves: Discovering the Adaptive Unconscious, Timothy Wilson reviews theories and research indicating that the vast majority of our actions and decisions are made from non-conscious processes. In other words, although we will prattle on about why we do what we do, the real reasons are not available to our conscious minds. It's a compelling argument, with real data to back it up. So what does this mean for usability testing and the think-aloud technique? I'm still working on this one... I'm hoping someone will devise a galvanic skin response mouse so that we can measure changes in bodily functions rather than relying on meta-cognition.

One vote for growth and change.

WHERE ARE WE HEADING?

So what's the final tally? Two votes for tradition, and three for growth and change... Hang on, it might be a bumpy ride!

References for this newsletter are posted at:

http://www.humanfactors.com/downloads/jul06.asp

__________________________________________________

The Pragmatic Ergonomist, Dr. Eric Schaffer

Use automatic unmoderated testing for SUMMATIVE testing only, where you just want to measure time and errors.

Test early and often, irrespective of the quality of your prototype. If your user praises the design, take this with a grain of salt (especially in Asia).

Use the retrospective method only when the task is too complex, or the distraction unacceptable, for normal think-aloud methods. In formative testing, stop at key points and do brief, in-depth interviews to understand the underlying motivations and feelings.

__________________________________________________

HFI IS HIRING:

Many positions available at HFI

http://www.humanfactors.com/about/employment.asp

__________________________________________________

Putting Research into Practice - our annual seminar on recent research and its practical application.

http://www.humanfactors.com/training/annualupdate.asp

HFI's training schedule:

http://www.humanfactors.com/training/schedule.asp

__________________________________________________

Suggestions, comments, questions?

HFI editors at mailto:hfi@humanfactors.com.

Want past issues? http://www.humanfactors.com/downloads/pastissues.asp

Subscribe? - http://www.humanfactors.com/downloads/subscribe.asp

_______________________________________

Do NOT want this newsletter?

http://www.humanfactors.com/unsubscribe.asp?Email=timothyo@t2ni.com

or copy the above URL into the address line of your browser and hit return.
