The relevance of methodological concepts for prototype testing

Mar 28 2006

Preface

The focus of this essay is the role of scientific empirical methods when validating design ideas through user testing with paper-based prototypes. I want to briefly recapitulate the role of prototyping in the classical models of software design and critically assess which aspects of scientific research methods have to be taken into account at which stages of the process. This work is based on my background in the social sciences and inspired by experiments carried out during a workshop at the Media Lab Helsinki (University of Art and Design).

Image caption: Preparing paper prototypes for a mobile website

1. From assumptions to knowledge

The main reason for carrying out experimental user testing is to base a future product not only on findings from theoretical research or on the assumptions of experts, but to ensure that these hold in real life. Before discussing possible methodological risks and how to avoid them, it has to be made clear what the goal of the process is.

1.1 Interfaces and mental models

The concept of “mental models” originates in psychology and was described as early as 1943 by Kenneth Craik as “‘small-scale models’ of reality that [the mind] uses to reason, to anticipate events and to underlie explanation” (Soegaard 2003). It has been applied to interface design ever since Donald Norman made it popular in the late 1980s. Similar concepts can be found in Interaction Theory, for example the term “framing” used by Erving Goffman (1974); all these models are based on the understanding that the perception of interaction is never an objective process.

“The aim of interface design was to invent metaphors that users could understand, but which had not much to do with the internal workings of the system” (Andersen 2000, p.4).

If design is thought of in terms of “mental models”, the designer has the role of an interpreter who has to hide the technical concept of the system behind a surface that the target user can understand and handle. And that is the point where interfaces have to be tested. User needs can never be fully estimated based on experience or rules of thumb, and therefore testing is an essential step in the design process.

1.2 The prototype in the design process

When used for testing purposes, a prototype is a tool for gathering and evaluating design requirements. This is an explicit part of the “Interaction Design Model”, where the prototype is one of four steps in the cycle of user needs, (re-)design, prototype and evaluation.

In the traditional “Waterfall Model” of software development (Royce 1970), testing happens at a fairly late stage: after the requirements are specified, a tool is designed, implemented and integrated, then tested and redesigned based on the outcome of the tests.

In “Paper Prototyping”, a low-fi prototype can already be part of the first step, the evaluation of the requirements: cheap and fast testing with paper interfaces makes it possible to assess even the most basic assumptions made about the future user and his perception of the mental model. The designer can thus evaluate his implementation of the interaction process and get reliable feedback on whether the user’s actual mental model matches the one he assumes.

1.3 What does this mean for the testing?

To summarize the two points made above, carrying out a test with a paper prototype has one main aim: checking whether the mental model applied by the designer is realistic.

This is where the main advantage of the paper prototype comes into effect. Stripped of all details that might distract from the test, the rough draft of the future system makes it possible to test solely the mental model of the interface-to-be (Rettig 1994, p.22).

But this very elementary nature of the basic concept of paper prototype testing – the pure validation of the mental model – is at the same time the most critical aspect for its validity; if the prototype as well as the testing procedures do not follow certain standards, the validity of the results can be questionable.

2. Falsifications in qualitative research

In qualitative research, so-called “field research” is mainly known from ethnography, but also from sociology and psychology. A prototype test has a different setup than classical field research: as a “staged” situation in a laboratory environment, it is far from a classical interview and very close to the experiments used in psychology. The problems related to qualitative research methods are, however, quite similar regardless of the method applied.

Image caption: Simulating the use of a mobile phone browser

2.1 Relevant phenomena when testing prototypes

I want to briefly examine some of the phenomena that I consider relevant when carrying out prototype tests.

2.1.1 Self-fulfilling prophecy and selection bias

A classic in many contexts, the self-fulfilling prophecy (a prediction that causes itself to become true) might also play a role in prototype testing. It matters less during the test itself than, for example, in the recruitment of test participants: through what is scientifically called “selection bias”, the supposedly random selection of participants can already be a subconscious or even conscious step towards the findings one is looking for.

Unlike in quantitative research, where large numbers of participants allow reliable random selection procedures, recruiting participants for a prototype test can be a demanding task, and this difficulty might lead to a certain bias in the selection of testers. Also, as the prototype should be tested with members of the target group, the definition of the target group can itself introduce a selection bias.
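One way to keep the choice of testers out of the team's hands is to draw them at random from the recruitment pool and to document the draw. The following is only a minimal sketch: the function name `draw_testers`, the participant codes and the seed value are assumptions made for illustration, not part of the workshop setup described in this essay.

```python
import random


def draw_testers(pool, k, seed):
    """Draw k testers from the recruitment pool without replacement.

    The seed is fixed and written down beforehand so that the draw can be
    repeated and checked later; it is not chosen after looking at results.
    """
    candidates = sorted(set(pool))  # stable order, duplicates removed
    if k > len(candidates):
        raise ValueError("pool is smaller than the requested sample")
    rng = random.Random(seed)       # local generator, independent of global state
    return rng.sample(candidates, k)


# Hypothetical pool of anonymized participant codes.
pool = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]
testers = draw_testers(pool, k=3, seed=2006)
print(testers)
```

A draw like this only removes bias from the step between pool and sample. It does nothing about the way the pool itself was recruited or the way the target group was defined.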

On a side note, when carrying out tests based on a horizontal model that covers only parts of the future software application, even the choice of the part to be tested can be subject to selection bias.

2.1.2 The observer-expectancy effect

The “observer-expectancy effect”, also referred to as the “experimenter effect”, describes a situation where the researcher expects a certain result and unconsciously manipulates the experiment.

In the case of paper prototyping, the most likely scenario is that the test leader makes the “virtual/paper screen” react differently than originally planned, for example by presenting a result screen although the user's action would actually have led to a dead end. This effect is most likely to appear when the user has discovered a major flaw in the structure of the interface (read: produced an error that no one had thought about beforehand) or when the test leader makes on-the-fly decisions to adapt the model in a different way than planned.

2.1.3 The subject-expectancy effect

Another effect, this time based on the expectations of the test participants, is that the subject expects a certain result and reports it. In our experiments we ran into this effect mainly in the form of “user-made assumptions”: even though what was happening on the imaginary screen of the paper prototype was not clear as such, the tester described what he assumed had happened and stated that he was now seeing what he had expected to see.

The reasons for the tester to act this way can vary: fear of failure, assumptions about “what the researcher wants to see” or uncertainty caused by a badly made prototype (e.g. handwriting that cannot be deciphered).

2.1.4 The Hawthorne effect

Being at the same time a very popular concept in social psychology and a highly disputed theory that some consider refuted, the so-called “Hawthorne effect” can at least serve as an example of a certain kind of falsification that can occur in an experimental situation. The theory states that the performance of an individual may be higher when she knows that she is being observed (Mayo 1933, ch.3).

In the case of prototype testing, I consider this effect relevant because the test situation is an artificial setup: the test person may show a higher motivation to solve a problem or, conversely, a lower motivation since it is “just an experiment” and not related to a personal aim.

2.2 Common characteristics

To conclude the list of effects above, I want to stress a few aspects of the experimental situation that affect its outcome:

- An experiment for usability testing is not a hidden observation. The tester is aware of being watched and knows at least roughly what the purpose of the experiment is (unlike in psychology, where participants may be tested for something other than what they assume).

- The experiment is standardized to a large extent (the test tools, the tasks, the setup), but since it is neither a computer-based testing method nor as strictly standardized as a questionnaire study, both the test manager and the user may show situational behavior.

- Since a representative selection of test participants is not possible, individual variables affect the outcome of the test.

3. Increasing the reliability of testing results

Considering the testing of a mental model to be the main target of a usability test, as discussed in the first chapter, and taking into account the selected effects described in chapter two, I see a set of rules that can be applied to minimize negative effects of the testing procedure.

Image caption: Complex paper prototype materials for an Ajax web application

3.1 A strictly defined testing tool

The low-fi prototype, as the main instrument of the experiment, should be organized as strictly as possible. Setting up a reliable flow chart and thoroughly preparing the parts needed for the testing does not negate the “fast and cheap” nature of paper prototyping as such. While a “quick’n’dirty” draft on some scratch paper may be sufficient for testing a single concept or interface detail, I strongly believe that every minute invested in the design of a clear and slightly more advanced testing tool pays off in more valuable results.

But when talking about an “advanced tool” in this context, I do not mean an increase in functionality. As the idea is mainly to test the mental model of a certain interface concept, the aim must not be to create a fancy vertical model overloaded with features and effects, but to build a horizontal prototype that is reduced to the crucial elements of the mental model in question. This reduces the workload and at the same time ensures that the findings from the test are focused on the main elements of the designer’s concept, not on a huge load of details.

Being prepared for even unlikely paths the user might take reduces the risk of being forced to make fast, on-the-fly adaptations to the logic of the system; advance test runs with fellow team members can help to find major flaws in the logic before the testing begins.
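The flow chart behind a paper prototype can also be checked mechanically before the first tester arrives. The sketch below is a hedged illustration, not the procedure we used in the workshop: the screen names, the `FLOW` table and the helper functions are all hypothetical. It treats the prototype as a table of screens and user actions and reports screens that trap the user, screens that were drawn but can never be reached, and actions that point to screens nobody has prepared.

```python
from collections import deque

# Hypothetical flow chart: screen -> {user action: next screen}.
FLOW = {
    "home": {"tap search": "search", "tap menu": "menu"},
    "search": {"submit query": "results", "tap back": "home"},
    "results": {"tap item": "detail", "tap back": "search"},
    "menu": {"tap settings": "settings", "tap back": "home"},
    "detail": {},                   # intended end of the task
    "settings": {},                 # no way out: an unplanned dead end
    "help": {"tap back": "home"},   # drawn, but no action leads here
}


def reachable(flow, start):
    """Return all screens a user can reach from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for nxt in flow.get(screen, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_flow(flow, start, end_screens):
    """Report dead ends, unreachable screens and missing screens."""
    reached = reachable(flow, start)
    targets = {nxt for actions in flow.values() for nxt in actions.values()}
    return {
        # reachable screens without any action that are not planned endings
        "dead_ends": sorted(
            s for s in reached
            if s in flow and not flow[s] and s not in end_screens
        ),
        # screens that exist on paper but cannot be reached from the start
        "unreachable": sorted(set(flow) - reached),
        # screens that an action points to but that were never prepared
        "missing": sorted(targets - set(flow)),
    }


report = check_flow(FLOW, start="home", end_screens={"detail"})
print(report)
```

Such a check only covers the paths written into the table. It complements the advance test runs with team members, since a real person may still attempt an action that nobody put into the flow chart.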

Image caption: A tester browses the web site prototype

3.2 Strict rules for the test situation

In addition to the prototype itself, the second important tool for the experiment is a set of clear, non-suggestive tasks. Having predefined, well-designed tasks and establishing a neutral atmosphere with strict rules on when and how the participant will get additional help minimizes the risk that the test leader biases the participant by giving hints through intonation, gestures, posture or even by answering questions at length.

Of course the test situation does not have to be a clinically empty and silent laboratory atmosphere. Quite the contrary: making the user feel comfortable, giving him the feeling of being in a safe environment and letting him talk to at least one of the persons present is an important key to succeeding with the experiment. But if we aim at maximum reliability, it can be crucial to have a well-prepared test manager who knows how to react to questions (she does not necessarily have to give an answer, but some kind of reaction will be expected) without spoiling the test setup.

4. Conclusion

In this essay, I have tried to analyze factors that can distort the validity of a usability test carried out with a paper prototype.

In chapter one, I pointed out that testing a prototype means above all testing a mental model. This affects the design of the testing system and the procedure as such. In chapter two, I presented selected effects that can occur in qualitative studies in the context of prototype testing. The third chapter focused on two ways of reducing the likelihood of the previously discussed effects: optimizing the prototype and the test procedure.

Neither the list of the four effects in the second chapter nor the two categories in chapter three are exhaustive. The most important rule to be applied to the testing process is based on both the awareness of methodological falsification and the knowledge about the connection between method and result: the data gathered may only be analyzed and utilized by taking into account how it was gathered.

I am well aware of the fact that not every designer can be an expert in research methods. Also, I do not want to deny that valuable results can be gained even with very elementary methodological knowledge or in its complete absence. My hypothesis, however, is that a more scientific approach to the method can enhance the value of testing results and, most importantly, that the bare awareness of falsifications resulting from the experiment method can help to evaluate the findings of a prototype test without over- or underrating their relevance.

Literature
