Final Project: User Testing
Purpose. This activity will help you gain experience with planning and conducting user tests, and then reflecting on and synthesizing a set of observations to guide future improvements. You will apply this approach to the near-complete version of your team project app, and will then have the opportunity to apply the insights you gleaned when making final changes to your project.
Tasks
In this activity, you will conduct two user tests, each with a different participant. Each test should last approximately one hour, comprising a period where your participant works through a series of tasks you have set them, followed by a debrief about their experience.
- Select and schedule your participants. Ideally, your participants span a diverse set of your app’s intended audience, but they may be anyone of your choosing; the only limitation is that they cannot be currently taking 6.1040.
- Prepopulate realistic data. Populate your app with a diverse range of data to give users a realistic impression of what it would be like to use your app. (A minimal seeding sketch appears after this task list.)
- Formulate a task list. To make sure your user tests yield informative results, it is often better to set your users specific tasks rather than let them explore your app in an open-ended way.
Create a list of tasks that cover the key concepts of your app, focusing on the concepts that are particularly unique or important to your design. Each task should typically involve executing a sequence of user interactions. You should include at least five tasks, ranging from simple one-action tasks to more complex multi-action tasks, that together test how easily the user can cross the gulfs of execution and evaluation discussed in lecture. Plan for these tasks to take roughly 40 minutes of your hour session.
Format your task list as a table, where each row corresponds to a different task, with columns for: (1) a short title for the task; (2) a succinct instruction, in words that could be given to the user; and (3) a brief rationale for including the task, explaining why the task is worth testing and what you hope to learn or uncover about your design by testing it with a user rather than executing it yourself as part of a cognitive walkthrough. Order the rows so that any application state required by later tasks has been correctly set up by earlier tasks. (An example table also appears after this task list.)
- Conduct your studies. Start by obtaining your participant’s consent and briefing them on their role and what to expect. Then ask each participant to perform the tasks in your task list. Have them perform each task in the order defined in your table, and encourage them to think out loud. If they fall silent, prompt them to keep thinking out loud. If they get stuck, give them a chance to get unstuck first; only intervene if they are really unable to make progress. Try to say as little as possible, and avoid explaining the user interface to them.
Throughout the study, watch what your participant is doing, saying, and even feeling (e.g., watch for facial expressions, sighing, etc., which often signal frustration or other emotions). Take careful notes throughout. You might also consider capturing a screen recording, as well as audio/video of your participant (all with their consent, of course), for further analysis after the session is complete.
- Debrief. In the final 20 minutes of your hour session, debrief your participant to get their overall thoughts and impressions of your application. What did they think worked well, versus what could be improved? Dig into moments you noticed them hesitate, get confused, or get stuck: what did they find confusing, what were they hoping to do, and how did they figure things out?
- Summarize lessons. For each test, write a brief report (100–400 words) that summarizes and analyzes key moments of participant behavior: what interesting or unexpected things did you notice, and why do you think they occurred? For instance, you might observe a participant struggle with a particular interface element or interaction flow; your analysis might then draw on your debriefing to describe what they were expecting to do, what the interface did instead, across which gulf the flow broke down, and so on. Aim for your report to balance positive and negative results.
Follow these summaries with a list of 3–5 flaws or opportunities for improvement. Each should describe what the flaw or opportunity is, explain why it is currently occurring, and suggest one or more ways in which it might be addressed.
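To make the prepopulation step concrete, here is a minimal seeding sketch in TypeScript. The `createUser` and `createPost` helpers (and the shapes of the records they take) are hypothetical stand-ins for whatever your own backend exposes; the @faker-js/faker library is used only to generate varied, realistic-looking values.

```typescript
// seed.ts: a minimal sketch for prepopulating an app with realistic data.
// `createUser` and `createPost` are hypothetical helpers standing in for
// your own backend's API; adapt the record shapes to your app's concepts.
import { faker } from "@faker-js/faker";
import { createUser, createPost } from "./api"; // hypothetical helpers

async function seed() {
  // A varied set of users makes lists, profiles, and feeds look lived-in.
  const users = await Promise.all(
    Array.from({ length: 10 }, () =>
      createUser({
        name: faker.person.fullName(),
        email: faker.internet.email(),
      })
    )
  );

  // Give each user a handful of posts of varying length and recency,
  // so no single screen looks uniform or artificial.
  for (const user of users) {
    const postCount = faker.number.int({ min: 2, max: 6 });
    for (let i = 0; i < postCount; i++) {
      await createPost({
        author: user.id,
        title: faker.lorem.sentence(),
        body: faker.lorem.paragraphs(faker.number.int({ min: 1, max: 3 })),
        createdAt: faker.date.recent({ days: 30 }),
      });
    }
  }
}

seed().catch(console.error);
```

A dedicated script like this is easy to rerun before each session, so every participant starts from the same rich application state.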
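And to make the expected task-list format concrete, here is a short hypothetical excerpt for an imaginary recipe-sharing app. Your own task list should reflect your own design and include at least five tasks.

| Task | Instruction | Rationale |
|---|---|---|
| Sign up | “Create a new account for yourself.” | A simple one-action warm-up that reveals whether onboarding is self-explanatory, and sets up the account state that later tasks depend on. |
| Post a recipe | “Share your favorite recipe with the community.” | Exercises the app’s core concept end to end; users may interpret the posting form differently than the designer intended. |
| Find and save a recipe | “Find a vegetarian dinner recipe and save it for later.” | A multi-action task combining search, filtering, and saving; tests whether users can discover and chain these features without guidance. |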
Rubric
| Component | Description (what good work looks like) | Common Failures (examples of falling short) |
|---|---|---|
| Prepopulated Data | The deployed app is richly populated with realistic data that gives a vivid impression of real-world usage. | Deployed app contains dummy data or placeholder text that makes it look like a prototype or toy. |
| Task List | Task list covers the key aspects of the application. Rationales are well justified, describing what insights might be obtained. | Task list misses important functionality, or spans only a limited range of complexity. Rationales poorly justify why these tasks are worth testing with a user. |
| Study Reports | Reports go beyond just reporting observations to analyze what caused participant behavior, grounding claims in evidence (e.g., quotes from, or observations of, participants). | Reports are unbalanced, overly focusing on either positive or negative results, and/or miss several opportunities to analyze results. Little to no evidence is provided to concretely ground inferences. |
| Design Flaws/Opportunities | Flaws/opportunities have crisp, descriptive definitions with rich explanations of how the flaw manifested and how it might be addressed. All are grounded in evidence from the study results. | Flaws/opportunities are superficial or did not need a user study to identify. Definitions are vague, and explanations are shallow. |
Advice
- Build rapport. We recommend building rapport with your participant so that they feel comfortable thinking out loud and making mistakes in front of you. Emphasize that you are testing your application, not the participant; their performance (including when they get stuck, confused, etc.) does not reflect poorly on them.
- Prompt thinking aloud. Thinking out loud will feel very strange to most participants, and they will be prone to falling silent. When they do, prompt them to keep talking. Remind them to tell you what they are thinking, what they are trying to do, and what questions come up as they attempt a task, and even to read out the things they see on the screen (which can be an important signal of the order in which participants read things on screen, including whether they miss certain things).
- Pre-decide where to help. You are going to be very tempted to jump in to help the participant every time they get stuck; resist that temptation. Instead, come up with a predetermined set of criteria for when you will intervene, and try to stick to it. Of course, if participants are completely unable to make progress, that is a good time to intervene.