
April 14th, 2020

Usability Testing for AI Assistants

Karen White

Building AI assistants that really help users requires more than just great underlying technology. Great AI assistants are built by product teams that bring end users into the design process to make sure the software solves the right problem in the right way.

UX testing and techniques like human-centered design have been widely used for many years by product teams developing web, mobile, and desktop applications. The same principles apply to machine learning-based applications like AI assistants, perhaps even more so, because AI assistants interact with users via spoken or written conversation. This distinctly human way of communicating creates a dynamic between user and application that's quite different from interfaces based on buttons and menus.

Anders Krogsager is an IT consultant working with the Danish Digital Library to build a chatbot for library patrons in the city of Aarhus. As a developer with a background in usability testing, Anders takes a user-centered approach to building software. We recently discussed how Anders is using Rasa and how he brought users into the process of building the Library Chatbot. From recruiting co-workers to help with conversation design, to testing prototypes with real library patrons, Anders brought the user's perspective into the development process at every stage.

In the process, Anders worked with the Rasa product team to provide feedback on Rasa X. His input led to a recent enhancement released with Rasa X 0.27.0: making the Share your bot screen mobile-responsive.

In this post, we'll discuss why user testing is critical to building effective AI assistants and how Anders puts user-centered design principles into practice. We'll also share tips for running successful user studies.

Why User-centered Design Matters

One of the first things Anders emphasized was the need to build to users' real needs instead of building to a spec. "A lot of my software projects start as a meeting with a manager and a list of requirements, and no end users," Anders says. This is a common starting point, but it leads to a very real pitfall: what you build might not be what your users want.

"It is easy and comfortable to accept the list and turn it into code," Anders notes, "and it pretty much guarantees that the program needs to change once the users get their hands on it." Before writing a single line of code, it's important to understand how the AI assistant will help the end user solve their problem. "I want to see user stories and mock-ups before I start developing," Anders says.

The next step should be to build a lightweight prototype, based on user input. It's okay if the prototype doesn't include every feature or lacks polish. The goal is to assemble the minimum feature set required to solve the user's problem: no more, no less. From there, you can determine what's working and what isn't by observing users and collecting their feedback. "Build prototypes with the users, because without the user there is no product," Anders says. "It is too easy to assume that we know what the users need, so having them physically present and involved in the process of building something is well worth the time and effort."

Designing Usability Tests

First, consider what you hope to learn at each stage of your assistant's development. If you're in the very early planning stages, your goal should be to conduct user interviews that help you identify the pain points, goals, and frustrations users have with existing solutions. From there, you can narrow down which problems your assistant can effectively solve and begin planning which features you'll implement. At this stage, a framework like Jobs to Be Done can help you identify the right questions and go beyond surface-level answers to understand what users are really asking for. Surveys, calls with existing customers, or in-person interviews are all valuable sources of information at this stage.

Once you have something you can actually put into the hands of users, it's time to observe how well the reality of your design matches up with users' behaviors and expectations. Tests can take place in a lab or office setting, remotely, or out in the field, but to get the most accurate results, the test conditions should reflect both the type of user and the environment that your AI assistant will encounter. If you've identified user personas, it's important to make sure your testers match the profile of your target user. When designing and testing the Library Chatbot, Anders leveraged access to library patrons, who were willing volunteers for testing the AI assistant. "We are lucky because our end-users are in the same building as us," Anders says, "and they are easy to recruit."

During the study, ask the user to perform a set of tasks, and then let the user do most of the talking. "Listen to the users and allow them to explain themselves," Anders says. Encouraging users to open up and be candid ensures you're getting a true read on their feedback. Record the session if possible, or designate a note taker on your team. Designating a note taker has the added benefit of getting more team members involved in observing the study.

After the session, make the most of what you've learned by summarizing the results and making them accessible within your organization. If your AI assistant is meant to replace an existing solution, like the navigation on a website, benchmark the assistant's performance against the old solution.

Even after you've analyzed and categorized feedback and incorporated new insights into your design, learning from users shouldn't stop. Once you launch your assistant, review user conversations regularly to understand how it performs with real users in production.
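One lightweight way to do this is to scan exported conversation logs for turns where the assistant fell back to its "sorry, I didn't understand" response, then look at the user messages that triggered those fallbacks. The sketch below is a minimal illustration, not part of Anders' project: it assumes conversations have been exported to a conversations.json file containing a list of trackers with Rasa-style event lists, and that the fallback action name (action_default_fallback here) matches your configuration.

```python
import json
from collections import Counter

# Minimal sketch: scan exported conversation trackers for fallback turns so you
# can see which user messages the assistant failed to handle in production.
# Assumes "conversations.json" holds a list of trackers, each with an "events"
# list in Rasa's event format; adjust file and field names to match your export.
FALLBACK_ACTIONS = {"action_default_fallback"}  # placeholder action name

with open("conversations.json") as f:
    trackers = json.load(f)

unhandled = Counter()
for tracker in trackers:
    events = tracker.get("events", [])
    for i, event in enumerate(events):
        if event.get("event") == "action" and event.get("name") in FALLBACK_ACTIONS:
            # Walk backwards to find the user message that preceded the fallback.
            for prev in reversed(events[:i]):
                if prev.get("event") == "user":
                    unhandled[prev.get("text", "")] += 1
                    break

print("Most common messages the assistant couldn't handle:")
for text, count in unhandled.most_common(10):
    print(f"{count:3d}x  {text}")
```

A summary like this is easy to share with the rest of the team and makes it clear which new intents or responses are worth adding next.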

Tips for Successful User Testing

While many teams agree that involving users early and often leads to better technical products, it's easier said than done. Constraints on time and resources present obstacles to getting out of the office and working closely with users.

However, you don't need a lot of resources to conduct user studies that add value to your development process. Next, we'll discuss a few ways you can make the most of your users' time and your team's efforts.

Involve testers early in the process

If you wait until you've already developed a working prototype before testing with users, you've already missed the opportunity to gather important data. In fact, you can start to collaborate with users before you've built an assistant at all.

Anders uses a technique called the Wizard of Oz method to learn how users interact with the assistant while it's still in the early design phase. In the Wizard of Oz method, two people simulate a human-bot conversation by chatting through a text interface (it can be as low-fi as a Google Doc), with one playing the part of the user and the other pretending to be the bot. "What our coworkers did not know is that there was no chatbot, and they were actually chatting with me the whole time," Anders says. "That way we could change our design without wasting a single minute on coding."

Listen and learn

User research often uncovers unexpected insights, but that doesn't mean feedback is always easy to hear. When you've put weeks of work into a prototype, it can be difficult to hear that users don't have a great experience with your AI assistant.

"When a test goes according to plan-good for you," Anders says. "When a test participant points out flaws or is struggling with your design, it is convenient to dismiss a broken user, rather than a broken design."

Go into your first user tests with the expectation that your design will need to change. The goal is not to hit a home run on the first try, but to learn as much as possible while the assistant's design is still malleable. Anders advises product teams to be ready for anything when conducting user tests and, above all, to embrace the negative feedback along with the positive. "Have the courage to accept that your design is flawed and needs to be changed," Anders says.

Pick the right sample size

Work with too many users, and the time and effort required to conduct the study creates a drain on your team's resources. But work with too few, and your sample size isn't representative. Anders notes that people of different ages and backgrounds talk differently, so it's important to test your assistant with a diverse set of users.

"I use 5 different test participants as a rule of thumb," Anders says. It strikes a balance between the effort of conducting the test and the number of new discoveries they produce."

Be careful not to influence the user

When conducting user tests, it's important not to let your own bias influence your users' behavior. After all, as the assistant's creator, you know which intents the assistant can handle and the happy path you want the user to follow. As you guide the user through the set of tasks you want them to perform, be careful not to drop hints, in your questions or your mannerisms, that signal how the user should complete a task.

"We were careful not to tell people how they should formulate themselves when chatting with the bot," Anders says. "The tricky thing is to get people to ask the right questions without telling them to."

Be sure to remind your testers that you're assessing how the application performs, not how they do. Putting your testers at ease encourages them to behave more naturally, giving you a more accurate result.

Make the experience feel real

Even if you haven't completed development, the look and feel of the chat experience matters. Sure, you could have users chat with the assistant on the command line, but that doesn't reflect the conditions users will actually experience when using your AI assistant. "Seeing the chat window puts you in the mindset of chatting with someone, rather than doing some logical testing in a computer terminal," Anders says.

Devices play a role too. "A lot of our users won't be sitting behind a laptop or desktop PC when a question pops into their head," Anders says. Increasingly, users perform tasks "anytime, anywhere, and most of it can be done on a mobile device."

To replicate the chat experience while still in the prototype phase, Anders used the Share your bot feature in Rasa X. "We could have built and connected our chat client, but it was so easy and convenient to use the Share Your Bot feature," Anders says. With Share your bot, you can generate a link that allows users to chat with a bot that's still in development. Users see a simplified chat interface, where they can type messages and see the bot's responses.

While conducting field tests on mobile devices, Anders made an important observation about the Share your bot feature: it wasn't mobile-optimized. If users were distracted by a hard-to-use keyboard in the chat interface, they weren't focused on assessing the features and functionality of the AI assistant. Anders connected with the Rasa Product team, and as a result, Rasa X version 0.27.0 included updates to make the Share your bot feature mobile-responsive.

Conclusion

As a user of Rasa X, Anders helped us make the Share your bot tool better. Similarly, users of the Library Chatbot helped Anders and his team improve their design.

"We discovered that some users had very high expectations for our chatbot. They wanted a completely new form of service that was very ambitious, and brilliant at that, but our bot had no chance of servicing them. They were of course disappointed and frustrated with the default "sorry-I-did-not-understand" reply. As a result, we had to write new intents and responses that handle those requests."

Every product owner, from companies like Rasa to developers like Anders, benefits from working closely with users. The Rasa team's conversations with Anders revealed important insights about how his team works and their development workflow, and sparked ideas for future enhancements. The key to building the best framework for AI assistant developers, or the best AI assistants, lies in working closely with users to understand their needs.

Do you incorporate user testing into your process for developing AI assistants? Or do you have feedback as a Rasa user? Continue the conversation in the forum, or by tweeting us @Rasa_HQ.