Prioritize A/B Test Ideas With ICE

Published by Nate Selvidge on Apr 25, 2017

Here at Trello we like to test things. We’re constantly looking for new ways to improve our user experience, and when we do make a change, we want to make sure it’s having the impact we expect. Our go-to tool for this is the A/B test. On the growth team at Trello we spend a lot of time talking to users, looking at our event data, and running user studies to come up with great ideas for tests. Along the way we end up with a pretty large backlog of tests to run. Figuring out what to run and when to run it can be a really daunting task.

Introducing the ICE Score

To handle this backlog we score all of our test ideas using a simple framework called ICE. ICE stands for “Impact”, “Confidence”, and “Ease”, and it works by giving each criterion a score from 1-10, with 1 being the worst and 10 being the best. To get the final ICE score, you add the three scores together, giving a number between 3 and 30 that you can use to prioritize your backlog. Higher scores are better and probably mean the idea should be tested sooner, while lower scores mean the idea should either be tested later or, if the score is low enough, not tested at all.
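To make the arithmetic concrete, here’s a minimal sketch in TypeScript of scoring and sorting a backlog. The `TestIdea` shape, the helper names, and the sample ideas are all hypothetical illustrations, not Trello’s actual tooling:

```typescript
// Hypothetical shape for a test idea; each criterion is scored 1-10.
interface TestIdea {
  name: string;
  impact: number;     // how big an improvement a win would be
  confidence: number; // how sure we are the test will win
  ease: number;       // how cheap the test is to build and ship
}

// The ICE score is simply the sum of the three criteria: 3-30.
const iceScore = (idea: TestIdea): number =>
  idea.impact + idea.confidence + idea.ease;

// Sort a backlog so the highest-scoring ideas come first.
const prioritize = (backlog: TestIdea[]): TestIdea[] =>
  [...backlog].sort((a, b) => iceScore(b) - iceScore(a));

// Example: a small backlog of sign-up experiments (made-up scores).
const backlog: TestIdea[] = [
  { name: "Reword submit button", impact: 3, confidence: 6, ease: 9 },
  { name: "Add Google sign-in", impact: 8, confidence: 7, ease: 4 },
];

for (const idea of prioritize(backlog)) {
  console.log(`${iceScore(idea)} - ${idea.name}`);
}
// 19 - Add Google sign-in
// 18 - Reword submit button
```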

How To Come Up With Your ICE Score

Each score should be relative to the scores of your other ideas, so a “7” shouldn’t mean anything other than “less than an 8 but more than a 6.” The idea is to get a rough estimate, not an exact measurement. The scores break down like this:

Impact

To come up with the impact score you should ask yourself, “If this test were to lead to an improvement, how big of an improvement would it be?” Let’s say you’re trying to make it easier for a new member to create their account. Changing the text in your submit button might lead to a slight improvement in sign ups, but doing something like adding a new authentication method could potentially have a much larger impact, so it should get a higher impact score. A bigger impact means a better user experience and less time spent collecting enough data to make a decision on the test.

Confidence

Confidence should represent how sure you are that the test will lead to an improvement. This is where any evidence or research that has been done should be factored in. If a bunch of users have been writing in to your support team saying that they’re having a hard time understanding how much they will be charged when they enter their payment information, then you can be pretty confident that clarifying how much you’re charging will have an impact on how many people complete their purchase. Including confidence in the score prevents you from running a bunch of really ambitious tests that don’t end up having any real positive impact on your users.

Ease

Ease represents how much time and effort will go into getting a test up and running. Ease should account for everything needed to build the test: design, copy, and engineering effort. An easier test scores higher, since getting something out the door and onto the user’s screen is more valuable than having a ticket sitting in your development queue. The faster you can get results and learn about your users, the faster you can iterate and get the next successful test implemented. Evaluating the ease of a test also encourages you to break big tests into smaller ideas, because doing so makes results easier to interpret and gets you information sooner.

Conclusion

The ideal experiment is something that has a big impact on the user experience, is backed up by solid research and evidence, and requires minimal effort to build. ICE is intended to focus our efforts on moving quickly and having the biggest impact on our users. At Trello, using ICE has really helped us to quickly decide what to test and when.

Up Next

Naturally, we keep all of our ideas in Trello. In my next post, I’ll talk about how we built a Trello Power-Up to prioritize our ideas with ICE scores.