A/B testing might be the single most effective way to turn a good app into an amazing app. However, it’s also a subtle way to lead yourself into THINKING you’re improving your app when in fact your test results are full of false positives or you’re spending precious time testing when you could be doing something else.
Don’t get me wrong. A/B testing can be outstandingly effective at increasing user conversions and your bottom line (which is why all the big guys such as Facebook, LinkedIn, and Etsy are A/B testing constantly), but there’s a time to test and there’s a time to just implement changes. You should skip A/B testing…
1. When being first is more important than being optimized
You don’t have to be a genius data scientist to A/B test well, but it’s not a trivial task either. First of all, A/B testing can take time. You need to plan the test, program the different variants, push out a new version of your app through the App Store or Google Play, and wait while users engage with your app long enough to give you clear results.
A/B testing platforms with a visual editor can help reduce the time needed for programming and can eliminate app approval red tape, but to some extent planning and waiting are unavoidable. Of course, if you already have a stable user base and there’s no immediate urgency to make certain changes, the time it takes to A/B test is completely worth it.
Nevertheless, there are situations when time is your most important resource. For instance, a common scenario is when getting to the market first gives you a significant competitive advantage. This could be the launch of a new feature or large design changes.
KAYAK ran into this exact situation when Apple announced iOS 7. Apple released all of its new developer resources on one day, but it was up to developers when their apps would adopt the new design.
KAYAK is a company that tests A LOT. A data-driven, experimental mentality is core to its corporate culture, but this was a time when it chose to just implement a large suite of design changes rather than test each detail. And it paid off.
According to their Director of Engineering for Mobile at the time, Vinayak Ranade, “If we had spent time incrementally testing every single change we’d made and redesigned, we would never have made it. And a lot of companies did do that and they were three months late to the game.”
2. When you are fairly certain the hypothesis is wrong or you have no hypothesis at all
It’s easy to focus so hard on developing an experimental culture that you start testing everything. Literally everything. Even when you have no clue what exactly you’re testing and even when you already know the change probably would not be helpful.
While it’s great to test anything that could have a positive impact on your bottom line, it’s important to remember that the more you test, the more false positive results you will get. The typical threshold for statistical significance is a 95% confidence level. Roughly, that means that if the change you’re testing actually had no effect, there would be only a 5% chance of seeing a difference as large as the one you measured.
A 95% confidence level is the scientific norm. This is the same rigorous standard that’s applied to FDA clinical trials. And yet, it still means that if you run 100 A/B tests of changes that have no real effect on your app, you should expect about 5 of them to show a statistically significant “improvement” purely by chance.
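To make the 95% threshold concrete, here is a minimal sketch of a two-sided, pooled two-proportion z-test, one common way to check whether variant B’s conversion rate differs from variant A’s. The function name and the conversion counts are hypothetical, and A/B testing platforms may use other statistical methods.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates (pooled variance)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # conversion rate if A and B were identical
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # chance of a gap this large with no real effect
    return z, p_value

# Hypothetical results: 5,000 users per variant, 5.0% vs 6.0% conversion.
z, p = two_proportion_z_test(conv_a=250, n_a=5000, conv_b=300, n_b=5000)
significant = p < 0.05  # p below 0.05 corresponds to the 95% confidence level
print(f"z = {z:.2f}, p = {p:.3f}, significant: {significant}")
```

With these made-up counts, z is about 2.19 and p is about 0.028, so the difference clears the 95% bar, but there is still a 5% chance of clearing it by luck alone when the variants are truly identical.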
There is no way to avoid this completely, but there are many ways to reduce the number of false positives you get. The best thing to do is to test with care. Make sure you know what you’re testing and have a solid hypothesis as to how your test can improve the bottom line.
If you’re testing a button color, why do you think green will be better than blue? Are you randomly testing colors or do you think a certain contrast between the button and background colors will make the button more noticeable to customers?
Creating a good hypothesis and planning the test(s) to confirm or refute it will give your tests direction and yield actionable insights that are less likely to be due just to chance.
Likewise, A/B testing should be skipped in situations where you know that an idea almost certainly will improve your app and the risks associated with blindly implementing the idea are low.
For example, Robot Invader, the maker of Wind-up Knight and Rise of the Blobs, consistently asks beta users for feedback. After playing the beta version of its newest game, Wind-up Knight 2, several players thought there wasn’t enough congratulatory “glitter” after completing achievements.
The recommendation from users was that more pomp and circumstance be added so that players would feel rewarded after accomplishing certain tasks and be more aware of the new features they just unlocked. The downsides of implementing something like this are close to zero, and the likely impact is positive.
There is no reason to spend time and resources testing something that is probably beneficial and low risk. Jumping straight to implementation is perfectly advisable.
Screenshot of me completing a level PERFECTLY on Wind-up Knight 2. Yeah, I’m bragging.
3. When you don’t have enough users
As with any scientific experiment, you need enough data points to gather statistically significant results. This means you need a minimum number of users participating in each test. Depending on how you structure the test (how many variants) and what results you expect (a small improvement on an already high conversion rate or a large improvement on a low one), you might need thousands of users to get statistically significant results. Since not everyone has Google’s scale, the key is prioritization.
If you don’t have many users, you might want to first focus your time on activities that will bring in more users. This could be marketing or even pivoting your app to build up the features that customers are actually using. Once you have enough users to start optimizing, you might have only enough users to run one test at a time.
In this case, it’s really important to first test the ideas that are most likely to have a big impact but are too risky to ship without testing. Examples of risky yet likely impactful ideas are changes to in-app purchases, login screens, page flow, and algorithms related to app logic (e.g., how recommendations are surfaced, how search queries are answered, etc.).
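One way to act on this advice is sketched below with entirely hypothetical ideas and 1 to 5 scores: keep only the ideas risky enough that shipping them blindly is unwise, then test the highest-impact ones first. The `Idea` class, the scores, and the risk threshold are illustrative assumptions, not a standard method.

```python
from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    expected_impact: int  # hypothetical 1-5 score: how much the idea could move conversions
    risk: int             # hypothetical 1-5 score: how much damage shipping it blindly could do

def next_tests(ideas, risk_threshold=3):
    """Keep ideas too risky to ship untested, ranked from highest to lowest expected impact."""
    risky = [idea for idea in ideas if idea.risk >= risk_threshold]
    return sorted(risky, key=lambda idea: idea.expected_impact, reverse=True)

backlog = [
    Idea("new in-app purchase flow", expected_impact=5, risk=5),
    Idea("more celebration after achievements", expected_impact=3, risk=1),
    Idea("redesigned login screen", expected_impact=4, risk=4),
    Idea("search ranking tweak", expected_impact=3, risk=3),
]

for idea in next_tests(backlog):
    print(idea.name)
```

The low-risk celebration idea drops out of the test queue, in the spirit of the Wind-up Knight 2 example: it is a candidate to implement directly rather than spend scarce users on.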
Here’s a simple chart to estimate how many users you need to get statistically significant (95%) results when doing an A/B test with two different variants (an A and a B). The number of users you need depends on your conversion rate (existing conversion rate of variant A) and how much better you expect the new variant to be (predicted increase in conversion rate of your new variant B).
Example: If your current conversion rate is 5% and you predict that your new variant will raise it by 15% relative to that baseline, you’re expecting your new conversion rate to be 5.75%. For this test, you’ll probably need around 9,200 users to get statistically significant results. That is, 4,600 users for variant A (your current version) and 4,600 users for variant B (your new version).
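Charts like this are typically built from a standard power calculation. Below is a minimal sketch of the common normal-approximation formula for comparing two proportions with a two-sided test; the function name is hypothetical, and the 80% power default is an assumption, since the text above doesn’t state which power the chart uses.

```python
from math import ceil, sqrt
from statistics import NormalDist

def users_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline                          # current conversion rate (variant A)
    p2 = baseline * (1 + relative_lift)    # predicted conversion rate (variant B)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% confidence level
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = users_per_variant(baseline=0.05, relative_lift=0.15)
print(n, "per variant,", 2 * n, "total")
```

Under these assumptions the 5% to 5.75% example needs roughly 14,200 users per variant, noticeably more than the 4,600 quoted above. The answer is very sensitive to the power and test type you choose, so it is worth knowing which assumptions your calculator or platform uses.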
When you’re low on users, you also must watch the funnel: the higher up you test, the faster you’ll get results. If 1,000 daily users land on your app’s login screen but only 100 make it to checkout, with all else being equal, a test of the login screen can produce results up to 10 times faster than a test of the checkout screen, simply due to the volume of users.
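The funnel effect is simple division. This sketch reuses the 9,200-user total from the 5% to 5.75% example and assumes steady daily traffic with every user who reaches the screen entering the test; the function name is illustrative.

```python
from math import ceil

def days_to_finish(users_needed_total, daily_users_at_step):
    """Days until a test on one funnel step has seen enough users, assuming steady traffic."""
    return ceil(users_needed_total / daily_users_at_step)

needed = 9_200  # total users (both variants) from the example above
login_days = days_to_finish(needed, daily_users_at_step=1_000)  # 10 days
checkout_days = days_to_finish(needed, daily_users_at_step=100)  # 92 days
print(login_days, checkout_days)
```

The login screen test finishes in about a week and a half, while the same test at checkout would take about three months.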
4. When what you’re testing contradicts your brand or design principles
While we want to test as much as possible, there are some things that are hard or unwise to test. A/B testing a new logo after your company has been established for years can cause brand confusion with your customers. You might get more conversions in the short term as the change catches people’s eyes, but testing radical changes to your brand could be damaging in the long term.
This especially applies to design elements. An unusually large or off-color button might get more clicks because it stands out so much, but it could be affecting how your users see your brand. Making an otherwise elegant app less elegant might not noticeably hurt user engagement at first, but you could lose customers over time.
Similarly, some changes to price are really difficult to test (not to mention frowned upon by Apple). We test with the assumption that test results are reproducible and externally valid. In other words, testing one random group of people will produce the same results as testing another random group of people.
Logos and sometimes prices are not like that because customers talk to each other. If it’s highly publicized in the media that it costs $9.99 to unlock the full features of your app, it’s probably not a good idea to show a different price to some users. They might have read the article that promised $9.99 and be much more likely to upgrade if their version is cheaper or much less likely to upgrade if they see a higher price.
Either way, your results could be entirely biased and inaccurate, not to mention the huge PR mess you’ve just gotten yourself into: once pricing goes public, all tests are off.
All in all, A/B testing your native mobile app is challenging but well worth the effort because it will help your good app become great. However, experienced testers test with caution:
- Don’t sacrifice time for optimization when time is more important.
- Test frequently and continuously but avoid over-testing and aimless testing. Have concrete hypotheses in mind and plan your tests to prove or disprove them.
- Make sure you have a sufficient number of users to reach statistical significance on each test. If you don’t have many users, prioritize tests so that you don’t spread your users too thinly across them.
- Do not pit intelligent design against evolution through testing. New ideas being tested should mesh with your overall brand, look, and feel.
About the author: Lynn Wang is the head of marketing at Apptimize, an A/B testing platform for iOS and Android apps designed for mobile product managers and developers alike. Apptimize features a visual editor that enables real-time A/B testing without needing app store approvals. It has a programmatic interface that allows developers to test anything they can code.