A/B testing is something we come across more and more, typically in organisations with greater digital maturity. While I love A/B testing, too often I see firms taking a shotgun approach: without genuine insight to form meaningful hypotheses, they fire off tests and hope to hit a random target. While this sounds like a rant, I think it is important to consider our ways of working in the digital space and ensure that we gain maximum value from our efforts.
So I was interested to come across this article on the Marvel App blog with the wonderfully clickbaity title ‘A/B Testing – You’re Doing It Wrong’.
The article includes some interesting findings:
-Only 10% of experiments resulted in actionable change — formally releasing a new version of a page or feature
-50% of teams could not make decisions from A/B testing experiments due to inconclusive or poorly measured data
This sums it up nicely: “Companies may be running A/B tests too frequently for too little time, contributing to a high failure rate that makes A/B test results less valuable and meaningful.”
Note that the above data is from a small and potentially non-representative sample. As the author states: “This next dataset is from a qualitative and quantitative A/B testing survey of 26 A/B testing practitioners conducted from May 1 to May 30 in 2016 (Northwestern, IDS — Justin Baker, 2016). While this is not the end-all be-all of surveys, it could still give us some meaningful insights”. While the sample may not be perfect (when are they?), the findings reflect my personal experience.
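To make the point about tests running "for too little time" concrete, here is a rough sketch of how many visitors a test might need before its result means much. It is not from the article: it assumes a standard two-sided two-proportion z-test, and the function name `visitors_per_variant`, the 5% significance level, the 80% power and the example conversion rates are all illustrative choices of mine.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant (A and B).

    Assumes a two-sided two-proportion z-test. The rates are conversion
    rates between 0 and 1; alpha and power are conventional defaults,
    not values taken from the article.
    """
    if baseline_rate == expected_rate:
        raise ValueError("The two rates must differ to detect a change.")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired statistical power

    pooled = (baseline_rate + expected_rate) / 2
    spread_null = math.sqrt(2 * pooled * (1 - pooled))
    spread_alt = math.sqrt(
        baseline_rate * (1 - baseline_rate)
        + expected_rate * (1 - expected_rate)
    )

    numerator = (z_alpha * spread_null + z_power * spread_alt) ** 2
    return math.ceil(numerator / (expected_rate - baseline_rate) ** 2)


# Hypothetical example: a page converting at 5% that we hope to lift to 6%.
needed = visitors_per_variant(0.05, 0.06)
print(f"Visitors needed per variant: {needed}")
```

Under these assumptions, detecting a lift from 5% to 6% takes on the order of eight thousand visitors per variant, and smaller expected lifts need far more. If a page only sees a few hundred visitors a week, a test stopped after a few days is very likely to be inconclusive.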
I believe that at the heart of poor A/B testing is a lack of data triangulation: bringing several sources of data together to understand the user experience, and using that understanding to create hypotheses that can be tested. Some of the most meaningful A/B tests that I have seen were those where qualitative data was used to understand a problem, and a solution was then identified and tested.
We find that combining A/B testing with some form of qualitative research (such as journey mapping research) is ideal for understanding the ‘why’ of an issue, and also for informing test hypotheses.
What are your thoughts?