Bayesian A/B Testing

Retailers have to stay on their toes, continually bringing new features and programs into their offline and online channels to improve customer experience. Before adding a feature to their ecosystem, retailers run rigorous experiments to decide what is best for the business, and these experiments are often analyzed with A/B testing methodologies. Statistics plays a central role in interpreting any A/B test result, and A/B testing lets organizations base sharper decisions on empirical data wherever possible. There are several ways to reduce an experiment to a single number that helps decide whether to roll out the variant in place of the control. But which methodology is more conclusive, Frequentist or Bayesian A/B testing, is still a source of confusion for many. Businesses traditionally preferred the Frequentist methodology to assess A/B tests, but many have switched to more sophisticated methodologies such as Bayesian to assess results. Let’s begin with the differences between Bayesian and Frequentist A/B testing and then give a brief overview of Bayesian A/B testing.

Bayesian vs Frequentist A/B Testing: Which one is better?

Many businesses choose Bayesian A/B testing over Frequentist for better results. Bayesian inference incorporates relevant prior probabilities and can calculate the probability that a given hypothesis is true. It can compare multiple well-defined hypotheses, whereas Frequentist statistics cannot assign a probability to a hypothesis. Bayesian methods used to be computationally heavy, but with current technological advancements they can be implemented without high-performance computing systems.

The Frequentist method is computationally cheaper and reliably offers mathematical ‘guarantees’ about future performance. The Bayesian methodology, however, can often reach a usable result sooner, because it does not depend on collecting a predetermined sample size. In Bayesian A/B testing, priors are often difficult to justify and can be a major source of inaccuracy, but you do get to choose the strength of the prior to limit any bias. The Frequentist approach does not consider the prior performance of similar tests and relies solely on current data.

Introduction to Bayesian A/B Testing

Bayesian A/B testing takes a backward-looking approach to data analysis: past information from similar experiments is encoded into a statistical device known as a prior. The prior is then combined with the current experiment’s data to draw conclusions about the results at hand.

Because it incorporates relevant prior probabilities, it can calculate the probability that a hypothesis is true. The parameters being evaluated in Bayesian A/B testing are treated as random variables, whereas the Frequentist methodology treats them as fixed. Each random variable is described by a distribution (for example, binomial or Gaussian) and by that distribution’s parameters, such as the mean and variance.

Approach to Bayesian A/B Testing

In Bayesian A/B testing, the prior can be uninformative, or it can be informative, in which case it incorporates a subjective belief about the parameters. Data is then gathered and used to update the prior distribution into a posterior distribution, which represents one’s updated beliefs about the parameters after observing the data. Finally, the posterior distribution is analyzed and summarized.
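
The prior-to-posterior update described above can be sketched for a conversion-style metric. This is a minimal illustration, assuming a Beta prior with binomial data (a conjugate pair, so the posterior is again a Beta distribution). The class name `BetaBelief` and the example counts are hypothetical and not taken from any particular experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about a conversion rate (hypothetical helper)."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: add observed successes and failures to the prior counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


prior = BetaBelief(1.0, 1.0)                         # uninformative (uniform) prior
posterior = prior.update(successes=30, failures=70)  # made-up experiment data
print(posterior, round(posterior.mean, 3))
```

The posterior from one experiment can serve as the prior for the next similar experiment, which is one way past information gets encoded.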

BAYESIAN EQUATIONS FOR BINARY OUTCOME AND COUNT DATA

A major advantage of the Bayesian equations for binary outcome and count data is that one doesn’t have to commit to a predetermined sample size before the results can be interpreted.

For Binary Outcome
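For a binary outcome test, the probability of B beating A in the long run is given by

\[
\Pr(p_B > p_A) = \sum_{i=0}^{\alpha_B - 1} \frac{B(\alpha_A + i,\ \beta_A + \beta_B)}{(\beta_B + i)\, B(1 + i,\ \beta_B)\, B(\alpha_A,\ \beta_A)}
\]

where
αA is 1 + the number of successes for A
βA is 1 + the number of failures for A
αB is 1 + the number of successes for B
βB is 1 + the number of failures for B
B is the beta function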
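
As a rough sketch of how this probability can be computed, the snippet below evaluates a well-known closed form for Pr(p_B > p_A) with independent Beta posteriors and cross-checks it by simulation. It assumes the α and β definitions listed above (uniform priors, so each count gets 1 added) and an integer αB. The function names and the visitor counts are hypothetical.

```python
import math
import random


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_b_beats_a(alpha_a: int, beta_a: int, alpha_b: int, beta_b: int) -> float:
    """Closed-form Pr(p_B > p_A); alpha_b must be a positive integer.

    Sums B(aA + i, bA + bB) / ((bB + i) * B(1 + i, bB) * B(aA, bA))
    for i = 0 .. alpha_b - 1, working in log space for numerical stability.
    """
    total = 0.0
    for i in range(alpha_b):
        total += math.exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - math.log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


def prob_b_beats_a_mc(alpha_a, beta_a, alpha_b, beta_b, draws=100_000, seed=0):
    """Monte Carlo cross-check: fraction of posterior draws where p_B > p_A."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(alpha_b, beta_b) > rng.betavariate(alpha_a, beta_a)
        for _ in range(draws)
    )
    return wins / draws


# Made-up data: A converts 120 of 1000 visitors, B converts 150 of 1000.
a_succ, a_fail, b_succ, b_fail = 120, 880, 150, 850
exact = prob_b_beats_a(1 + a_succ, 1 + a_fail, 1 + b_succ, 1 + b_fail)
approx = prob_b_beats_a_mc(1 + a_succ, 1 + a_fail, 1 + b_succ, 1 + b_fail)
print(f"closed form: {exact:.4f}, simulation: {approx:.4f}")
```

The simulation is slower and noisier than the closed form, but it is easy to verify by eye and extends to priors where no closed form is available.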

For Count Data
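Evaluating count data needs a different equation. The probability that group 1 has a higher rate of arrival than group 2 is given by

\[
\Pr(\lambda_1 > \lambda_2) = \sum_{k=0}^{\alpha_1 - 1} \frac{(\beta_1 + \beta_2)^{-(k + \alpha_2)}\, \beta_1^{k}\, \beta_2^{\alpha_2}}{(k + \alpha_2)\, B(k + 1,\ \alpha_2)}
\]

where α represents the total event count in each group, β is the exposure, i.e. the relative opportunity for the event to occur, and B is the beta function.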
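
For count data, a comparable sketch is shown below. It assumes each group's arrival rate has a Gamma posterior with shape α (total event count) and rate β (exposure), and it evaluates a known closed form for the probability that group 1's rate exceeds group 2's, which requires an integer α for group 1. The function names and the event counts are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_rate1_beats_rate2(alpha_1: int, beta_1: float,
                           alpha_2: float, beta_2: float) -> float:
    """Closed-form Pr(rate_1 > rate_2) for Gamma(alpha, rate=beta) posteriors.

    Sums (b1 + b2)^-(k + a2) * b1^k * b2^a2 / ((k + a2) * B(k + 1, a2))
    for k = 0 .. alpha_1 - 1, working in log space; alpha_1 must be an integer.
    """
    total = 0.0
    for k in range(alpha_1):
        log_term = (
            k * math.log(beta_1)
            + alpha_2 * math.log(beta_2)
            - (k + alpha_2) * math.log(beta_1 + beta_2)
            - math.log(k + alpha_2)
            - log_beta(k + 1, alpha_2)
        )
        total += math.exp(log_term)
    return total


# Made-up data: group 1 saw 60 events and group 2 saw 45, each over 10 exposure units.
print(f"{prob_rate1_beats_rate2(60, 10.0, 45, 10.0):.4f}")
```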

Bayesian framework for A/B Testing

Assigning Prior- This step defines the prior distribution, which incorporates subjective beliefs about a parameter. The prior can be uninformative or informative. If the behaviour of the success metric is unknown or ambiguous, it is safer to assign an uninformed prior.

If the behaviour of the success metric of the A/B test is well known, we prefer an informed prior, which encodes a subjective belief about that behaviour. The more informative the prior, the more data it takes to “change” your beliefs, so to speak, because the posterior is then driven largely by the prior information. The strength of an informed prior depends on how firmly we believe that the metric will behave in a certain fashion.
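
To make the effect of prior strength concrete, here is a small sketch that applies three priors to the same made-up data. It assumes a Beta prior on a conversion rate. The specific priors, the 10% prior belief, and the counts are all illustrative assumptions.

```python
def posterior_mean(prior_alpha: float, prior_beta: float,
                   successes: int, failures: int) -> float:
    """Posterior mean of a conversion rate under a Beta prior and binomial data."""
    return (prior_alpha + successes) / (prior_alpha + prior_beta + successes + failures)


# Made-up data: 30 conversions out of 200 visitors (15% observed rate).
successes, failures = 30, 170

priors = {
    "uninformed Beta(1, 1)": (1, 1),
    "weak Beta(2, 18)": (2, 18),            # ~10% belief, worth 20 pseudo-visitors
    "strong Beta(200, 1800)": (200, 1800),  # ~10% belief, worth 2000 pseudo-visitors
}
for name, (a, b) in priors.items():
    print(f"{name}: posterior mean = {posterior_mean(a, b, successes, failures):.3f}")
```

The stronger the prior, the closer the posterior mean stays to the prior belief of 10% despite the observed 15%, which is the sense in which the posterior is driven by prior information.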

Estimating Parameters-

-Run the experiment for the set duration and analyze the posterior distributions of the two variants.
-With more observations, the posterior distributions narrow, so a real difference between the two variants becomes clearly visible.
-We can use sequential analysis to avoid the problem of continuous monitoring.

Significance Calculation- 

-The histogram depicts the probability of one variant being better than the other.
-The experiment is either scaled or discarded based on the probability values.
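
One way to produce such a histogram is sketched below: draw samples from each variant's posterior, take the difference, and read off the share of draws above zero. This assumes Beta posteriors with uniform priors for a binary metric. The function names, counts, and the text-based rendering are illustrative choices only.

```python
import random


def posterior_difference_draws(a_succ, a_fail, b_succ, b_fail, draws=20_000, seed=1):
    """Draw samples of (p_B - p_A) from independent Beta posteriors (uniform priors)."""
    rng = random.Random(seed)
    return [
        rng.betavariate(1 + b_succ, 1 + b_fail) - rng.betavariate(1 + a_succ, 1 + a_fail)
        for _ in range(draws)
    ]


def text_histogram(values, bins=10, width=40):
    """Render a simple text histogram; each row shows the left edge of its bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    return [
        f"{lo + i * step:+.3f} | {'#' * round(width * c / peak)}"
        for i, c in enumerate(counts)
    ]


# Made-up data: A converts 120 of 1000 visitors, B converts 150 of 1000.
diffs = posterior_difference_draws(120, 880, 150, 850)
prob_b_better = sum(d > 0 for d in diffs) / len(diffs)
print("\n".join(text_histogram(diffs)))
print(f"Pr(B better than A) ~ {prob_b_better:.3f}")
```

The mass of the histogram to the right of zero corresponds to the probability that is compared against the acceptance threshold.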

EXPERIMENT SETUP AND INFERENCE

Setup Experiment-
-The desired traffic split: Try and ensure that the control & test arms receive a similar amount of traffic and keep a certain portion of the visitors in a ‘Holdout group’.
-Experiment-specific tags/identifiers: Define test variant tags, Page IDs, Element interaction tags and so on, to filter your data according to the relevant success metric.
-Scripting: Write scripts to aggregate the data and calculate the KPIs.
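
As an illustration of the traffic split, the sketch below assigns visitors to a holdout group and to equally sized control and test arms by hashing a visitor identifier. The hashing scheme, the 10% holdout share, and all identifiers are assumptions made for this example, not a description of any specific experimentation platform.

```python
import hashlib


def assign_arm(visitor_id: str, experiment: str, holdout_share: float = 0.10) -> str:
    """Deterministically map a visitor to 'holdout', 'control', or 'test'."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # approximately uniform value in [0, 1)
    if bucket < holdout_share:
        return "holdout"
    # Split the remaining traffic evenly between control and test.
    midpoint = holdout_share + (1.0 - holdout_share) / 2
    return "control" if bucket < midpoint else "test"


counts = {"holdout": 0, "control": 0, "test": 0}
for n in range(10_000):
    counts[assign_arm(f"visitor-{n}", "checkout-banner")] += 1
print(counts)
```

Hashing on the experiment name together with the visitor ID keeps a visitor in the same arm across sessions while letting different experiments split traffic independently.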

Prior Distributions and Run-
-Generally, a weak prior is assigned to the test statistic to avoid any bias.
-The same prior is assigned to both the A and B variants.

Timelines:
-Typically, the test is run for 2 weeks to involve a variety of customers and events.
-For experiments that need high certainty, the duration is extended by 2 extra weeks if the acceptance threshold (probability) is not met.

Drive Insights
-Trivial experiments (experiments that do not have a direct impact on revenue) are scaled/discarded based on the acceptance threshold of 85% probability after 2 weeks of experiment.
E.g. of trivial experiments: Tests on the new banner on-page, test on saving size preferences toggle etc.
-Critical experiments are scaled/discarded based on the acceptance threshold of 90% probability after 2 weeks of the experiment. If a critical experiment has >80% and <90% probability after the end of week 1, the experiment is continued for 2 extra weeks (4 weeks total) to achieve high certainty of results.
E.g. of critical experiments: Changing layout of payment selection page, re-arranging payment preference order etc.
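
The thresholds above can be expressed as a small decision helper. This is only a sketch: it reads "scaled/discarded" as "scale when the probability meets the threshold, otherwise discard", treats a critical experiment at or below 80% as discarded (the text does not say), and does not model when in the timeline the check happens. The function name and return labels are hypothetical.

```python
def decide(prob_variant_better: float, critical: bool) -> str:
    """Return 'scale', 'extend', or 'discard' from the probability that the variant wins."""
    threshold = 0.90 if critical else 0.85
    if prob_variant_better >= threshold:
        return "scale"
    # Only critical experiments get the extra two weeks, and only in the 80-90% band.
    if critical and 0.80 < prob_variant_better < 0.90:
        return "extend"
    return "discard"


print(decide(0.87, critical=False))  # trivial experiment
print(decide(0.87, critical=True))   # critical experiment
```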

Takeaway

Unlike the Frequentist method, Bayesian statistics provides the probability of a variant being better than its control. In experiments where the lift in the success metric of the new variant is small, the Bayesian methodology is more willing to accept the new variant. The Bayesian approach gives businesses a platform to innovate faster and implement what is best for their customers. It accomplishes this without sacrificing reliability, by controlling the magnitude of bad decisions instead of the false positive rate.
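
One common way to quantify "the magnitude of bad decisions" is the expected loss: how much conversion rate would be given up, on average, if the chosen variant turns out to be the worse one. The sketch below estimates it by simulation. The choice of expected loss as the measure, the uniform priors, the function name, and the counts are all assumptions made for illustration.

```python
import random


def expected_loss(a_succ, a_fail, b_succ, b_fail, draws=50_000, seed=2):
    """Monte Carlo estimate of the expected conversion-rate loss from shipping A or B."""
    rng = random.Random(seed)
    loss_if_ship_a = 0.0
    loss_if_ship_b = 0.0
    for _ in range(draws):
        p_a = rng.betavariate(1 + a_succ, 1 + a_fail)
        p_b = rng.betavariate(1 + b_succ, 1 + b_fail)
        loss_if_ship_a += max(p_b - p_a, 0.0)  # regret when B was actually better
        loss_if_ship_b += max(p_a - p_b, 0.0)  # regret when A was actually better
    return loss_if_ship_a / draws, loss_if_ship_b / draws


# Made-up data: A converts 120 of 1000 visitors, B converts 150 of 1000.
loss_a, loss_b = expected_loss(120, 880, 150, 850)
print(f"expected loss if shipping A: {loss_a:.5f}")
print(f"expected loss if shipping B: {loss_b:.5f}")
```

A small expected loss for the new variant means that even if the decision is wrong, the cost is likely to be minor, which is why small lifts can still be accepted.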

About the Author

Shivakumar is an Associate Manager at Factspan who has a keen interest in retail industries and has worked for multiple Fortune 500 retailers across the domains of merchandising, marketing, customer strategy and product management. He loves watching football and is an avid supporter of Arsenal FC. He also loves travelling and has travelled across 9 countries in 3 different continents so far.

>>Please read Customer Lifetime Value for Subscription-Based Business
