How To Design An A/B Test System
An A/B test system has become a foundation for almost every technology company, but how do we design one?
Sometimes we just don’t know why we succeeded or why we failed, because we do not always think like our users.
In physics or chemistry, we often use a method called a “controlled experiment” to find the root cause. We can apply the same method to Internet traffic.
Example
Here is an example of the whole lifecycle of an A/B test.
We plan to show an advertisement for some products on the Facebook timeline. The UI designer has created two styles of advertisement (Style 1 and Style 2), and we don’t know which one is more popular, so we run an A/B test on it.
First, we set up an experiment that divides the users into 4 groups based on a grouping algorithm (discussed later):
- Group A, takes 10% of traffic, displays no advertisement;
- Group B, takes 10% of traffic, displays the advertisement in Style 1;
- Group C, takes 10% of traffic, displays the advertisement in Style 2;
- Default, takes 70% of traffic, displays no advertisement.
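One way to express this split, assuming the 0.1%-granularity buckets used by the routing code later in this article, is to give each group a contiguous range out of 1,000 hash buckets. The `Group`, `BucketRange`, `buildRanges` and `lookup` names below are hypothetical, chosen for this sketch only.

```go
package main

import "errors"

// Group is a hypothetical group definition. PerMille is the share of
// traffic in tenths of a percent, so 100 means 10%.
type Group struct {
	Name     string
	PerMille int
	Val      string
}

// BucketRange assigns the half-open bucket range [Start, End) out of
// 1000 buckets to one group.
type BucketRange struct {
	Group Group
	Start int
	End   int
}

// buildRanges lays the groups out back to back over buckets 0..999 and
// rejects configurations that claim more than 100% of traffic.
func buildRanges(groups []Group) ([]BucketRange, error) {
	ranges := make([]BucketRange, 0, len(groups))
	start := 0
	for _, g := range groups {
		if g.PerMille < 0 {
			return nil, errors.New("negative traffic share for group " + g.Name)
		}
		end := start + g.PerMille
		if end > 1000 {
			return nil, errors.New("groups exceed 100% of traffic")
		}
		ranges = append(ranges, BucketRange{Group: g, Start: start, End: end})
		start = end
	}
	return ranges, nil
}

// lookup finds the group whose range contains bucket. A group with a 0%
// share has an empty range and therefore never matches.
func lookup(ranges []BucketRange, bucket int) (Group, bool) {
	for _, r := range ranges {
		if bucket >= r.Start && bucket < r.End {
			return r.Group, true
		}
	}
	return Group{}, false
}
```

Under this layout, Group B owns buckets 100 to 199 and Default owns 300 to 999. The final rollout later in the example (A, B and C at 0%, Default at 100% with Style 2) is just another input: the three empty ranges never match, so every bucket lands in Default.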

After the configuration is done, we implement the feature. In this case, the A/B test runs in the backend, so the client simply renders the style enum returned by the backend.
The backend service may be implemented like this:
experimentName := "new_advertisement_strategy" // name of the A/B experiment
response.advertisementStyle = abtest.GetStringVal(
	userId,         // the userId from the request
	experimentName,
	"style_0",      // no advertisement; the default value if anything goes wrong
)
return response
After the code is released, the new feature goes online, and we can check the statistics of each experimental group and analyze them:
- Compare the basic metrics, for example DAU, average duration, and retention rate, against the baseline (Group A);
- Compare the new metrics between the treatment groups (B and C); in this case, CTR is the most important one.

After days of analysis, we found that neither Group B nor Group C had side effects on the basic metrics, which suggests that the advertisement does not displease our users, so it’s safe to display it.
Moreover, Group C has a better CTR, which means that Style 2 attracts more users to click and earns more money.
So, finally, we change the grouping strategy:
- Group A, takes 0% of traffic, displays no advertisement;
- Group B, takes 0% of traffic, displays the advertisement in Style 1;
- Group C, takes 0% of traffic, displays the advertisement in Style 2;
- Default, takes 100% of traffic, displays the advertisement in Style 2.
Now all users are grouped into Default, and they all see the advertisement in Style 2.
In this way, A/B testing helped us find the most suitable solution with limited resources.
Basic Concepts
We have to clarify our concepts before we move forward.
A/B Grouping
The grouping algorithm aims to split users into different groups and apply a different strategy to each group.
By comparing the behavior of the groups, we can figure out the most effective strategy.
The grouping is not limited to just two groups, A and B; an experiment can have more groups.

Concurrent Experiments
Normally one experiment covers all users, but there is rarely only one experiment running in the system.
How can we support concurrent experiments and make sure that they have no side effects on each other?
The secret is “orthogonality”. We hash users with a different salt for each experiment, so that every group of one experiment is split uniformly across the groups of any other experiment.
For example, suppose Experiment A and Experiment B are running concurrently. From the overall perspective, the users are divided uniformly among the groups of both experiments.

At the same time, the users inside Group A-1 (a group of Experiment A) are also divided uniformly among the groups of Experiment B.

In this way, we can support many experiments running concurrently and independently.
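The effect of per-experiment salts can be checked with a small simulation. This is a sketch, not the article's implementation: `bucket` and `crossTab` are hypothetical helpers, the experiment name is used as the salt, and SHA-256 from Go's standard library stands in for a faster non-cryptographic hash such as MurmurHash or CityHash.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"strconv"
)

// bucket maps a user into one of n buckets for a given experiment.
// The experiment name acts as the salt, so the same user lands in
// statistically independent buckets for different experiments.
func bucket(experiment string, userID int64, n uint64) uint64 {
	sum := sha256.Sum256([]byte(experiment + ":" + strconv.FormatInt(userID, 10)))
	return binary.BigEndian.Uint64(sum[:8]) % n
}

// crossTab counts how users 0..users-1 fall into (group of expA, group of expB)
// pairs when each experiment splits traffic into two equal groups.
func crossTab(expA, expB string, users int64) [2][2]int {
	var counts [2][2]int
	for id := int64(0); id < users; id++ {
		a := bucket(expA, id, 2)
		b := bucket(expB, id, 2)
		counts[a][b]++
	}
	return counts
}
```

With distinct salts, each of the four (Experiment A group, Experiment B group) cells receives roughly a quarter of the users. If both experiments used the same salt, every user would land in matching groups, the off-diagonal cells would stay empty, and the two experiments would be completely confounded.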
Essential Components
The simplest A/B test system should include these parts:
- Database for storing config rules;
- Backend SDK designed for feature integration;
- Analysis tools for the data warehouse.
Database
For A/B testing, the database can be as simple as a key-value (K-V) store, in which the key is the experiment name and the value holds the partition rules and the value of each group.
To minimize the performance impact, a read-friendly K-V store is preferable, for example etcd, ZooKeeper, or Apollo.
Using a traditional database such as MySQL or MongoDB is also acceptable, but then we had better implement a caching strategy in the backend SDK.
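As a concrete but hypothetical example, the value stored under the key `new_advertisement_strategy` could be a JSON document listing each group's bucket range and value, such as `{"name": "B", "start": 100, "end": 200, "val": "style_1"}`. The schema (`name`, `start`, `end`, `val`) is an assumption made for this sketch, not a format defined by etcd, ZooKeeper, or Apollo. The parser also rejects overlapping ranges, so a bad edit cannot route one user into two groups.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GroupConfig is one group of an experiment: a half-open bucket range
// [Start, End) out of 1000 buckets, and the value returned to its users.
type GroupConfig struct {
	Name  string `json:"name"`
	Start int    `json:"start"`
	End   int    `json:"end"`
	Val   string `json:"val"`
}

// ExperimentConfig is the value stored under the experiment-name key.
type ExperimentConfig struct {
	Groups []GroupConfig `json:"groups"`
}

// parseConfig decodes a stored value and checks that every range lies
// inside 0..1000 and that no bucket belongs to two groups.
func parseConfig(raw []byte) (*ExperimentConfig, error) {
	var cfg ExperimentConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	var taken [1000]bool
	for _, g := range cfg.Groups {
		if g.Start < 0 || g.End < g.Start || g.End > 1000 {
			return nil, fmt.Errorf("group %q has invalid range [%d, %d)", g.Name, g.Start, g.End)
		}
		for b := g.Start; b < g.End; b++ {
			if taken[b] {
				return nil, fmt.Errorf("group %q overlaps another group at bucket %d", g.Name, b)
			}
			taken[b] = true
		}
	}
	return &cfg, nil
}
```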
Backend SDK
Since there is a database, the backend SDK needs a database client as well, and the caching layer is best implemented alongside that client.
However, the most important part of the backend SDK is the routing algorithm, because there are many things to consider:
- There will be many experiments running concurrently, so we should minimize their effects on one another;
- The abtest#GetXxxVal family of functions (e.g. GetStringVal) should be high-performance;
- The abtest#GetXxxVal functions should be stable: as long as the experiment configuration and the parameters are unchanged, the return value should not change.
Commonly, we use a well-designed hash function such as MurmurHash or CityHash to achieve this. The core logic looks something like this:
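// GetStringVal returns the value of the group that userId falls into for
// experimentName, or defaultVal if the experiment is missing or no group matches.
// getConfig, hashalgo and report stand for the config client, the hash library
// and the exposure reporter respectively.
func GetStringVal(userId int64, experimentName string, defaultVal string) string {
	experimentConfig := getConfig(experimentName) // get config from database (ideally cached)
	if experimentConfig == nil {
		return defaultVal
	}
	salt := hashalgo.Hash(experimentName) // per-experiment salt keeps experiments orthogonal
	// the hash must be unsigned (or made non-negative) so the bucket lies in [0, 1000)
	hashCode := hashalgo.HashWithSalt(userId, salt) % 1000 // min unit 0.1%
	for _, group := range experimentConfig.Groups {
		if group.EnableFor(hashCode) { // return if hashCode is in the group's range
			report(userId, experimentName, group.Name) // for analysis
			return group.Val
		}
	}
	return defaultVal
}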
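To check the logic end to end, here is a self-contained, runnable variant of the function above. The in-memory `configs` map, the `report` stub that records hits in a slice, and the `Group` type are hypothetical stand-ins for the database, the message queue and the real config schema. SHA-256 from Go's standard library substitutes for MurmurHash or CityHash purely so the sketch has no third-party dependencies.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"strconv"
)

// Group is a hypothetical group definition: users whose bucket falls in
// [Start, End) receive Val.
type Group struct {
	Name  string
	Start uint64
	End   uint64
	Val   string
}

// EnableFor reports whether bucket lies inside the group's range.
func (g Group) EnableFor(bucket uint64) bool { return bucket >= g.Start && bucket < g.End }

type ExperimentConfig struct{ Groups []Group }

// configs stands in for the cached, database-backed config lookup.
var configs = map[string]*ExperimentConfig{}

// hits stands in for the message queue; not safe for concurrent use.
var hits []string

func report(userID int64, experimentName, groupName string) {
	hits = append(hits, experimentName+"/"+groupName)
}

// hashBucket salts the user ID with the experiment name and maps it to 0..999.
func hashBucket(userID int64, experimentName string) uint64 {
	sum := sha256.Sum256([]byte(experimentName + ":" + strconv.FormatInt(userID, 10)))
	return binary.BigEndian.Uint64(sum[:8]) % 1000 // min unit 0.1%
}

func GetStringVal(userID int64, experimentName string, defaultVal string) string {
	cfg, ok := configs[experimentName]
	if !ok || cfg == nil {
		return defaultVal
	}
	bucket := hashBucket(userID, experimentName)
	for _, g := range cfg.Groups {
		if g.EnableFor(bucket) {
			report(userID, experimentName, g.Name)
			return g.Val
		}
	}
	return defaultVal
}
```

Because the bucket depends only on the user ID and the experiment name, repeated calls return the same value, which is the stability requirement listed above; over many users, the shares approach the configured split.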
Analytic Tools
The statistics are the key output of an A/B test, so analytic tools for the data warehouse are necessary.
However, different teams may build their data warehouses on different technology stacks, so their analytic tools will differ too.
As you may have noticed, the code above contains this line:
report(userId, experimentName, group.Name) // for analysis
Usually, report sends a message to a queue such as Kafka or Pulsar; a consumer then reads these messages and writes them into a data warehouse such as Hive or ClickHouse.
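Because `report` sits on the hot path of every `GetXxxVal` call, it is common to make it asynchronous. The sketch below assumes a hypothetical `ExposureEvent` message and a `publish` callback standing in for a Kafka or Pulsar producer. It enqueues into a buffered channel and drops events when the buffer is full, trading a little data loss for never blocking the request. `atomic.Int64` requires Go 1.19 or later.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// ExposureEvent is a hypothetical message describing one A/B test hit.
type ExposureEvent struct {
	UserID     int64
	Experiment string
	Group      string
}

// AsyncReporter hands events to publish from a single background goroutine.
type AsyncReporter struct {
	events  chan ExposureEvent
	dropped atomic.Int64
	wg      sync.WaitGroup
}

func NewAsyncReporter(buffer int, publish func(ExposureEvent)) *AsyncReporter {
	r := &AsyncReporter{events: make(chan ExposureEvent, buffer)}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		for e := range r.events {
			publish(e) // e.g. hand the event to a Kafka or Pulsar producer
		}
	}()
	return r
}

// Report enqueues e without blocking and returns false if the buffer was full.
// It must not be called after Close.
func (r *AsyncReporter) Report(e ExposureEvent) bool {
	select {
	case r.events <- e:
		return true
	default:
		r.dropped.Add(1)
		return false
	}
}

// Close stops accepting events and waits until every queued event is published.
func (r *AsyncReporter) Close() {
	close(r.events)
	r.wg.Wait()
}

// Dropped returns how many events were discarded because the buffer was full.
func (r *AsyncReporter) Dropped() int64 { return r.dropped.Load() }
```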
Then, when we query business metrics, we can join them with the A/B test hit records to analyze the differences between groups.
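In the warehouse this join is usually a SQL `JOIN` on the user ID followed by a `GROUP BY` on the group. The in-memory Go sketch below shows the same logic on hypothetical `Exposure` and `Click` records. Counting one impression per exposure record is a simplifying assumption; a real pipeline would take impressions from the business event log.

```go
package main

// Exposure is one A/B test hit as stored in the warehouse (hypothetical schema).
type Exposure struct {
	UserID int64
	Group  string
}

// Click is one advertisement click from the business event log (hypothetical schema).
type Click struct {
	UserID int64
}

// GroupStats holds the joined metrics for one group.
type GroupStats struct {
	Impressions int
	Clicks      int
}

// CTR returns clicks per impression, or 0 when there were no impressions.
func (s GroupStats) CTR() float64 {
	if s.Impressions == 0 {
		return 0
	}
	return float64(s.Clicks) / float64(s.Impressions)
}

// ctrByGroup attributes each click to the group its user was exposed to.
func ctrByGroup(exposures []Exposure, clicks []Click) map[string]GroupStats {
	groupOf := make(map[int64]string, len(exposures))
	stats := map[string]GroupStats{}
	for _, e := range exposures {
		groupOf[e.UserID] = e.Group // stable routing: one group per user
		s := stats[e.Group]
		s.Impressions++
		stats[e.Group] = s
	}
	for _, c := range clicks {
		g, ok := groupOf[c.UserID]
		if !ok {
			continue // click from a user outside the experiment
		}
		s := stats[g]
		s.Clicks++
		stats[g] = s
	}
	return stats
}
```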
Extended Topics
Feature Filter
When we set up a new A/B test experiment, it normally requires some preconditions, for example:
- As a global application, we serve users from different countries, but sometimes a feature must be disabled in certain countries due to regional policies;
- For mobile applications, we have to deal with version fragmentation. Not everyone uses the latest version of the app, so we should filter out old app versions; this is sometimes called a Version Filter.
In these cases, we should filter out the ineligible traffic first and then run the experiment.
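One option is to evaluate such a filter before bucketing, so that ineligible traffic receives the default value and is never reported, keeping it out of the groups being compared. The `Request` and `Filter` types, the blocked-country set and the dotted-numeric version format are assumptions for illustration.

```go
package main

import (
	"strconv"
	"strings"
)

// Request carries the attributes a filter can inspect (hypothetical fields).
type Request struct {
	Country    string
	AppVersion string // e.g. "5.12.0"
}

// Filter is a hypothetical set of preconditions attached to an experiment.
type Filter struct {
	BlockedCountries map[string]bool
	MinVersion       string
}

// compareVersions compares dotted numeric versions and returns -1, 0 or 1.
// Missing components count as 0 ("5.1" equals "5.1.0"); non-numeric
// components are also treated as 0.
func compareVersions(a, b string) int {
	pa, pb := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var x, y int
		if i < len(pa) {
			x, _ = strconv.Atoi(pa[i])
		}
		if i < len(pb) {
			y, _ = strconv.Atoi(pb[i])
		}
		if x != y {
			if x < y {
				return -1
			}
			return 1
		}
	}
	return 0
}

// Allows reports whether a request is eligible for the experiment.
func (f Filter) Allows(r Request) bool {
	if f.BlockedCountries[r.Country] {
		return false
	}
	if f.MinVersion != "" && compareVersions(r.AppVersion, f.MinVersion) < 0 {
		return false
	}
	return true
}
```

Comparing versions numerically matters: a plain string comparison would rank "5.9" above "5.10".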

Client SDK
Sometimes we want to run an experiment purely on the client side, for example, changing the color of a title or swapping an icon.
We could involve the server in these experiments, but that is complicated and redundant.
A client-side SDK removes that overhead: with it, a purely client-side experiment requires no effort from server engineers.
A typical client SDK requests the A/B test values when the application starts up, then caches them until the next successful response arrives.
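The startup fetch-and-cache behavior can be sketched as below. `ClientABTest` and its `fetch` callback (standing in for the HTTP request to the A/B test service) are hypothetical; the point is that a failed refresh leaves the previous values in place. Go is used only for consistency with the rest of the article; a real client SDK would typically be written in the platform's language and would also persist the cache to disk so values survive a restart.

```go
package main

import "sync"

// ClientABTest is a hypothetical client-side SDK core. Refresh is called at
// application startup; it replaces the cached values only on success.
type ClientABTest struct {
	mu     sync.RWMutex
	values map[string]string
	fetch  func() (map[string]string, error) // e.g. an HTTP call to the A/B test service
}

func NewClientABTest(fetch func() (map[string]string, error)) *ClientABTest {
	return &ClientABTest{values: map[string]string{}, fetch: fetch}
}

// Refresh fetches fresh values; on error the cached values are left untouched.
func (c *ClientABTest) Refresh() error {
	v, err := c.fetch()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.values = v
	c.mu.Unlock()
	return nil
}

// GetString returns the cached value for an experiment, or defaultVal.
func (c *ClientABTest) GetString(experiment, defaultVal string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if v, ok := c.values[experiment]; ok {
		return v
	}
	return defaultVal
}
```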

The A/B test values are essentially a map:
{
"new_advertisement_strategy": {
"val": "style_1",
"end": 1669556750137
}
}
Conclusions
All companies care about business metrics, and A/B testing is a powerful tool for guiding a product’s direction.
Building an A/B test system is simple but not easy; I hope this article helps.