

Gamasutra: The Art & Business of Making Games

A/B Tests for Analysing LiveOps. Part 1

by Eugene Matveev on 03/30/20 10:37:00 am

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

Today I begin a series of articles in which I'm going to share with you everything I know about A/B testing for LiveOps.

The series will include the following topics:

  • ideas for A/B tests, deteriorating tests, A/A and A/A/B tests;

  • choice of metrics, variant creation, and sample preparation;

  • interpretation of results.

In the article below I will talk about ideas for A/B tests in games, deteriorating tests, and A/A and A/A/B tests.

Experiment ideas

When running LiveOps, we most often change two things:

  1. text of the offer (e.g., “buy the sword” or “buy the sword now”);

  2. design of the offer.

Here are some ideas for your A/B tests:

  • ASO. When it comes to screenshots, Angry Birds 2, for example, tested three hypotheses: should the screenshots show characters or the gameplay itself; should they be vertical or horizontal (people hold their phones vertically, but play this game in landscape mode); and should they have frames or not (some players like frames).

  • Design (one button or two; if two, what text do we place on each of them).

  • Call to action.

  • FTUE.

  • Description and texts.

  • Advertising.

  • Push notifications and timing.

  • Prices and promotions (we test prices less often than promotions because of problems that can arise when the tested groups communicate with each other).

  • Purchasing screen and the store itself.

  • Etc.

Deteriorating tests

Instead of running classical A/B tests on your LiveOps events, it is sometimes more cost-effective to perform so-called “deteriorating tests”. For example, we want to add more levels to our game, but we are not sure it will pay off, so we want to run an A/B test first. The problem is that this is very time- and resource-intensive: creating 50 additional levels, for example, takes a month of work. How do you check this hypothesis without the extra cost? Suppose the game already has 150 levels. Then for one group of new players you can reduce this number to 100, while the second group keeps all 150 levels. This is called a deteriorating (worsening) A/B test: a test in which one group of users deliberately gets an inferior option. If the players who had access to all 150 levels show better results, i.e. better metrics, then we can conclude that, yes, we really do need to add more levels.
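To make the readout of such a test concrete, here is a minimal sketch of comparing retention between the two groups. All numbers, group names, and the retention metric are invented for illustration; it uses a standard two-proportion z-test rather than any particular analytics tool.

```python
# Hypothetical readout of the levels experiment: group "full" kept all
# 150 levels, group "capped" was limited to 100. Counts are invented.
import math

retained = {"full": (1320, 5000), "capped": (1180, 5000)}  # (retained, total)

def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value for the difference of two proportions (z-test)."""
    pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

p_value = two_prop_p(*retained["full"], *retained["capped"])
rate_full = retained["full"][0] / retained["full"][1]
rate_capped = retained["capped"][0] / retained["capped"][1]

print(f"retention with 150 levels: {rate_full:.1%}, with 100 levels: {rate_capped:.1%}")
if p_value < 0.05 and rate_full > rate_capped:
    print("players with more levels retain better: adding levels looks worthwhile")
```

If the p-value is above your significance threshold instead, the extra levels may not justify a month of production work.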

You can run the same kind of deteriorating test on speed. Suppose you have a hypothesis that your application should run faster. Then you can slow it down for one group of users and test this hypothesis. If speed really is that important, the people in the control group who used the app at normal speed will show better, higher metrics than those who used the slower version.

Preliminary testing

Before starting an A/B test, you need to check the quality of your audience: is it reliable enough to trust the results? To do this, you can conduct an A/A test or an A/A/B test.

In an A/A test, we do not change anything for anyone, but simply split the audience randomly into two groups and compare the metrics. If the metrics are similar in both groups, everything is fine. If not, the audience is too heterogeneous and we can’t trust the results. In that case, it is better to increase the sample size so that, by the law of large numbers, the groups become more homogeneous first.
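The A/A check above can be sketched in a few lines. This is only an illustration: the deterministic hash-based split is a common way to assign users to groups, and the per-user metric here is simulated rather than taken from a real game.

```python
# Sketch of an A/A check: users are split 50/50 by hashing their ID,
# nothing changes for either group, and we verify the metric matches.
import hashlib
import math
import random

def bucket(user_id: str) -> str:
    """Deterministic 50/50 split based on a hash of the user ID."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return "A1" if h % 2 == 0 else "A2"

random.seed(42)
metrics = {"A1": [], "A2": []}
for i in range(20000):
    # simulated per-user metric (e.g. session length in minutes)
    metrics[bucket(f"user-{i}")].append(random.gauss(12.0, 4.0))

def mean(xs):
    return sum(xs) / len(xs)

def var(xs, m):
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

m1, m2 = mean(metrics["A1"]), mean(metrics["A2"])
se = math.sqrt(var(metrics["A1"], m1) / len(metrics["A1"])
               + var(metrics["A2"], m2) / len(metrics["A2"]))
z = (m1 - m2) / se
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"A1 mean: {m1:.2f}, A2 mean: {m2:.2f}, p-value: {p_value:.3f}")
# a large p-value here means the split looks clean and can be trusted
```

A small p-value on an A/A test is a red flag: it suggests the split itself is biased, and any real A/B run on the same audience would inherit that bias.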

If you don’t want to spend time on a separate A/A test, you can run an A/A/B test: a test in which you change nothing for two groups (A1 and A2) and make the change only for the third group (B). In this case, the test can be trusted if A1 and A2 show no statistically significant difference from each other, while both differ from B in the same direction.
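The A/A/B decision rule can be written out directly. The conversion counts below are made up, and the 0.05 significance threshold is just a conventional choice; the point is only the shape of the check.

```python
# Hypothetical A/A/B readout: conversions in two unchanged groups
# (A1, A2) and one changed group (B). All counts are invented.
import math

def z_test(x1, n1, x2, n2):
    """Return (difference of proportions, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p1 - p2, p_value

groups = {"A1": (510, 10000), "A2": (495, 10000), "B": (600, 10000)}

diff_aa, p_aa = z_test(*groups["A1"], *groups["A2"])
diff_a1b, p_a1b = z_test(*groups["B"], *groups["A1"])
diff_a2b, p_a2b = z_test(*groups["B"], *groups["A2"])

ok = (p_aa > 0.05                          # A1 vs A2: no significant difference
      and p_a1b < 0.05 and p_a2b < 0.05    # B differs from both control groups
      and diff_a1b > 0 and diff_a2b > 0)   # ...and in the same direction
print("test is trustworthy:", ok)
```

If `p_aa` comes out significant, the split is suspect and the B result should be discarded regardless of how good it looks.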

Next Monday I will post an article about the choice of game metrics for an A/B test, variant creation, and sample preparation.

