By that time, the timeline of our model looked like this:
The zero point is the day for which we're predicting the future. As described earlier, the forecast is made two to three weeks ahead, depending on the cohort. The data-mining models are fed various metrics for past periods, such as the first derivative of the moving average of daily playtime per game day, calculated over the past X days. All metrics were calculated over the last X days before point zero: the last three days, the last five days, the last week, and so on.
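To make that metric concrete, here is a minimal sketch of a "first derivative of the moving average of daily playtime over the past X days", under assumptions the text does not pin down: one playtime value per calendar day with the last entry at point zero, a trailing moving average with a hypothetical 3-day window, and the derivative approximated as the average day-over-day change of that moving average across the last X days. The "per game day" normalization is not reproduced, and `moving_average` and `ma_derivative` are illustrative names, not the authors' code.

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving average: one value for each position with `window` values of history."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def ma_derivative(daily_playtime, days, window=3):
    """Average day-over-day change of the moving average over the last `days` days.

    Assumes daily_playtime[-1] is point zero; window=3 is an arbitrary default.
    """
    ma = moving_average(daily_playtime, window)
    if len(ma) < days + 1:
        raise ValueError("not enough history for the requested period")
    recent = ma[-(days + 1):]
    return mean([b - a for a, b in zip(recent, recent[1:])])


# A player whose playtime falls off at the end gets a negative value:
# ma_derivative([60, 60, 60, 60, 50, 40, 30, 20], days=3) -> about -8.89
```

Averaging the day-over-day differences simply gives the net change of the moving average divided by the number of days; a least-squares slope would be a reasonable alternative if the series is noisy.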
A fresh idea was to calculate some metrics relative to points in the past. For example, we could calculate the first derivative of the moving average of daily playtime per game day over 7 days, but for a window shifted 14 days back from point zero. Remember what I said about the long-tail effect of players' activity? Essentially, the idea is to dissect the tail into separate parts and analyze them as independent metrics. We tried several combinations of such detailed past queries, written as (length, offset): (7,-21), a 7-day period 21 days in the past; (7,-14); (7,-7); and (14,-14).
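A minimal sketch of how such (length, offset) windows could be cut out of a player's history. It assumes one playtime value per calendar day with the last entry at point zero, and reads (7,-21) as a 7-day window that ends 21 days before point zero, so (7,-7) is the week immediately preceding the most recent seven days. For brevity it computes plain window averages and active-day counts rather than the moving-average derivatives described above; `shifted_window` and `tail_features` are hypothetical names.

```python
# Detailed-past windows from the text, as (length in days, offset from point zero).
WINDOWS = [(7, -21), (7, -14), (7, -7), (14, -14)]


def shifted_window(series, length, offset):
    """Return `length` daily values ending `offset` days before point zero.

    Assumes series[-1] is point zero and offset <= 0, matching the (7, -21) notation.
    """
    end = len(series) + offset  # exclusive end index
    start = end - length
    if start < 0:
        raise ValueError("not enough history for this window")
    return series[start:end]


def tail_features(playtime):
    """Turn separate slices of the activity tail into independent features."""
    features = {}
    for length, offset in WINDOWS:
        window = shifted_window(playtime, length, offset)
        features[f"playtime_mean_{length}_{offset}"] = sum(window) / length
        features[f"active_days_{length}_{offset}"] = sum(1 for m in window if m > 0)
    return features
```

With this reading, (7,-21), (7,-14) and (7,-7) tile three consecutive, non-overlapping weeks, which is what lets each slice of the tail act as its own metric.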
After some manual tuning, this idea turned out to be our epic win, boosting precision and recall to 95 percent for almost all cohorts:
The most fascinating fact is that the final data-mining models with the best precision were based entirely on derivatives and calculations of just two metrics: days of activity and daily playtime! Different derivatives mattered for different segments. For the 21-25 cohort models, all of our detailed past calculations were important, whereas the 7-9 cohort models relied on 30-day averages as well as near-past metrics for the 3 and 5 days before point zero. In any case, the math is much more complex than it was for new players' churn predictions. The following is an example of the final data-mining models:
And if it looks like a black box with some mystical math inside -- well, you're right. Back when we learned how to predict new players' churn, an alarming fact was that despite the great precision of the model we arrived at, we knew little about the actual reasons for churn. It's the same for veterans: we have no human-comprehensible results about the nature of churn. Just an awesome black box with 95 percent precision and recall.
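The trained models stay opaque, but their inputs are easy to spell out. Below is a hypothetical per-cohort feature specification: the cohort keys "21-25" and "7-9" are copied from the text, while the exact windows (reading "3 and 5 days before point zero" as the last 3 and last 5 days), the use of plain window means instead of the derivatives the article mentions, and all function names are assumptions for illustration. Both base metrics are derived from a single daily playtime series, counting a day as active when its playtime is above zero.

```python
# Hypothetical per-cohort feature specs: (base metric, window length in days, offset).
# Offset 0 means the window ends at point zero; -14 means it ends 14 days earlier.
COHORT_FEATURES = {
    "21-25": [(metric, length, offset)
              for metric in ("playtime", "active")
              for length, offset in ((7, -21), (7, -14), (7, -7), (14, -14))],
    "7-9": [(metric, length, 0)
            for metric in ("playtime", "active")
            for length in (30, 3, 5)],
}


def base_series(playtime):
    """Derive the two base metrics from one daily playtime series (last item = point zero)."""
    return {
        "playtime": playtime,
        "active": [1 if minutes > 0 else 0 for minutes in playtime],
    }


def window_mean(series, length, offset):
    """Mean of `length` daily values ending `offset` days before point zero."""
    end = len(series) + offset  # exclusive end index; offset <= 0
    start = end - length
    if start < 0:
        raise ValueError("not enough history for this window")
    return sum(series[start:end]) / length


def cohort_features(playtime, cohort):
    """Build one player's feature dict from the cohort's window specs."""
    bases = base_series(playtime)
    return {
        f"{metric}_{length}_{offset}": window_mean(bases[metric], length, offset)
        for metric, length, offset in COHORT_FEATURES[cohort]
    }
```

Keeping the windows in data rather than code makes it cheap to try a different feature set per cohort, which matches the observation that different segments favored different derivatives.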
We're now able to predict dramatic drops in veteran players' activity two to three weeks before they leave the game, allowing our community managers to take care of those players, resolve their issues, or offer incentives to boost their loyalty.
This data-mining project was heavier on math and the black-box approach than the one for newbie churn prediction. It required more time for fine-tuning and verifying the results, but led to 95 percent precision and recall. Notably, no gameplay metrics made their way into the final data-mining models. Predictions were based purely on metrics derived from days of activity and daily playtime, which are generic to all games and probably even to web services.
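For reference, the 95 percent figures are precision (the share of players flagged as churning who actually churned) and recall (the share of actual churners the model flagged). A minimal sketch of both, assuming binary labels where True means the player churned within the forecast horizon; `precision_recall` is an illustrative name and the example labels are made up, not data from the article.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary churn labels (True = churned)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # flagged and churned
    fp = sum(1 for p, a in pairs if p and not a)  # flagged but stayed
    fn = sum(1 for p, a in pairs if not p and a)  # churned but not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 2 true positives, 1 false positive, 1 false negative.
print(precision_recall([True, True, False, False, True],
                       [True, False, True, False, True]))
```

Reporting both numbers matters for churn work: high precision keeps community managers from wasting effort on loyal players, while high recall means few leaving veterans slip through unnoticed.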