Kissmetrics Blog


How to Improve Your Subscription Based Business by Predicting Churn

Churn prediction is one of the most popular Big Data use cases in business. It consists of detecting customers who are likely to cancel a subscription to a service.

Although churn prediction started out as a concern of telecom giants, it matters to businesses of all sizes, including startups. Now, thanks to prediction services and APIs, predictive analytics is no longer exclusive to big players that can afford to hire teams of data scientists.

As an example of how to use churn prediction to improve your business, let’s consider businesses that sell subscriptions. These can be telecom companies, SaaS companies, or any other company that sells a service for a monthly fee.

There are three possible strategies those businesses can use to generate more revenue: acquire more customers, upsell existing customers, or increase customer retention. All the efforts made as part of one of the strategies have a cost, and what we’re ultimately interested in is the return on investment: the ratio between the extra revenue that results from these efforts and their cost.
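To make that ratio concrete, here is a minimal Python sketch. The function name and the figures are hypothetical and only illustrate the arithmetic.

```python
def return_on_investment(extra_revenue: float, cost: float) -> float:
    """Ratio between the extra revenue produced by an effort and its cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return extra_revenue / cost


# Hypothetical figures, for illustration only.
campaign_cost = 2_000.0     # what the retention effort cost
campaign_revenue = 5_000.0  # extra revenue attributed to that effort
roi = return_on_investment(campaign_revenue, campaign_cost)
print(f"ROI: {roi:.2f}")
```

A ratio above 1 means the effort brought in more than it cost.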

As we saw in a previous post, there are a number of things you can do to improve customer retention overall. But individualized customer retention is difficult because businesses usually have a lot of customers and cannot afford to spend much time on each one. The costs would be too high and would outweigh the extra revenue. However, if you could predict in advance which customers are at risk of leaving, you could reduce customer retention efforts by directing them solely toward those customers.

Churn Prediction for All in 3 Steps

Churn prediction consists of detecting which customers are likely to cancel a subscription to a service based on how they use the service. We want to predict the answer to the following question, asked for each current customer: “Is this customer going to leave us within the next X months?” There are only two possible answers, yes or no, and it is what we call a binary classification task. Here, the input of the task is a customer and the output is the answer to the question (yes or no).

Being able to predict churn based on customer data has proven extremely valuable to big telecom companies. Now, thanks to prediction services such as BigML, it’s accessible to businesses of all sizes. In this post, we’re going to see step by step how to predict churn. The process is as follows:

  1. Gather historical customer data that you save to a CSV file.
  2. Upload that data to a prediction service that automatically creates a “predictive model.”
  3. Use the model on each current customer to predict whether they are at risk of leaving.

Step 1: Gather Data

Churn prediction is based on machine learning, which is a term for artificial intelligence techniques where “intelligence” is built by referring to examples. When predicting whether a customer is going to leave within X months, he or she is compared with examples of customers who stayed or left within X months.

Feature Engineering

To make these comparisons, we need a way to represent customers based on information about them that impacted whether they churned or not.

Feature Types

Each piece of information we use to represent customers is called a “feature” and the activity of finding useful features is called “feature engineering.” For churn, we would have 4 types of features:

  1. customer features: basic information about the customer (e.g., age, income, house value, college education)
  2. support features: characterizations of the customer’s interactions with customer support (e.g., number of interactions, topics of questions asked, satisfaction ratings)
  3. usage features: characterizations of the customer’s usage of the service
  4. contextual features: any other contextual information we have about the customer

Feature engineering really is where churn prediction changes from one business to the other. You’ll note that of the features listed above, customer and support are quite generic, whereas usage and contextual are specific to the service you’re selling. Also, the more features, the better. Don’t fret if you’re not sure whether a particular feature is useful. If it isn’t, it will be easily discarded when creating the model.


For a telecom company looking to predict churn, the features could be:

  • usage: average call duration, number of calls made, overcharges, leftover minutes
  • contextual: handset type and value

For a SaaS company, we would be concerned with features such as:

  • usage: number of times user logged in, time spent on app, time since last login, actions performed on app
  • contextual: device type and user agent
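As a rough illustration of the SaaS usage features above, the sketch below turns a raw login log into feature values. The log format, the function name, and the feature names are assumptions made for this example, not something prescribed by any particular service.

```python
from datetime import datetime

# Hypothetical login log for one user: (login time, session length in minutes).
sessions = [
    (datetime(2014, 3, 2, 9, 0), 12.0),
    (datetime(2014, 3, 10, 14, 30), 30.0),
    (datetime(2014, 3, 25, 8, 15), 6.0),
]


def usage_features(sessions, now):
    """Turn raw sessions into a few usage features for one customer."""
    if not sessions:
        # No activity at all: there is no last login to measure from.
        return {"login_count": 0, "minutes_on_app": 0.0, "days_since_last_login": None}
    last_login = max(start for start, _ in sessions)
    return {
        "login_count": len(sessions),
        "minutes_on_app": sum(minutes for _, minutes in sessions),
        "days_since_last_login": (now - last_login).days,
    }


features = usage_features(sessions, now=datetime(2014, 4, 1))
print(features)
```

Each key of the resulting dictionary would become one column of the dataset.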

Time Frames

In a company that sells monthly plans, we typically would be looking at who’s at risk of cancelling now (X=1), based on last month’s usage. This means that we would be computing usage feature values based on the previous month only. Alternatively, it may make more sense in your particular case to look at usage over 2 or 3 previous months in order to capture information that has an impact on whether a customer churns or not. In that case, we would average usage feature values over this duration.
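Here is a small sketch of that averaging, assuming usage has already been aggregated per month. The data layout, the feature names, and the `averaged_usage` helper are hypothetical.

```python
from statistics import mean

# Hypothetical monthly usage for one customer, oldest month first.
monthly_usage = [
    {"calls_made": 40, "leftover_minutes": 30.0},
    {"calls_made": 25, "leftover_minutes": 20.0},
    {"calls_made": 10, "leftover_minutes": 10.0},
]


def averaged_usage(monthly_usage, months=1):
    """Average each usage feature over the most recent `months` months."""
    window = monthly_usage[-months:]
    return {name: mean(row[name] for row in window) for name in window[0]}


print(averaged_usage(monthly_usage, months=1))  # previous month only
print(averaged_usage(monthly_usage, months=3))  # smoothed over 3 months
```

With `months=1` you get last month's values unchanged; a longer window smooths out one-off spikes or dips.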

In a SaaS company I founded that sells yearly subscriptions, we do things slightly differently. We predict churn 6 months ahead (X=6), and we take into account usage during the previous year and during the first 6 months of the current year.

Extract Data

Once we have decided on a way to represent customers, we should gather historical data of up to X months in the past. Our aim is to create a dataset of examples that consist of “inputs” (customers) and associated “outputs” (yes or no; i.e., churn or no-churn).

For this, you have to create a script that: 1) connects to your database in order to extract the information required to compute feature values for each customer, and 2) dumps these values to a CSV file where each row corresponds to a customer and each column to a feature (except the last column, which is used for the output). The resulting CSV file contains the dataset, a.k.a. the “data.” It would look something like this:

[Figure: example churn dataset]
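The extraction script described above could look roughly like this. It is only a sketch: the table, its columns, and the in-memory SQLite database are stand-ins for your own database, and `dump_dataset` is a hypothetical helper name.

```python
import csv
import io
import sqlite3

# Hypothetical schema and rows; a real script would connect to your own database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, calls_made INTEGER, "
    "support_tickets INTEGER, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, 42, 0, 0), (2, 3, 4, 1), (3, 17, 1, 0)],
)


def dump_dataset(conn, out):
    """Write one row per customer: feature columns first, output column last."""
    writer = csv.writer(out)
    writer.writerow(["calls_made", "support_tickets", "churn"])
    query = "SELECT calls_made, support_tickets, churned FROM customers ORDER BY id"
    for calls_made, support_tickets, churned in conn.execute(query):
        writer.writerow([calls_made, support_tickets, "True" if churned else "False"])


buffer = io.StringIO()  # swap for open("churn.csv", "w", newline="") to write a file
dump_dataset(conn, buffer)
print(buffer.getvalue())
```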

Snapshots in the Past

It is extremely important to understand that each customer in the data is represented as a “snapshot” of him taken X months ago, so that we could associate this snapshot with the fact that X months later (now) he churned or not (i.e., we associate the input with an output). As a consequence, we have to be very careful not to take into account any information about the customer that became available during the last X months (including his usage of the service) when computing the feature values.
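A minimal sketch of the snapshot idea follows, assuming each customer has a list of login dates and an optional cancellation date. The `build_example` helper, its arguments, and the 30-day approximation of one month are all assumptions made for illustration.

```python
from datetime import date, timedelta


def build_example(login_dates, cancelled_on, today, horizon_days):
    """Build one (features, churned) example from a snapshot taken in the past.

    Assumes the customer was still subscribed on the snapshot date; customers
    who had already cancelled by then should be left out of the dataset.
    """
    snapshot_date = today - timedelta(days=horizon_days)
    # Features may only use what was known on the snapshot date.
    visible = [d for d in login_dates if d <= snapshot_date]
    features = {"login_count": len(visible)}
    # The output is what happened between the snapshot and now.
    churned = cancelled_on is not None and snapshot_date < cancelled_on <= today
    return features, churned


# Hypothetical customer, with X = 1 month approximated as 30 days.
logins = [date(2014, 2, 10), date(2014, 2, 25), date(2014, 3, 15)]
example = build_example(
    logins, cancelled_on=date(2014, 3, 20), today=date(2014, 4, 1), horizon_days=30
)
print(example)
```

Note that the login on March 15 is ignored: it happened after the snapshot, so using it would leak information from the period we are trying to predict.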

Step 2: Upload Data

Prediction Services

We can upload the CSV file to a prediction service such as BigML or Google Prediction API through a web interface or through an API. The service will automatically create a model that it will use to make predictions. The beautiful thing about these prediction services / APIs is that they abstract away the complexities of creating predictive models from data, thus making machine learning / predictive analytics / data science accessible to the masses.


If you’re interested in having a go at churn prediction without the hassle of extracting data first, you can use this Orange churn data. It’s been “anonymized,” meaning that you can’t identify the customers and features, but it is actual data collected by the telecom company Orange. There are 50,000 data points (i.e., customers), 230 features, and the CSV weighs 8.6MB when zipped.

In the rest of this article, we will use churn data provided by BigML that has not been anonymized. There are 3,333 data points, 19 features, and the CSV weighs 97KB when zipped. Actually, we will use only 80% of the dataset, for a reason I’ll explain later.
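If you want to hold back part of your own data in the same way, a random split is one common approach. The article does not say how the 80% above was selected, so treat this sketch, including the `split_rows` name and the fixed seed, as an assumption.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly and split them into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-ins for the rows of a CSV file
train, held_out = split_rows(rows)
print(len(train), len(held_out))
```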

Upload Data and Create a Model with BigML

I have chosen to illustrate the rest of this article with BigML, and I have listed the actions to perform on their web interface so you can replicate them. Once you’ve been through the whole procedure, I recommend that you also try other services.

  • Create a free BigML account.
  • Drag-and-drop the CSV file (that you extracted from your own database or that you downloaded above) to the BigML dashboard. This creates a new “source.” BigML makes a distinction between a data source and a dataset, but in our case they can be thought of as being more or less the same thing. Alternatively, you can create the source by linking to the file I’m hosting on Amazon S3, the URL of which is “s3://bml-data/churn-bigml-80.csv” (without a trailing period).
  • Click on the source you created. This takes you to the source pane of the dashboard.
  • Create a dataset by clicking the cloud-lightning icon in the top right corner and choosing “1-click dataset.” This takes you to the dataset pane where you can visualize the data as histograms. This is really useful for making sure the data is as you would expect and checking for potential bugs in the data extraction process. What you see should look like this:
  • Create a predictive model by clicking the cloud-lightning icon and choosing “1-click model.” This takes you to the model pane.
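The histograms in the dataset pane are there to catch extraction bugs. If you would like a crude local version of that check before uploading, something like the sketch below counts the values in each column. The CSV content and the `column_counts` helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content standing in for the extracted dataset.
data = io.StringIO(
    "calls_made,support_tickets,churn\n"
    "42,0,False\n"
    "3,4,True\n"
    "17,1,False\n"
)


def column_counts(csv_file):
    """Count how often each value appears in each column."""
    counts = {}
    for row in csv.DictReader(csv_file):
        for name, value in row.items():
            counts.setdefault(name, Counter())[value] += 1
    return counts


counts = column_counts(data)
print(counts["churn"])
```

An output column containing anything other than the two expected values, or a feature column that is always empty, is a sign that the extraction script needs another look.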


Decision Trees

BigML creates decision tree models from the data. When you get to the model pane, you should see your churn model as something that looks like this:

[Figure: decision tree model]

A decision tree represents a model where each node is associated with a question on a feature value, with a certain number of possible answers represented by branches, and where leaves are associated with output values. The first question is located at the root node. Choosing an answer takes you to a branch of the tree and to a next node. The process is repeated until a leaf is reached, where you get the associated output value as a prediction.
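The traversal just described can be sketched in a few lines of Python. The tree below is hand-written and hypothetical (a real one is learned from the data), and for simplicity every question has only two possible answers.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: bool  # True means "will churn"


@dataclass
class Node:
    feature: str      # feature the question is about
    threshold: float  # question: is the feature value <= threshold?
    if_yes: Union["Node", Leaf]
    if_no: Union["Node", Leaf]


def predict(tree, customer):
    """Walk from the root to a leaf, answering one question per node."""
    while isinstance(tree, Node):
        answer_is_yes = customer[tree.feature] <= tree.threshold
        tree = tree.if_yes if answer_is_yes else tree.if_no
    return tree.prediction


# Hypothetical tree with made-up feature names and thresholds.
tree = Node(
    "support_calls", 3,
    if_yes=Leaf(False),
    if_no=Node("day_minutes", 160.0, if_yes=Leaf(True), if_no=Leaf(False)),
)
print(predict(tree, {"support_calls": 5, "day_minutes": 120.0}))
```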

If you “browse” decision trees on BigML, you’ll notice that you can also get predictions at non-leaf nodes by hovering your mouse over them, and you also get confidence levels (in %) associated with these predictions.
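One standard way to turn the example counts at a node into a cautious confidence level is the lower bound of the Wilson score interval. Whether this matches BigML's own computation exactly is an assumption here; the sketch is only meant to show why a node backed by few examples gets a lower confidence than one backed by many.

```python
import math


def wilson_lower_bound(hits, total, z=1.96):
    """Lower bound of the Wilson score interval for the proportion hits/total.

    z=1.96 corresponds to a 95% interval.
    """
    if total == 0:
        return 0.0
    p = hits / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)


# Same 90% share of churners at the node, but with different amounts of evidence.
print(wilson_lower_bound(9, 10))
print(wilson_lower_bound(90, 100))
```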

Step 3: Make Predictions

Now that we have a model, we want to use it to make predictions on all customers and see who’s at risk of churning. In the same way we used a script to get a CSV file of snapshots of customers taken X months ago in Step 1 above, we need to create a CSV file of snapshots of customers taken now.

This time, the churn information does not reside in the database. The idea is to use BigML to expand this CSV file with two new columns: a “churn” column containing the churn predictions for all the customers, and a “confidence” column containing the confidence levels for all the predictions:

  • Upload the newly created CSV file to BigML and create a new dataset. If you’re following this tutorial with the BigML churn data, now is the time to get the remaining 20% and to use it (note: the 80-20 split was arbitrary).
  • Go back to the model, click on the cloud-lightning icon and choose “Batch Prediction.”
  • In the new view that appears, choose the model in the left-hand drop-down menu and choose the new dataset in the right-hand menu. Expand the “Configure” section by clicking on it, then expand the “Output settings” section, choose the separator for the output CSV file (default is comma), and click on the 3 buttons to the right to activate the corresponding options (“Add a first row as header,” “Include all fields values,” “Add confidence”).
  • Validate by clicking on the “Predict” green button on the bottom right of the page. You then get to a new page with the output CSV displayed in a text field and you can download it by clicking on “Download Batch Prediction.”
  • On your computer, open the downloaded CSV file in a spreadsheet program such as Excel. Filter the churn column to keep only the “True” values. Then, sort the confidence column in descending order. This way, you see at the top of the spreadsheet which customers are predicted to be most likely to churn.
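If you would rather script that last step than use a spreadsheet, the filtering and sorting can be sketched as follows. The column names and the sample rows are hypothetical; the real ones depend on your data and on the output settings you chose.

```python
import csv
import io

# Hypothetical batch prediction output.
predictions_csv = io.StringIO(
    "customer_id,churn,confidence\n"
    "a1,True,0.62\n"
    "b2,False,0.91\n"
    "c3,True,0.88\n"
)


def most_at_risk(csv_file):
    """Keep predicted churners and list the most confident predictions first."""
    rows = [row for row in csv.DictReader(csv_file) if row["churn"] == "True"]
    return sorted(rows, key=lambda row: float(row["confidence"]), reverse=True)


at_risk = most_at_risk(predictions_csv)
for row in at_risk:
    print(row["customer_id"], row["confidence"])
```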

It Doesn’t Have To Be Fancier

Here’s a fun story about BigML and churn prediction. David Gerster was leading the mobile data science team at Groupon when he discovered BigML and used it to predict churn. He was so impressed with BigML that he joined the company as VP of Data Science! You can read the full story here. The takeaway message for us is that you don’t need more than a service like BigML to do churn prediction and to start exploiting the value of your business’s data!

About the Author: Louis Dorard is the author of Bootstrapping Machine Learning, the first guide to prediction APIs. He is also a data consultant and helps companies exploit the value of their data. You can follow him on Twitter @louisdorard.

  1. I followed the steps and got the exact results you were talking about but have no idea what we did and why we did what we did in Steps 2 and 3 – am I the only one?

    • Step 2 consists of feeding in examples of customers who churned and who stayed, so we get back a model that we can use in Step 3. There, we make predictions on all customers and we’re able to say whether they’re at risk of churning or not.

      It’s great that you managed to replicate this, Sanket! Next step for you is to use your own data about your own customers ;) Let me know how it goes.

  2. Great post. Yet, I have to point out the downside of this model is not taking the outside environment into account. In a fast growing world, a better service / plan of your competitors can make more impact. In those cases, depending on only historical data would be quite limited. LTV over time will be decreasing.

    • Thanks Hermes.

      If you think that churn depends on a certain context related to the “outside environment”, and if you’re able to represent this context in a meaningful way, then you should definitely add these representations to the data (these would be extra features of the dataset)!

    • Hermes, thanks for those helpful insights :)

  3. I’m new to these modeling tools. Do you know how would something like this differ from logistic regression where you are essentially throwing all these data points to predict churn as the outcome variable?

    • The underlying idea is the same: throw data points, pick outcome variable and get predictions. Logistic regression is a type of technique that analyzes data to understand how variables relate to the outcome, when the outcome is a number. In churn prediction, the outcome is a class though (“yes” or “no”).

      • Dmitriy Shashkin Aug 20, 2015 at 5:05 am

        Quite the contrary: logistic regression is for cases where the outcome is qualitative (e.g., “yes”/“no”). You might have confused it with linear regression (which works only when the outcome is a number).

  4. OK, how would you explain the people who churned in the “churned db” with a prediction of “No churn”? (1st column churn = yes, and second column churn = no)

    • There can be several explanations. It could be that there were external circumstances that we couldn’t capture in the data, e.g. the customer doesn’t have any more money, goes to the competition or stops his activity altogether.

      Or it could be that the model is just not accurate enough because it doesn’t have enough data to understand what’s going on (not enough examples of customers who churned/stayed).

      • I don’t have the file anymore but my question actually was: Of the people in the DB who already churned, some have a prediction of no churn and some have a prediction of churn. How is that possible? Once they have churned, how can the prediction be No churn?

        Or am I mistaken and do the 2 columns both have predictions (the first column for the previous month and the second column for this month)? –> Then my question would be: How do I recognize people who actually churned in the previous month (after the prediction)?

  5. Great article. Do you have any numbers from your business case about the lift one can expect from contacting, say, 10% of the customers using such a decision tree model, please? Thanx!

  6. Churn prediction being “built by referring to examples” is a method of supervised learning, so I don’t think it is part of the artificial intelligence field of study (which refers to unsupervised learning).

  7. Hi,

    Very insightful post. I was using this tutorial for my own dataset, which I extracted from Google Analytics. I am working on predicting churn/non-churn for existing app users. The data from Google Analytics that I have does not have a churn/non-churn value for each of my examples. I am working with variables such as number of sessions, average time of each session, and goal conversion rate. I added an outcome variable ‘Churn’ to the training dataset by thresholding on goal conversion rate. This way I gave each example a Churn value of 0 or 1. I am now building a model with BigML using this as the training dataset. Is my approach correct? Or is there another way you would suggest to include a Churn variable in my dataset when I don’t already have one?


  8. Peter Daly-Dickson Aug 19, 2016 at 10:56 am

    Just to clarify…

    Are all the data points in the source file the values for one specific date in the past?



  9. Hey,
    just small questions regarding some statements:
    ” … over 2 or 3 previous months in order to capture information that has an impact on whether a customer churns or not. In that case, we would average usage feature values over this duration. ”
    Why would you need to average it? If you gather data over 3 months for each existing customer (one who did not register during those 3 months), averaging means you sum (let's say) call duration over the 3 months and divide by 3. And you do it the same way for each customer, so in fact there is no need for averaging. You could just use the sum of call durations over the 3 months. It is the same when you collect data for only one month: because the data is daily, you do not average over 30 days, you just sum it to get the monthly data.

