I like simple charts that convey ideas with maximum impact. The “lift chart,” well-known in direct marketing, is one of these. I recently found a good description of lift charts in the newly published third edition of “Data Mining Techniques” by Gordon Linoff and Michael Berry. (Another good discussion can be found here.) I will show how you might use this tool to demonstrate the performance of a predictive model, and follow that with some ideas for even better uses.
Let’s say you’ve created a model which assigns scores to your constituents, ranking them by their likelihood to give, and that you’ve used it to segment your appeal — you more or less started soliciting at the top of the list and worked your way down, leaving your least-likely prospects till the end. I say “more or less,” because given other factors involved in segmentation, you may not have strictly ordered your appeal by propensity score. It doesn’t really matter.
So now the fiscal year is out, and you want to compare results obtained by using the propensity score against some sort of alternative way that you could have proceeded. The logical alternative is using no model at all. In fact, it’s more than that: the alternative is soliciting your prospects in perfectly random order.
The table below shows what that scenario looks like, using a Phonathon model as an example. The first column is the percentage of the entire prospect pool that has been contacted. The second column is the cumulative number of Yes pledges received, as a percentage of all Yes pledges for the year.
In this “no model” scenario, once you’ve solicited 10% of your prospects (first column of the table), you’ve gotten 10% of all your gifts (second column of the table). At 20% of your prospects, you’ve gotten 20% of your gifts (cumulative). And so on, until calling all prospects yields 100% of all the gifts and pledges received.
I know it seems silly, but let’s chart it. The x-axis is the percentage of prospects who were attempted at least once, and the y-axis is the cumulative percentage of all gifts and pledges that came in by phone. The chart of expected results from random solicitation is as exciting as it sounds (i.e., not very). It’s a perfectly straight line, because just as in the table above, every percentage of the prospect pool yielded exactly the same percentage of the total number of gifts and pledges.
It’s a hypothetical (and artificial) scenario, so the random-calling line on the chart you create will look exactly the same as mine, regardless of the model.
Good so far?
The next step is to add another line to the chart which represents the results from the real solicitation, i.e. as guided by the predictive model scores. Our chart is created in Excel, so that is where we will prepare the underlying data. The first two columns of the table below are the deciles. What I’ve done is rank everyone who was contacted at least once by their raw score and chop that list up into deciles in my stats software. The top 10% of prospects, the ones who were called first, are in the top decile (10). And of course, each decile contains roughly the same number of prospects.
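If you don’t have stats software handy, the ranking-and-chopping step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `assign_deciles` and the scores are made up, and ties are broken by list order, which may not match what your stats package does.

```python
def assign_deciles(scores):
    """Return a decile label (10 = highest scores, 1 = lowest) for each score."""
    n = len(scores)
    # Positions of the prospects, sorted from highest score to lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        # rank 0 is the best prospect; rank * 10 // n runs from 0 up to 9.
        deciles[i] = 10 - (rank * 10 // n)
    return deciles


# Hypothetical raw scores for 20 prospects (invented for illustration).
scores = [0.91, 0.15, 0.47, 0.88, 0.33, 0.72, 0.05, 0.64, 0.29, 0.58,
          0.81, 0.12, 0.39, 0.95, 0.22, 0.67, 0.08, 0.53, 0.44, 0.76]
deciles = assign_deciles(scores)

for score, decile in zip(scores, deciles):
    print(f"score {score:.2f} -> decile {decile}")
```

With 20 prospects, each decile ends up with exactly two people. With a pool size that isn’t a multiple of ten, the groups will only be roughly equal, which is all the chart needs.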
The remaining three columns show how I calculate the final result: the cumulative percentage of all Yes pledges that corresponds to each decile. For example, by the time we reach Decile number 5, we have called 60% of all prospects (“5” is the sixth row down), and received 1,354 Yes pledges, which is 76.5% of all the Yes pledges received during the year.
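The same running-total arithmetic can be sketched outside Excel. The per-decile counts below are invented for illustration (they are not my Phonathon numbers), and `cumulative_yes_percent` is just a name I picked; only the method mirrors the table.

```python
from itertools import accumulate


def cumulative_yes_percent(yes_by_decile):
    """Cumulative % of all Yes pledges, given counts ordered from decile 10 down to decile 1."""
    total = sum(yes_by_decile)
    running = accumulate(yes_by_decile)  # running total of Yes pledges
    return [round(100 * r / total, 1) for r in running]


# Hypothetical Yes-pledge counts for deciles 10, 9, ..., 1 (made up).
yes_by_decile = [50, 40, 30, 20, 15, 15, 10, 10, 5, 5]
cum_pct = cumulative_yes_percent(yes_by_decile)

for decile, pct in zip(range(10, 0, -1), cum_pct):
    print(f"decile {decile:2d}: {pct:5.1f}% of all Yes pledges so far")
```

In this made-up example the sixth row (decile 5) shows 85.0%, meaning that calling the top 60% of the list captured 85% of the Yes pledges.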
Now we have enough information to complete the table that we started at the beginning. The first column in the table below contains the values for the x-axis, and the other two columns are the y-axis values for our two lines — ‘called at random’ for the cumulative percentage of all Yes pledges in our hypothetical random calling (which we’ve seen already), and ‘called by score’ for the cumulative percentage of all Yes pledges in our actual score-driven calling:
The first data point will be at 0% on the x-axis, where of course both lines touch. They touch again at 100%, where calling ALL prospects returns 100% of the Yes pledges, regardless of the order in which prospects were called. The lift chart, created from the table, looks like this:
At the 10% mark, about twice as many of the pledges came in for scored prospects as for prospects contacted at random. That ratio, a factor of 2.18, is called “lift.” (Thus, “lift chart.”) When we penetrated to 20% of the prospect sample, we continued to get about twice the yield of the random line (lift = 2.07). Lift is therefore highest in the first decile and has slipped only slightly by the second. After that, the line begins to flatten, and the relative advantage of scoring vs. random calling begins to diminish.
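For anyone who wants the arithmetic spelled out, lift at any point is just the “called by score” value divided by the “called at random” value, and the random value always equals the percentage of prospects called. A small sketch follows; note that the 21.8% and 41.4% figures are back-calculated from the lift values quoted above rather than read from the table, and the function name `lift` is my own.

```python
def lift(pct_called, pct_yes_by_score):
    """Lift = cumulative % of Yes pledges by score / cumulative % expected at random."""
    # Under random calling, the cumulative % of Yes pledges equals the % called,
    # so the baseline is simply pct_called.
    return pct_yes_by_score / pct_called


# (% of prospects called, cumulative % of Yes pledges when calling by score)
# 21.8 and 41.4 are implied by the stated lifts of 2.18 and 2.07;
# at 100% both lines meet, so the lift there is 1.
points = [(10, 21.8), (20, 41.4), (100, 100.0)]
lifts = [round(lift(x, y), 2) for x, y in points]
print(lifts)  # [2.18, 2.07, 1.0]
```

A lift of 1.0 means the model is doing no better than random at that depth, which is why every lift chart ends with the two lines meeting.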
Every once in a while, the question comes up about how to measure the positive effect of employing predictive modeling in a fundraising appeal. I don’t think this question can be answered definitively, but a lift chart is a good thing to have on hand. It’s not difficult to create, and it’s easy to explain to someone else.
On the other hand, the message it conveys is nothing more profound than, “Using the predictive model worked better than soliciting prospects at random.” Although that might be just the thing your boss needs to see with her own eyes, that’s not a very exciting conclusion for those of us who make and use predictive models.
I think lift charts can be used for more than just that. In my example I used only one model, comparing its performance against using no model at all — it would be far more enlightening (and realistic) to compare the performance against at least one alternative model.
What we would like to see in a successful model is a “called by score” line that shoots upward at a steep angle and begins flattening only after reaching a high percentage of our goal. That would indicate that our “Yes” pledgers are concentrated in the upper scoring levels of our model. Does the chart above show a good model, or a mediocre one? Every application is different, and it’s hard to say what constitutes “good.” Without a second model to compare, it’s anyone’s call.
As well, my example demonstrates an “after-the-fact” analysis. A lift chart to compare two models before deployment would be most helpful, using the results of your holdout sample to judge which model produces the best lift curve.
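Here is a rough sketch of what that before-deployment comparison could look like: score the same holdout sample with two models, then compute each model’s cumulative curve. Everything in it is hypothetical, including the ten-prospect holdout sample, the two sets of scores, and the `gains_curve` helper, so treat it as an outline of the idea rather than a finished tool.

```python
def gains_curve(scores, outcomes, bins=10):
    """Cumulative % of all Yes outcomes captured after each tenth of the score-sorted list."""
    n = len(scores)
    # Positions sorted from highest score to lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    total_yes = sum(outcomes)
    curve = []
    for b in range(1, bins + 1):
        cutoff = b * n // bins  # number of prospects "called" so far
        captured = sum(outcomes[i] for i in order[:cutoff])
        curve.append(round(100 * captured / total_yes, 1))
    return curve


# Hypothetical holdout sample: 1 = said Yes, 0 = did not (made up).
outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
# Hypothetical scores from two competing models for the same ten prospects.
model_a = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.3, 0.75, 0.35, 0.05]
model_b = [0.6, 0.7, 0.9, 0.2, 0.8, 0.3, 0.1, 0.5, 0.4, 0.05]

curve_a = gains_curve(model_a, outcomes)
curve_b = gains_curve(model_b, outcomes)
print("Model A:", curve_a)
print("Model B:", curve_b)
```

In this toy data, Model A’s curve sits at or above Model B’s at every depth, so A would be the one to deploy. Real curves often cross, in which case the better model depends on how far down the list you expect to call.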
And finally, the Phonathon lift chart shows results for both renewed donors and never-donors. It would be far more interesting to see those groups charted separately, with at least two competing models available for comparison in each chart.