Data prep aside, it really isn’t that hard to produce a model to predict giving, once you know how. The simplest of models can be expected to give good results. Take one step beyond, however, and things get tricky. Your model may indeed predict giving, but it may NOT necessarily predict conversion — that is, conversion from non-donor to donor status.
What’s this, you ask? This CoolData guy is always saying that donor acquisition is where predictive modeling really shines, so why is he backpedaling today?
Well, I still DO believe that predictive modeling gives you insight into your deep non-donor pool and helps you decide who to focus your efforts on. But there’s a catch: You may be led astray if you fail to properly define the question you’re trying to answer.
By way of example, I will show you a model that appeared valid on the surface but ultimately failed. And then I will explain what I did wrong — and how you can avoid making the same mistakes.
Last summer I had the pleasure of visiting with fundraising staff at a university in another province and showing them what data mining was doing for us. Their Annual Giving manager had a data file pulled from Raiser’s Edge, all ready to analyze, and we did so, in real time, during the course of a day-long workshop.
The model we created was a demo only — done very quickly, without much attention paid to how it would be used — and in fact the resulting score set was not used for anything. But we did have this score set, and I was reasonably sure that the higher scorers would be the better donors, and that a little followup analysis would put the icing on the cake.
So about a year after my visit, I offered to show how the alumni who had given since my visit broke down by the score we had prepared. My hosts sent me the new giving data, and off I went.
All seemed well at first. Have a look at these two charts. The high-scoring alumni (by score decile) gave the most in total dollars, and they also had the highest rate of participation in the annual fund.
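If you want to reproduce this kind of breakdown yourself, here is a minimal sketch in Python/pandas. The file and column names (score, gift_amount, is_donor) are assumptions for illustration only — the actual analysis wasn’t done with these tools or field names.

```python
import pandas as pd

# Hypothetical file: one row per alumnus, with the demo model score and
# giving since the scores were produced. Column names are illustrative.
alumni = pd.read_csv("alumni_scores_and_giving.csv")

# Cut the score into deciles (10 = highest-scoring tenth of alumni).
alumni["decile"] = pd.qcut(alumni["score"], 10, labels=list(range(1, 11)))

# Total dollars and participation rate by decile -- the two charts described above.
summary = alumni.groupby("decile", observed=True).agg(
    total_dollars=("gift_amount", "sum"),
    participation_rate=("is_donor", "mean"),
)
print(summary)
```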
No surprises there; I’ve seen this again and again. Then I got overconfident. The small university I did this work for had new-donor acquisition as one of its key goals for the Annual Fund, so I asked them to identify which donors had been newly acquired in the past year so that I could show how they broke down by score. I expected the model would predict their participation just as well.
There were 321 new donors. Their chart looked like this:
Quite a different story, isn’t it? I expected new donors would be clustered in the top scores, but that’s not what happened. Had my hosts used our demo model to focus their acquisition efforts, they would have been digging in the wrong places. This model would have been useless — even harmful.
What happened?
It appears that the model was good at finding EXISTING donors, but not POTENTIAL donors. This suggests to me that certain predictor variables that we used must have been proxies for “Is a donor”. (For example, maybe we used event attendance data that seemed predictive, but the event was a donor-recognition dinner — that’s a proxy, or stand-in, for being a donor — and not usable as a predictor.)
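A quick way to catch this sort of proxy — a sketch only, with hypothetical variable names — is to cross-tabulate each candidate predictor against existing donor status before modelling. A variable that nearly perfectly separates donors from non-donors is probably leaking donor status into the model rather than predicting anything.

```python
import pandas as pd

# Hypothetical flags: is_donor marks existing donors; attended_event is a
# candidate predictor that might secretly be a proxy (e.g. attendance at a
# donor-recognition dinner).
df = pd.read_csv("alumni_predictors.csv")

# If nearly every attendee is already a donor, the variable is echoing donor
# status back at the model instead of predicting it.
print(pd.crosstab(df["attended_event"], df["is_donor"], normalize="index"))
```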
The lesson is to understand the data you’re using, because mistakes can creep in quite easily when a model is thrown together too quickly. Other factors that are probably implicated in this failure include:
Too general a model – 1: The model was not specifically an Annual Giving model. It included any kind of giving in the outcome variable (the predicted value), including major gifts (if I recall correctly). In that type of model, ‘Age’ is given a lot of weight, and younger alumni (who might make up the bulk of new donors) tend to receive depressed scores. In fact, about 60 of those 321 new donors (almost 20%) were Class of 2009, which at that time was the most recent graduating class. The university really focused on getting their support during the phonathon, but this model wouldn’t have been much help in targeting them.
Too general a model – 2: If predicting acquisition really was an over-arching goal, then the model question should have been defined specifically for that purpose. The model should have been trained differently — perhaps with a 0/1 variable indicating recent conversion to participation in the Fund. This requires more work in preparing a single variable — Y, the outcome variable — but it is central to the success of the model.
All eggs in one basket: With a trickier predicted value to train on, the situation called for trying binary logistic regression as well as multiple linear regression — and then testing to see which one did a better job of scoring a holdout sample of new donors. (A rough sketch of that setup follows this list.)
No holdout sample: Which brings me to the final error I made that day — I didn’t have a holdout sample to test the validity of the model. I skipped that step for the sake of simplicity, but in practice you should think about validation right from the start.
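To make the last three points concrete, here is a rough sketch of how an acquisition-specific model might be set up: a 0/1 conversion flag as the outcome, a logistic model and a linear model fit on the same predictors, and a holdout sample used to decide between them. Every column name, predictor and library choice here is an assumption for illustration, not a record of what we did in the workshop.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("alumni_modelling_file.csv")  # hypothetical extract

# Outcome: 1 if the person converted from non-donor to donor in the most
# recent year, 0 if they remained a non-donor. Long-time donors are excluded
# so the model learns conversion, not existing-donor status.
df = df[~df["donor_before_last_year"].astype(bool)]
df["converted"] = df["gave_last_year"].astype(int)

predictors = ["years_since_grad", "email_on_file", "event_attendances"]  # illustrative
X, y = df[predictors], df["converted"]

# Holdout sample: fit on one part, judge on the part the models never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
ols = LinearRegression().fit(X_train, y_train)

# Compare how well each model ranks the holdout's actual converters.
print("logistic AUC:", roc_auc_score(y_test, logit.predict_proba(X_test)[:, 1]))
print("linear AUC:  ", roc_auc_score(y_test, ols.predict(X_test)))
```

Whichever model ranks the holdout’s actual converters better is the one to keep — the point is the head-to-head comparison on data the models never saw, not these particular algorithms.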
Is there anything I did right? Well, I did conduct the test on recent giving that alerted me to the fact that this model did a poor job of predicting acquisition. This after-the-fact testing is not the same as validation, which merely gives some reassurance up front that your model will work in the future. But it is equally important, as it may highlight issues you are not aware of and need to address in future iterations of the model.
In summary, to avoid model suckage you must: know your data, in order to maximize the independence of your predictors; define your dependent variable carefully so that it answers the specific question you’re asking; try different models and test them against each other; and finally, use a holdout sample or some other validation method.
It always seems like we are in the same place, Kevin 🙂 I’ve just been looking at why our purchased donor acquisition model did not work very well last year. I suspect it has to do with the solicitation method, but it will take a lot more work to find out.
Comment by Jason — 20 September 2010 @ 5:08 pm
Your mention of solicitation triggers another thought about a potential hazard in interpreting results. If your solicitation focuses hard on the high-scoring people in your acquisition model, then obviously your results are going to show that your newly-acquired donors were high scorers. That doesn’t mean your model was any good. This thought isn’t really related to your comment, but I just want to toss it out there. Alas, I have had some experience with post-solicitation testing of score sets that were never used! The benefit was that the results had no bias due to score level, but clearly that’s a luxury you don’t want to have year after year. Otherwise, why bother?
Comment by kevinmacdonell — 20 September 2010 @ 5:34 pm