A while back I wrote a couple of posts about mistakes I’ve made in data mining and predictive modelling. (See Four mistakes I have made and When your predictive model sucks.) Today I’m pleased to point out a brand new one.
The last days of work leading up to Christmas had me evaluating my new-donor acquisition models to see how well they’ve been working. Unfortunately, they were not working well. I had hoped — I had expected — to see newly-acquired donors clustered in the upper ranges of the decile scores I had created. Instead they were scattered all along the whole range. A solicitation conducted at random would have performed nearly as well.
Our mailing was restricted by score (roughly the top two deciles only), but our phone solicitation was more broad, so donors came from the whole range of deciles:
Very disappointing. To tell the truth, I had seen this before: A model that does well predicting overall participation, but which fails to identify which non-donors are most likely to convert. I am well past the point of being impressed by a model that tells me what everyone already knows, i.e. that loyal donors are most likely to give again. I want to have confidence that acquisition mail dollars are spent wisely.
So it was back to the drawing board. I considered whether my model was suffering from overfit, whether perhaps I had too many variables, too much random noise, multicolinearity. I studied and rejected one possibility after another. After so much effort, I came rather close to concluding that new-donor acquisition is not just difficult — it might be darn near impossible.
Dire possibility indeed. If you can’t predict conversion, then why bother with any of this?
It was during a phone conversation with Peter Wylie that things suddenly became clear. He asked me one question: How did I define my dependent variable? I checked, and found that my DV was named “Recent Donors.” That’s all it took to find where I had gone wrong.
As the name of the DV suggested, it turned out that the model was trained on a binary variable that flagged anyone who had made a gift in the past two years. The problem was that included everybody: long-time donors and newly-acquired donors alike. The model was highly influenced by the regular donors, and the new donors were lost in the shuffle.
It was a classic case of failing to properly define the question. If my goal was to identify the patterns and characteristics of newly-acquired donors, then I should have limited my DV strictly to non-donors who had recently converted to donors!
So I rebuilt the model, using the same data file and variables I had used to build the original model. This time, however, I pared the sample down to alumni who had never given a cent before fiscal 2009. They were the only alumni I needed to have scores for. Then I redefined my dependent variable so that non-donors who converted, i.e., who made a gift in either fiscal 2009 or 2010, were coded ‘1’, and all others were coded ‘0’. (I used two years of giving data instead of just one in order to have a little more data available for defining the DV.) Finally, I output a new set of decile scores from a binary logistic regression.
A test of the new scores showed that the new model was a vast improvement over the original. How did I test this? Recall that I reused the same data file from the original model. Therefore, it contained no giving data from the current fiscal year; the model was innocent of any knowledge of the future. Compare this breakdown of new donors with the one above:
Much better. Not fan-flippin-tastic, but better.
My error was a basic one — I’ve even cautioned about it in previous posts. Maybe I’m stupid, or maybe I’m just human. But like anyone who works with data, I can figure out when I’m wrong. That’s a huge advantage.
- Be skeptical about the quality of your work.
- Evaluate the results of your decisions.
- Admit your mistakes.
- Document your mistakes and learn from them.
- Stay humble.