For anyone needing a non-technical introduction to data mining for fundraising, there is no better book than Peter Wylie’s “Data Mining for Fundraisers.” His simple-score method is easy to do and easy to explain. But it’s not the method I use. Today I want to talk about the limitations of the simple-score method.
Actually, what I view as a “limitation” today was the very thing that attracted me to his book, and to data mining, in the first place: The accessibility of the writing, and the simplicity of the method. It all seems obvious to me now, but back then, when the concepts were new, I needed to read the book through several times in order to completely get it. When I got it, I was able to describe it, and to convince others in my organization that we needed to do it. Had the concepts been more abstruse, I would not have learned anything. Even if I had, I would have had limited success selling data mining to anyone. In fact, Wylie’s book laid the foundation for everything else that came after.
These are the main limitations, however, as I see them today:
1. Too few score levels
If you’ve got six predictor variables, you’ll end up with seven score levels (counting zero as a possible score). If you want to segment the pools in your Annual Fund, that might be adequate, but just barely. You could introduce more variables, of course, but there are a couple of things standing in your way. First, very few predictor variables will pass the foolproof threefold test that Wylie prescribes in his book; in order to boost the number of usable variables you may have to bend the “rules” and accept a predictor even if it does not pass all three tests. However, there is a limit to how many variables you can accept: You will quickly cause problems for yourself due to limitation number 2 …
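To make the arithmetic concrete, here is a minimal sketch in Python of how a simple score adds up and why six predictors give only seven levels. The field names and the sample record are hypothetical, invented for illustration; they are not taken from Wylie's book.

```python
def simple_score(record, positive, negative=()):
    """Add 1 for each positive predictor present, subtract 1 for each negative one."""
    score = sum(1 for name in positive if record.get(name))
    score -= sum(1 for name in negative if record.get(name))
    return score


def possible_levels(positive, negative=()):
    """Every score a record could receive, from all-negative to all-positive."""
    return list(range(-len(negative), len(positive) + 1))


# Hypothetical predictor names, assumed to be stored as True/False flags.
positive_predictors = [
    "employer_present",
    "job_title_present",
    "business_phone_present",
    "email_present",
    "attended_event",
    "class_year_before_1980",
]

# A made-up constituent record with three of the six flags set.
constituent = {
    "employer_present": True,
    "job_title_present": True,
    "business_phone_present": False,
    "email_present": True,
    "attended_event": False,
    "class_year_before_1980": False,
}

levels = possible_levels(positive_predictors)
score = simple_score(constituent, positive_predictors)
print(f"{len(levels)} possible levels: {levels}")
print(f"This constituent scores {score}")
```

Note that swapping some positive predictors for negative ones shifts the range of scores but does not widen it: with n predictors in total there are always n + 1 possible levels.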
2. Subjective weightings
In the simple-score method, all positive predictors have a score value of 1, and all negative predictors have a value of -1. In other words, it assumes that all predictors are of equal value. We know that this cannot be true, but we ignore it for the sake of simplicity. The foolproof nature of the tests we run for choosing our predictors does protect us from accepting a trivial predictor into the model, but the risks we run are twofold: One, we end up over- or under-counting the significance of individual predictors, and two, we may end up double-counting certain effects.
For example, the two variables “employer present” and “job title present” are obviously closely related, but both might pass the three tests with flying colours. They aren’t perfectly alike, so should we use them both in the model? Knowing the data, we would probably choose to use only one of the two. But what about other related variables that aren’t so obvious? “Marital status is single” will be highly correlated with “Class Year”, and “Job title present” will also be correlated with “Business phone present.” The simple-score method offers no way to account for these correlations among variables.
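One rough way to spot this kind of overlap before building a score is to measure how strongly two indicator variables move together. The sketch below computes a plain Pearson correlation between two 0/1 flags. The ten records are made up for illustration, and this check is my own addition, not part of the simple-score method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical flags for ten constituents (1 = present, 0 = absent).
employer_present = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
job_title_present = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]

r = pearson(employer_present, job_title_present)
print(f"Correlation between the two flags: {r:.2f}")

# In a simple score, a constituent with both flags earns 2 points,
# even though the two flags carry largely the same information.
both_flags = sum(1 for e, j in zip(employer_present, job_title_present) if e and j)
print(f"{both_flags} of {len(employer_present)} records would be counted twice")
```

In this toy data the correlation comes out at roughly 0.82, which suggests the two flags are close to measuring the same thing. A high value does not tell you which variable to drop; it only warns you that using both in a simple score would count a shared effect twice.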
3. Limited fun for the data geek
There’s a bit of the thrill of the hunt in identifying some cool new predictor variable. A person can get quite creative in coming up with new ideas for things to test. (In a previous post, I listed 85 potential predictor variables I have tested for use in my models.) As I said, the simple-score method keeps one mostly out of trouble by disallowing variables that aren’t obviously predictive, but the limitation can get a little boring for the intrepid data explorer. Yes, the core variables that do three-quarters of the predictive work are limited to perhaps eight or ten, the same ones you’ll make use of in a simple-score model; but there is real fun in squeezing out just a little more insight by discovering those subtler variables hiding in your data.
This is not a judgment on the book — only a reminder that more advanced techniques lie beyond. After all, it was Peter Wylie (and John Sammis) who taught me how to use multiple linear regression to create the kinds of models I wanted. If the simple-score method answers your needs right now, as it did for me years ago, then use it!