I see this question every once in a while: What variables are most predictive?
Over the next few posts I’ll tell you what I’ve found works for Annual Giving, Planned Giving, and event attendance likelihood. I’ll start with predictors of giving for the phone portion of Annual Giving at our institution.
But first I should point out that the headline of this post is misleading. There is no magic list of predictors that work everywhere and always. You have to explore your own data to find your own “top 15”. And there’s nothing magic about 15, either.
Of course, some variables such as ‘class year’ and ‘home phone present’ will almost always be important, but in working with one’s own data there is no need to rely on assumptions. The interplay of variables that lends predictive power to a model is complex and dynamic. The only safe assumption is that every database is unique, and every new model ought to begin with wide-open exploration of the data and testing of potential predictors.
With that out of the way, here we go:
For this year’s Annual Giving predictive model, the outcome (or ‘dependent’) variable was defined as the sum of all gifts that came to us as a result of phone solicitation, from fiscal years 2004 to 2009. All values in the outcome variable that exceeded $10,000 were recoded to $10,000, to reduce the influence of a handful of big donors. (Technical note: As well, ‘Giving’ was re-expressed by taking a logarithm of the value, or of 1 if the value was $0, since you can’t take a log of 0. Why you might want to re-express the dependent variable before beginning regression analysis is a topic for another day.)
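For readers who want to see the recoding spelled out, here is a minimal sketch in Python of the steps just described: sum the phone gifts for the fiscal years in question, cap the total at $10,000, then take the log. The function names and the `(fiscal_year, amount)` record layout are hypothetical, and the natural log is an assumption, since the choice of base is not specified above.

```python
import math

CAP = 10_000.0                    # ceiling applied to the outcome, as described above
FIRST_FY, LAST_FY = 2004, 2009    # fiscal years included in the outcome

def phone_giving_total(gifts):
    """Sum phone-solicited gifts falling in the fiscal-year window.

    `gifts` is a hypothetical list of (fiscal_year, amount) pairs.
    """
    return sum(amount for fy, amount in gifts if FIRST_FY <= fy <= LAST_FY)

def giving_outcome(gifts):
    """Cap the total at CAP, then re-express it as a logarithm.

    A total of $0 is treated as 1 before taking the log, so non-donors
    end up with an outcome of 0. Natural log is an assumption here.
    """
    total = min(phone_giving_total(gifts), CAP)
    return math.log(total if total > 0 else 1.0)
```

With this sketch, a non-donor gets an outcome of 0, and a donor whose phone gifts total $25,000 is treated exactly like one who gave $10,000.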
The most predictive variables (roughly in order of influence) are listed below. Variables that have a negative correlation with ‘Giving’ are marked (N). (That is, as the value of the variable increases, ‘Giving’ tends to decrease.) Note that very few of these variables can be considered continuous (e.g., Class Year) or ordinal (e.g., survey scale responses). Most are binary (0/1). But ALL are numeric, as required for regression.
- Class year (N)
- Alumni survey response to scale statement, “I enjoy speaking with students who call as part of the annual calling program.”
- Marital status Single (N)
- Employer present (i.e. in database)
- Earned a degree
- Number of previous phone refusals (N)
- Number of Homecomings attended
- Exclusion code, affinity programs (N)
- Lived primarily in residence while a student (from alumni survey)
- Business phone present
- Number of President’s Receptions attended
- Name prefix is “Dr.”
- Number of student activities
- Number of cross-references in database
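As a rough illustration of how a list like this can be screened, the sketch below turns a few fields into numeric 0/1 indicators and ranks them by the strength of their simple correlation with the re-expressed ‘Giving’ value. This is a first-pass screen under stated assumptions, not the regression model itself: the field names are hypothetical, and the five records are made-up toy values, not real results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return cov / (sx * sy)

# Made-up toy records; field names are hypothetical.
records = [
    {"class_year": 1975, "employer": "Acme", "prefix": "Dr.", "log_giving": 6.2},
    {"class_year": 1988, "employer": "", "prefix": "Mr.", "log_giving": 3.9},
    {"class_year": 1996, "employer": "Initech", "prefix": "Ms.", "log_giving": 4.6},
    {"class_year": 2004, "employer": "", "prefix": "Mr.", "log_giving": 0.0},
    {"class_year": 2008, "employer": "", "prefix": "Ms.", "log_giving": 0.0},
]

def to_numeric(rec):
    """Recode raw fields as numeric predictors (mostly 0/1 flags)."""
    return {
        "class_year": rec["class_year"],
        "employer_present": 1 if rec["employer"] else 0,
        "prefix_dr": 1 if rec["prefix"] == "Dr." else 0,
    }

def rank_predictors(recs):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    rows = [to_numeric(r) for r in recs]
    giving = [r["log_giving"] for r in recs]
    scores = {
        name: pearson([row[name] for row in rows], giving)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

ranking = rank_predictors(records)
```

A negative correlation in the ranking corresponds to the (N) marker in the list above. Simple correlations ignore the overlap between predictors, so the order a regression arrives at can differ.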
There were about a dozen more variables besides these, but they get somewhat arcane and do not contribute very much to the model. It’s far better to have six or eight variables that correlate strongly with the outcome than 40 trivial ones, if only for the time saved in acquiring and preparing the data.
Oh, and notice that I’ve listed only 14, instead of the promised 15! That’s a reminder to take this ranking, and the variables themselves, with a grain of salt.