I see this question every once in a while: **What variables are most predictive?**

Over the next few posts I’ll tell you what I’ve found works for Annual Giving, Planned Giving, and event attendance likelihood. I’ll start with predictors of giving for the phone portion of Annual Giving at our institution.

But first I should point out that the headline of this post is misleading. There is **no magic list of predictors** that work everywhere and always. You have to explore your own data to find your own “top 15”. And there’s nothing magic about 15, either.

Of course, some variables such as ‘class year’ and ‘home phone present’ will almost always be important, but in working with one’s own data there is no need to rely on assumptions. The interplay of variables that lends predictive power to a model is complex and dynamic. The only safe assumption is that every database is unique, and every new model ought to begin with wide-open exploration of the data and testing of potential predictors.

With that out of the way, here we go:

For this year’s Annual Giving predictive model, the outcome (or ‘dependent’) variable was defined as the sum of all gifts that came to us as a result of phone solicitation from fiscal years 2004 to 2009. Any value in the outcome variable over $10,000 was recoded to $10,000 to reduce the influence of a handful of big donors. (Technical note: ‘Giving’ was also re-expressed by taking its logarithm, or the logarithm of 1 where the value was $0, because the log of 0 is undefined. Why you might want to re-express the dependent variable before beginning regression analysis is a topic for another day.)
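The capping and log steps described above can be sketched in a few lines of Python. The post does not say which logarithm base was used, so the natural log here is an assumption; changing the base only rescales the values and does not change their ordering. The function name `transform_giving` is hypothetical.

```python
import math

CAP = 10_000  # ceiling applied to total phone giving, as described in the post


def transform_giving(total_phone_giving: float) -> float:
    """Cap a donor's phone-giving total at $10,000, then take its natural log.

    Assumption: natural log (the post does not name a base). A $0 total is
    replaced by 1 before taking the log, so non-donors map to log(1) = 0.
    """
    capped = min(total_phone_giving, CAP)
    return math.log(capped if capped > 0 else 1)


if __name__ == "__main__":
    # Illustrative totals only, not data from the post.
    for total in (0, 50, 2_500, 48_000):
        print(total, round(transform_giving(total), 3))
```

Because the cap is applied before the log, every donor above $10,000 ends up with the same transformed value, which is exactly the dampening of big donors the post is after.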

## The list

The most predictive variables (roughly in order of influence) are listed below. Variables that have a negative correlation with ‘Giving’ are marked **N**. (That is, as the value of the variable increases, ‘Giving’ tends to decrease.) Note that very few of these variables can be considered continuous (e.g. Class Year) or ordinal (e.g. survey scale responses). Most are binary (0/1). But ALL are **numeric**, as required for regression.
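As an illustration of the point about numeric coding, here is a minimal sketch of turning raw database fields into 0/1 indicators plus one continuous variable. The record layout and field names (`employer`, `prefix`, `marital_status`, `business_phone`) are hypothetical and are not taken from any real database schema.

```python
# Hypothetical raw records; field names are illustrative, not a real schema.
raw_records = [
    {"class_year": 1985, "employer": "Acme Ltd.", "prefix": "Dr.",
     "marital_status": "M", "business_phone": ""},
    {"class_year": 2003, "employer": "", "prefix": "Ms.",
     "marital_status": "S", "business_phone": "555-0100"},
]


def encode(record: dict) -> dict:
    """Convert one raw record into numeric predictors for regression."""
    return {
        # Continuous: used as-is.
        "class_year": int(record["class_year"]),
        # Binary 0/1: 1 if any employer name is on file.
        "employer_present": int(bool(record["employer"].strip())),
        # Binary 0/1: 1 if the name prefix is exactly "Dr."
        "prefix_dr": int(record["prefix"].strip() == "Dr."),
        # Binary 0/1: 1 if marital status is coded "S" (an assumed code for single).
        "marital_single": int(record["marital_status"] == "S"),
        # Binary 0/1: 1 if any business phone number is on file.
        "business_phone_present": int(bool(record["business_phone"].strip())),
    }


if __name__ == "__main__":
    for rec in raw_records:
        print(encode(rec))
```

The "present" variables deliberately ignore what the field contains and record only whether anything is there, which is how indicators like ‘Employer present’ and ‘Business phone present’ are described in the list.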

- Class year (N)
- Alumni survey response to scale statement, “I enjoy speaking with students who call as part of the annual calling program.”
- Marital status Single (N)
- Employer present (i.e. in database)
- Earned a degree
- Number of previous phone refusals (N)
- Number of Homecomings attended
- Exclusion code, affinity programs (N)
- Lived primarily in residence while a student (from alumni survey)
- Business phone present
- Number of President’s Receptions attended
- Name prefix is “Dr.”
- Number of student activities
- Number of cross-references in database

There were about a dozen more variables besides these, but they get somewhat arcane and contribute little to the model. It’s far better to have six or eight variables that are highly correlated with the outcome than 40 trivial ones, if only for the time saved in acquiring and preparing the data.

Oh, and notice that I’ve listed only 14, instead of the promised 15! That’s a reminder to take this ranking, and the variables themselves, with a grain of salt.

Thanks for this post. I’m always interested to read about how others are approaching this.

I’m hoping that you will someday address the reasons for re-expressing the DV as a log. I’ve been searching for a good explanation in this context.

Off to browse more of your blog…

Comment by John — 6 January 2010 @ 3:54 pm

Thanks, yes, I myself am always interested in seeing what variables others use, and I’m excited whenever I find some field in our database that contains data that might be predictive. Soon I will post a big list of potential places to look for variables. (In the meantime, I think Josh Birkholz has just such a list in his book, Fundraising Analytics.)

Regarding the log transformation, I will comment on that when I find a good way of saying it … I might need to do a little research on that one.

Comment by kevinmacdonell — 6 January 2010 @ 4:17 pm

John: I have addressed this question in today’s post:

https://cooldata.wordpress.com/2010/03/04/why-transform-the-dependent-variable/

Comment by kevinmacdonell — 4 March 2010 @ 1:04 pm

[…] This post was mentioned on Twitter by Kevin MacDonell, Robert L. Weiner. Robert L. Weiner said: The 15 top predictors for Annual Giving: http://bit.ly/8mgVHc […]

Pingback by Tweets that mention The 15 top predictors for Annual Giving « CoolData blog -- Topsy.com — 6 January 2010 @ 8:44 pm

[…] a previous post I offered top predictive variables for Annual Giving. Today let’s talk about Planned Giving, which I think is a lot more […]

Pingback by The 15 top predictors for Planned Giving – Part 1 « CoolData blog — 7 January 2010 @ 12:21 pm

[…] variables, log transformation, regression, transformations — kevinmacdonell @ 12:53 pm In a previous post I mentioned in passing that for a particular predictive model using multiple regression, I […]

Pingback by Why transform the dependent variable? « CoolData blog — 4 March 2010 @ 12:53 pm