CoolData blog

2 March 2010

Fun, creative and lesser-known predictive variables

Filed under: Alumni, Predictor variables. Posted by kevinmacdonell @ 12:58 pm

[Image caption: Your next predictive variable will be found here. (Creative Commons license)]

University offices record all kinds of things in their databases simply to run their own processes: mailing the alumni magazine, ticketing events, coding mailing preferences, and so on. Finding novel predictors for your models means talking to colleagues in your department (and around campus) about the database screens they use and the things they track. Exploring these avenues can be rewarding, and rather social as well!

Here are a few variables I’ve tested that may be less well known than the ones I’ve written about earlier. These aren’t likely to appear near the top of your list of variables most highly correlated with giving, but it certainly won’t hurt to throw some into a regression analysis. Some will be more or less valuable depending on what you’re trying to predict. Some are negative predictors; that’s hardly a bad thing, as negative predictors help to further differentiate the prospect pool, allowing your best prospects to stand out from the crowd.

Here we go:

Does your institution have a records researcher? When mail is returned as undeliverable to the alumni office, this person is busy coding alumni as “lost”, which marks them for later research. These codes may persist in your database after the alum is found, or they might be replaced with another code. In either case, I’ve found that alumni who allow themselves to become lost are less likely to give. A great negative predictor.
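One rough way to check a claim like this against your own data is to correlate a 0/1 “ever lost” flag with a 0/1 donor flag. The sketch below is only an illustration: the function name, the variable names and the eight toy records are all made up, and a plain Pearson correlation stands in for whatever test you prefer.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one entry per alum, invented for illustration only.
ever_lost = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = was ever coded as "lost"
is_donor = [0, 0, 1, 1, 1, 0, 1, 1]   # 1 = has ever given

r = pearson(ever_lost, is_donor)
print(f"correlation between ever_lost and is_donor: {r:.3f}")
```

A negative value of r marks the flag as a candidate negative predictor; the regression itself will tell you whether it still adds anything alongside your other variables.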

Does your alumni magazine have a “green delivery” option? Some alumni opt to access their magazine exclusively by electronic means, as a PDF download perhaps. Mailing preferences are tracked in your database, and often any sort of stated preference is a predictor.

You may already be using ‘number of phonathon refusals’ as a variable, but does your calling program record the reasons for refusal? “Financial reasons” might be a negative predictor, but not all reasons have to be negative. I’ve found that alumni who refuse because they want to handle the donation on their own (for example, mail a cheque when and if they feel like it) are excellent donors. They’re just rather phone-averse.
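If refusal reasons are recorded, each reason can become its own count variable rather than being lumped into a single refusal total. Here is a minimal sketch, assuming the call results can be exported as (alum ID, reason code) pairs. The reason codes and the function name are hypothetical; your calling software will have its own.

```python
from collections import defaultdict

# Hypothetical reason codes; replace with the codes your phonathon uses.
REASONS = ("financial", "will_give_on_own", "other")


def refusal_features(call_results):
    """Turn (alum_id, reason_code) pairs into one count per reason per alum.

    Any reason code not listed in REASONS is counted under "other".
    """
    features = defaultdict(lambda: dict.fromkeys(REASONS, 0))
    for alum_id, reason in call_results:
        key = reason if reason in REASONS else "other"
        features[alum_id][key] += 1
    return dict(features)


# Made-up example rows, for illustration only.
calls = [
    ("A1", "financial"),
    ("A1", "financial"),
    ("A2", "will_give_on_own"),
    ("A3", "no_reason_given"),
]
feats = refusal_features(calls)
print(feats["A1"])
```

Each reason then enters the model as a separate variable, so a positive one (such as wanting to give on one’s own) is not cancelled out by a negative one.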

What about cross-references? We record family relationships among alumni – even grandparent/grandchild and in-laws. I’ve found ‘number of cross-references’ to be a significant predictor.

Alumni who want to be excluded from affinity programs (credit cards, insurance etc.) may be coded in your database so they do not receive unwanted mailings for those products. A negative predictor.

There might be a weird variable or two lurking in people’s names. For certain models, I’ve found that having a first or middle name that consists of a single initial is a positive predictor. This is somewhat correlated with age, but even after adding ‘class year’ to my regression, this variable will still improve the fit of the model. As well, Peter Wylie has written about the character length of an entire name (Prefix, First, Middle, Last, Suffix) being a predictor. Try it.

A year or so ago, I figured out how to query the database to easily retrieve the number of address updates for each alum. This only works when your records personnel create a new address record every time, instead of replacing the previous record. If an alum keeps their alma mater informed of their whereabouts, they’re probably more engaged – and more likely to give (and attend events). Ditto for number of phone updates and number of employment updates.
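The counting step might look something like this once the address history has been exported. This is a sketch under stated assumptions, not the query I actually ran: it assumes one row per stored address record as an (alum ID, date) pair, and it treats the first record for each alum as the original address rather than an update. The function name and sample rows are made up.

```python
from collections import Counter


def count_updates(history_rows):
    """Count updates per alum from (alum_id, effective_date) rows.

    Assumption: each alum's first record is the original address, so
    the number of updates is the number of records minus one.
    """
    records = Counter(alum_id for alum_id, _ in history_rows)
    return {alum_id: n - 1 for alum_id, n in records.items()}


# Made-up address history, for illustration only.
rows = [
    ("A1", "2001-05-01"),
    ("A1", "2004-09-15"),
    ("A1", "2009-01-20"),
    ("A2", "1998-06-30"),
]
address_updates = count_updates(rows)
print(address_updates)
```

The same function works unchanged for phone and employment history, provided those records are also added rather than overwritten.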

The previous idea is related to “class notes” for the alumni magazine. Some universities enter alumni submissions into their database so they can run their notes as a report. We don’t, but I wish we did, because I know ‘number of notes’ would be a predictor.

This might be the tip of the iceberg. Think of all the other great sources of variables that result from normal daily processes (gift processing data, online social networking data, automated call centre data, survey data …), have those conversations with your colleagues, and figure out how to get your hands on those variables for testing.



  1. Interesting post… the challenge is finding ways to query based on each predictive variable and then solicit those constituents accordingly to see if your predictive variable was, in fact, predictive.

    Comment by Greg — 2 March 2010 @ 6:37 pm

    • Well, I don’t know about that. Sounds like a lot of work! The challenge is, I think, to show that the variable is correlated with giving (or with whatever outcome you’re trying to predict). I can see testing the model (or models) as a whole via solicitation of randomly-chosen batches of constituents. But individual variables? Hmmm.

      But yes, designing queries and getting your data ready for model-building is certainly the time-consuming part of the process.

      Comment by kevinmacdonell — 2 March 2010 @ 6:52 pm
