CoolData blog

23 April 2010

The big list: 85 predictor variables for alumni models

Filed under: Model building, Predictor variables, regression. Posted by kevinmacdonell @ 10:06 am

Here is my attempt at compiling an exhaustive list of every predictor variable I have ever tested against my own data: 85 of them! Not every variable is listed separately; some are grouped together by type or source. Where it matters, I’ve indicated whether a variable is an indicator variable (0/1) or a continuous variable. A few variables are peculiar to the institution where I created my models. Variables that came from external sources are marked with an asterisk.

Some of these predictors were never used in a model because they were eclipsed by other, related variables that had a stronger correlation with the dependent variable. Others (such as gender) proved problematic and were left out of my models for specific reasons. And some were tested and found not to be predictive at all. (A final model may contain only 15 to 20 good predictor variables.) Still, I include them all here, because any one of them might add value to models you build for your own data.
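To make the idea of one variable being "eclipsed" by a related one concrete, here is a minimal Python sketch that ranks candidate predictors by the strength of their correlation with the dependent variable. It is an illustration only, not the procedure used for the original models. The function names, the variable names and the six rows of toy data are all made up for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_predictors(candidates, target):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, pearson_r(values, target)) for name, values in candidates.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Made-up toy data for six alumni (illustrative only, not real results).
target = [0, 50, 0, 200, 25, 500]  # hypothetical dependent variable
candidates = {
    "homecoming_attended": [0, 1, 0, 1, 1, 1],  # indicator (0/1)
    "homecoming_count": [0, 1, 0, 3, 1, 6],     # continuous
}
ranking = rank_predictors(candidates, target)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

When two related variables carry much the same information, as with an attended/not-attended flag and an attendance count, a ranking like this is one way to decide which of the pair to keep. A simple correlation screen is only a first pass; the final choice still depends on how the variables behave together in the model.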

Also note: A number of predictors, listed at the end, are based on giving history. These are NOT to be used when your predicted value is ‘giving’, because they are derived from the very outcome you are trying to predict. These variables were used in other models, such as Planned Giving potential and likelihood to attend events.

  • Class year
  • Earned a degree / Did not earn a degree
  • Number of degrees earned
  • Faculty is Education
  • Faculty is Business
  • Faculty is Arts
  • Faculty is Science
  • Spouse name present
  • Spouse is an alum
  • Spouse has giving (0/1)
  • Spouse lifetime giving (continuous)
  • Student activities present (0/1), e.g. athletics
  • Number of student activities (continuous)
  • Religion present (0/1)
  • Religion is Roman Catholic (0/1)
  • Number of refusals to pledge
  • Refusal reason ‘will handle donation ourselves’
  • Requested to be excluded from affinity programs
  • Requested to be excluded from phone solicitation
  • Preferred address type is ‘Business’
  • Seasonal address present
  • Number of address updates
  • Address is in U.S.A.
  • Address is international
  • Province is Nova Scotia [also tested variables for other provinces]
  • Postal code is rural
  • Postal code is urban
  • Variables based on specific PSYTE cluster codes*
  • Has ‘Found’ code (i.e. a records researcher has had to locate an alum who was marked as lost)
  • Prefers to read alumni magazine online (‘Green’ option)
  • Home phone number present
  • Business phone number present
  • Mobile phone number present
  • Seasonal phone number present
  • Number of phone updates
  • Home phone number is on Canada’s National Do Not Call Registry*
  • Email present
  • Number of email updates
  • Gender
  • Female-widowed
  • Female-married
  • Marital status ‘married’
  • Marital status ‘single’
  • Marital status ‘widow’
  • Marital status ‘divorced’
  • Marital status – other
  • Name prefix is “Dr.”
  • Name prefix is “Rev.” (or other religious)
  • Name prefix is Hon., Justice, or similar
  • Length of entire name
  • Nickname present
  • First name is single initial
  • Middle name is single initial
  • Suffix present
  • Cross-references present (0/1)
  • Number of cross-references (continuous)
  • Has attended Homecoming (0/1)
  • Number of Homecomings attended (continuous)
  • Number of President’s Receptions attended
  • Position (i.e. job title) present
  • Employer present
  • Number of employment updates
  • Employment status present
  • Employment status is ‘retired’
  • ID number begins with ‘F’ (faculty)
  • Registered as a member of the alumni online community
  • Participated in Alumni Engagement Benchmarking Survey* (0/1)
  • Engagement Survey score (continuous)*
  • [Numerous variables created from specific Engagement Survey questions, including the following]
  • Lived primarily in residence while a student [survey]
  • Received a scholarship or bursary [survey]
  • Number of children under 18 [survey]
  • Enjoys speaking with student callers for Phonathon [survey]
  • Likely to attend Homecoming [survey]
  • Likely to attend an event in their area [survey]
  • Holds degrees from other universities [survey]
  • Number of close family members who are also alumni [survey]
  • Span of giving (last year of giving minus first year of giving)
  • Frequency of giving (gifts per year during span of giving)
  • Number of years in which gifts were made
  • Lifetime giving
  • Number of gifts
  • Recency: Gave in past year
  • Recency: Gave at least once in past two years
  • Recency: Gave at least once in past three years
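The giving-history variables at the end of the list are straightforward to derive from a table of gifts. Here is a minimal Python sketch, not the code behind the original models. The function name, the (year, amount) input format, the `as_of_year` parameter and the sample gifts are all assumptions made for illustration. Two definitions are assumptions as well: frequency divides by span + 1 (so a donor whose gifts all fall in a single year does not cause a division by zero), and "past N years" counts the current year as the first of the N.

```python
def giving_history_variables(gifts, as_of_year):
    """Derive giving-history variables for one alum.

    gifts: list of (year, amount) tuples (hypothetical input format).
    as_of_year: the year the model is being built (hypothetical parameter).
    """
    if not gifts:
        # Non-donors get zeros across the board.
        return {
            "span_of_giving": 0, "frequency_of_giving": 0.0,
            "years_with_gifts": 0, "lifetime_giving": 0.0, "number_of_gifts": 0,
            "gave_past_1_year": 0, "gave_past_2_years": 0, "gave_past_3_years": 0,
        }

    years = [year for year, _ in gifts]
    span = max(years) - min(years)  # last year of giving minus first year

    def gave_within(n_years):
        # Assumption: the current year counts as year one of the window.
        cutoff = as_of_year - n_years + 1
        return 1 if any(year >= cutoff for year in years) else 0

    return {
        "span_of_giving": span,
        # Assumption: divide by span + 1 to count both end years.
        "frequency_of_giving": len(gifts) / (span + 1),
        "years_with_gifts": len(set(years)),
        "lifetime_giving": sum(amount for _, amount in gifts),
        "number_of_gifts": len(gifts),
        "gave_past_1_year": gave_within(1),
        "gave_past_2_years": gave_within(2),
        "gave_past_3_years": gave_within(3),
    }

# Made-up example: four gifts in three distinct years.
sample = [(2003, 25.0), (2005, 50.0), (2005, 25.0), (2009, 100.0)]
variables = giving_history_variables(sample, as_of_year=2010)
```

If your gift data is keyed by fiscal year or full gift date rather than calendar year, the recency windows would need to be adjusted to match.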

Every year I discover new data points hiding in our database. Many other variables are out there, but often the data exists only for our youngest alumni. Someday, I’m sure, this additional data will yield cool new predictors. For ideas on other variables to look for in your data (including non-university data), refer to the list that begins on page 138 of Joshua Birkholz’s book, “Fundraising Analytics.”



  1. Hi Kevin,

    Love your list! I’m curious about 3 you listed:

    How did you determine a zip code to be rural vs. urban? It’s a neat idea. I’m thinking we could do it with our data using MSAs, i.e. if a zip is not in an MSA, it is rural. But I’m curious how you did it.

    Also, we use Claritas Prizm segments. Are PSYTE clusters similar? What do you think of them?

    How did you determine religion of Roman Catholic?

    Comment by Michelle Paladino — 26 April 2010 @ 9:49 am

    • Hi Michelle,

      1. I’m in Canada, so we use postal codes. If the second character in the code is a zero, it’s a rural post office. If it is any other digit, the code is in a town or city. I don’t know a great deal about zip codes, but I’m guessing there is good data to be had – probably not for free, though?

      2. As I understand it, yes, Prizm segments are the same idea as PSYTE clusters. I’ve never made direct use of them, but I happened to find an old data set which had our whole database tagged with these codes, and some of them were fairly predictive. I’m skeptical about using them on their own. In Canada, I’m guessing they’re based entirely on creative use of census data.

      3. The religion data is gathered by our Registrar’s Office; I’m not sure why. We have the data only for our youngest alumni, so as I recall, this was not a very useful predictor. I singled out Roman Catholic because our institution has had a historical connection with the Church and still identifies with it in many ways, and thought perhaps there might be some affinity there. The results were inconclusive. A model built especially for the youngest quartile of alumni might be able to make use of these newer categories of data.

      Comment by kevinmacdonell — 26 April 2010 @ 2:23 pm

  2. […] Predictor variables — kevinmacdonell @ 6:26 am In April last year I published a post called The big list: 85 predictor variables for alumni models. Since then I’ve added new ideas for predictor variables, some from sources I hadn’t […]

    Pingback by The really big list: 100 variables for higher-ed predictive models « CoolData blog — 1 February 2011 @ 6:26 am
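The rural/urban rule described in the reply to the first comment (a zero as the second character of a Canadian postal code means a rural post office) can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation. The function name and the decision to return (0, 0) for missing or malformed codes are assumptions.

```python
import re

# Canadian postal codes alternate letter-digit-letter digit-letter-digit, e.g. "B3H 4R2".
_POSTAL_CODE = re.compile(r"^[A-Z][0-9][A-Z][0-9][A-Z][0-9]$")

def postal_code_flags(postal_code):
    """Return (is_rural, is_urban) as 0/1 indicator variables.

    Assumption: missing or malformed codes get (0, 0), so they count as neither.
    """
    if not postal_code:
        return (0, 0)
    code = postal_code.replace(" ", "").replace("-", "").upper()
    if not _POSTAL_CODE.match(code):
        return (0, 0)
    is_rural = 1 if code[1] == "0" else 0  # second character zero means rural
    return (is_rural, 1 - is_rural)

print(postal_code_flags("B0J 2C0"))  # second character is 0
print(postal_code_flags("B3H 4R2"))  # second character is 3
```

Keeping the two flags separate, rather than a single rural/urban switch, leaves room for records with no usable postal code, which matches how the list treats them as two distinct variables.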
