My Phonathon program hires about thirty students a year. These are mature, reliable employees whom I’d recommend to any prospective employer. They’re also, well, young. When I was in university, many of them hadn’t even been born.
So, yeah, they’re different from me. They’re different in terms of girth, taste in music and facility with pop-culture references. And they’re different in the data.
Grads who are just beginning their careers as alumni will lack most of the engagement-related attributes we usually rely on for predictive models: event attendance, volunteer activity, employment updates, a business phone. Therefore, variables that relate to their recent student experience are likely to loom larger for them than for their older counterparts. At the same time, recent grads tend to have a richer variety of data in their records, as database usage has increased across the enterprise through the years.
Two differences mark young alumni as a distinct population: first, the variables that all alumni share are distributed differently among them; second, some variables exist only for younger alumni.
It makes me wonder why I’m still lumping young alumni in with older alumni in my predictive models. You might recall that a while ago I was bragging about how well my Phonathon model worked to predict propensity to give in response to phone solicitation. I also mentioned that, unfortunately, the model under-performed in predicting acquisition of young donors.
Okay, it didn’t under-perform — it failed. I concluded that young alumni need their own, separate model.
Where do we draw the line for “young alumni”? One possibility is that we go with our program’s definition of young alums — for me, that’s anyone who has earned a degree in any of the past three years and is under 35. Others might use graduates of the last decade.
This might be fine, but keep in mind that the training sample in a predictive model doesn’t have to follow the strict definition of the population that the appeal is targeting. We need a critical mass of donors in our sample in order to train the model, so we might be more successful if we drew a larger, more loosely defined sample. Our sample will include some alumni who are slightly older than the alumni who will get the “young alum” appeal. That’s okay, because they’re in the sample for only one reason: training the model.
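To make the distinction concrete, here is a minimal sketch of the two definitions side by side. The records, the field names (`grad_year`, `age`, `donor`) and the reference year are all invented for illustration, and the cutoffs are parameters you would set to match your own program.

```python
# Hypothetical alumni records; field names and values are illustrative only.
ALUMNI = [
    {"id": 1, "grad_year": 2011, "age": 23, "donor": False},
    {"id": 2, "grad_year": 2010, "age": 24, "donor": True},
    {"id": 3, "grad_year": 2006, "age": 29, "donor": True},
    {"id": 4, "grad_year": 2003, "age": 33, "donor": False},
    {"id": 5, "grad_year": 2010, "age": 41, "donor": True},
    {"id": 6, "grad_year": 1995, "age": 39, "donor": True},
]

def in_appeal_target(rec, current_year, max_years_out=3, age_limit=35):
    """Strict program definition: a degree in the past three years, under 35."""
    years_out = current_year - rec["grad_year"]
    return 0 <= years_out < max_years_out and rec["age"] < age_limit

def in_training_sample(rec, current_year, max_years_out=10, max_age=35):
    """Looser definition, used only to gather enough donors to train on."""
    years_out = current_year - rec["grad_year"]
    return 0 <= years_out < max_years_out and rec["age"] <= max_age

CURRENT_YEAR = 2011  # assumed reference year for this toy example
target = [r for r in ALUMNI if in_appeal_target(r, CURRENT_YEAR)]
training = [r for r in ALUMNI if in_training_sample(r, CURRENT_YEAR)]

print("appeal target:", [r["id"] for r in target])
print("training sample:", [r["id"] for r in training])
print("donors to learn from:", sum(r["donor"] for r in training))
```

The model is trained on the wider group and then used to score only the alumni who fall inside the stricter appeal definition.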
However you draw the line, the distinction rests on the answer to this question: Is the data that describes one group different from the data that describes another? They may all be alumni, but can they also be thought of as separate populations, in terms of the data that was collected on them?
If you audit the data in certain tables, you might be able to find an “information bump”. That’s what I call the approximate year in which an institution started collecting and storing a lot more information on incoming students. In the data I’m familiar with, that bump has occurred in the last 10 to 15 years.
One of the most noticeable areas where data recording has increased is in personal information. Nowadays you can find Social Security Number (or in Canada, Social Insurance Number), religion, ethnicity, next-of-kin information, citizenship, driver’s license status, even eye and hair colour. Auditing these fields will tell you when data collection was ramped up, but probably won’t yield many useful predictors as they don’t have much to do with engagement. Certain types of personal information may also be off limits to you.
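One way to run that kind of audit is to compute, for each class year, the share of records in which a field is populated, then look for the year where the share jumps. The sketch below assumes records with a `class_year` field and uses a made-up `citizenship` column; the 0.5 threshold for what counts as a jump is arbitrary.

```python
from collections import defaultdict

def fill_rate_by_year(records, field):
    """Share of records per class year in which `field` is populated."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for rec in records:
        year = rec["class_year"]
        totals[year] += 1
        if rec.get(field) not in (None, ""):
            filled[year] += 1
    return {year: filled[year] / totals[year] for year in sorted(totals)}

def first_big_jump(rates, min_jump=0.5):
    """First year whose fill rate beats the previous year's by min_jump or more."""
    years = sorted(rates)
    for prev, cur in zip(years, years[1:]):
        if rates[cur] - rates[prev] >= min_jump:
            return cur
    return None

# Toy data, invented for illustration: the field is mostly empty before 1997.
records = (
    [{"class_year": 1995, "citizenship": None}] * 4
    + [{"class_year": 1996, "citizenship": "CA"}]
    + [{"class_year": 1996, "citizenship": ""}] * 3
    + [{"class_year": 1997, "citizenship": "CA"}] * 4
    + [{"class_year": 1998, "citizenship": "US"}] * 4
)

rates = fill_rate_by_year(records, "citizenship")
bump_year = first_big_jump(rates)
print(rates)
print("information bump around:", bump_year)
```

In real data the rise is often gradual rather than a single step, so plotting the fill rates by year is likely to tell you more than any fixed threshold.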
Investigate personal information if you can, but be sure to look around for other, more relevant data. Some examples:
- Whether they lived in residence — If you don’t have direct access to this, the answer might be lurking in the alum’s past address data.
- Athletics involvement — Count of activities, or a yes/no indicator.
- Club and society activities — Count of activities, or a yes/no indicator.
- Greek society membership — Yes/no.
- Whether they were transfer students or received all of their degree credits from your institution
- Whether they were employed on campus while a student
- Whether they were recipients of awards, prizes, scholarships or bursaries
- Whether they signed up for Email for Life, or otherwise kept their university email address or other university login active — In my data, more than 98% of the most recent grad class has an active university login. That drops to about 84% for the grad class of 2010, then 38% for 2009. The percentages continue to fall gradually from there. This attrition effect might hide the fact that retaining a student login past graduation is a strong indicator of affinity. I will write more on this topic in a future post.
- Online community membership or activity
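Because login retention falls off so sharply with time since graduation, a raw comparison of donors with and without a login mostly reflects class year. One hedged way to look past that is to compare giving rates within each class year. The sketch below uses invented records and hypothetical field names (`class_year`, `has_login`, `donor`); it is not drawn from my actual data.

```python
from collections import defaultdict

def giving_rates(records, flag):
    """Donor rate for each (class_year, flag value) cell."""
    cells = defaultdict(lambda: [0, 0])  # [donors, total]
    for rec in records:
        cell = cells[(rec["class_year"], bool(rec[flag]))]
        cell[0] += int(rec["donor"])
        cell[1] += 1
    return {key: donors / total for key, (donors, total) in cells.items()}

def make(year, has_login, donors, non_donors):
    """Build toy records for one cell; all counts are invented."""
    base = {"class_year": year, "has_login": has_login}
    return ([dict(base, donor=True) for _ in range(donors)]
            + [dict(base, donor=False) for _ in range(non_donors)])

records = (
    make(2009, True, 2, 2) + make(2009, False, 1, 5)
    + make(2011, True, 1, 9) + make(2011, False, 0, 1)
)

rates = giving_rates(records, "has_login")
for (year, has_login), rate in sorted(rates.items()):
    print(year, "login" if has_login else "no login", round(rate, 2))
```

If the login holders give at a higher rate than their classmates in most class years, the variable is probably carrying real information about affinity rather than just age.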
Oh, and don’t ignore the usual variables, such as marital status! In any conventional predictive model I’ve ever worked on, having a marital status of “single” in the database was a strong negative predictor of giving. But when I reduced my sample to graduates from the past ten years who were no older than 35, I was surprised to see that predictor turn into a strong positive. Although married alumni were still more likely to give, the “singles” were right behind them — and far ahead of the alumni for whom the marital status was missing. In my new model, I will use both “married” and “single” as predictors. Although the marrieds are more likely to be donors, there are relatively few of them; being coded single in our database could well prove to be a leading predictor of giving. (You will need to know, of course, why some alums are coded and others not. I’m still investigating.)
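Using both “married” and “single” as predictors amounts to creating two yes/no indicators and letting every other value share the baseline. Here is a minimal sketch, assuming marital status is stored as free text in a single field; the function and variable names are hypothetical.

```python
def marital_indicators(status):
    """Two 0/1 indicators; anything else (including missing) is the baseline."""
    normalized = (status or "").strip().lower()
    return {
        "is_married": int(normalized == "married"),
        "is_single": int(normalized == "single"),
    }

# Illustrative values only, including a missing and a blank status.
statuses = ["Married", "single", None, "", "Divorced"]
rows = [marital_indicators(s) for s in statuses]
for status, row in zip(statuses, rows):
    print(repr(status), row)
```

Note that under this coding any other status, such as “Divorced”, lands in the baseline alongside the missing values. If those codes are common in your data, they may deserve indicators of their own.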
When September rolls around, I’ll be another three months older, and there’s nothing I can do about that. At least I’ll know my hard-working callers will be well-focused, talking to the recent grads who are most ready to make their very first gift to the Annual Fund.