CoolData blog

3 October 2016

Grad class size: predictive of giving, but a reality check, too


The idea came up in a conversation recently: Certain decades, it seems, produced graduates that have reduced levels of alumni engagement and lower participation rates in the Annual Fund. Can we hope they will start giving when they get older, like alumni who have gone before? Or is this depressed engagement a product of their student experience — a more or less permanent condition that will keep them from ever volunteering or giving?


The answer is not perfectly clear, but what I have found with a bit of analysis can only add to the concern we all have about the end of “business as usual.”


For almost all universities, enrolments have risen dramatically over the decades since the end of the Second World War. As undergraduate class sizes ballooned, metrics such as the student-professor ratio emerged as important indicators of quality of education. It occurred to me to calculate the size of each grad-year cohort and include it as a variable in predictive models. For a student who graduated in 1930, that figure could be 500. For someone who graduated in 1995, it might be 3,000. (If you do this, remember to include now-deceased alumni in your count.) This is a rough generalization about the conditions under which a person received their degree, to be sure, but it was easy to query the database for this, and easy to test.
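
As a minimal sketch of that count, assuming each record carries a graduation year and a deceased flag (the tuple layout and sample values below are invented for illustration), the cohort size can be tallied over everyone and then attached only to living alumni:

```python
from collections import Counter

# Hypothetical alumni records: (alum_id, grad_year, is_deceased)
alumni = [
    ("A1", 1930, True),
    ("A2", 1930, False),
    ("A3", 1995, False),
    ("A4", 1995, False),
    ("A5", 1995, True),
]

def cohort_sizes(records):
    """Count every graduate per year, deceased alumni included."""
    return Counter(grad_year for _, grad_year, _ in records)

def class_size_for_living(records):
    """Attach the full cohort size to each living alum."""
    sizes = cohort_sizes(records)
    return {
        alum_id: sizes[grad_year]
        for alum_id, grad_year, is_deceased in records
        if not is_deceased
    }

grad_class_size = class_size_for_living(alumni)
```

Counting before filtering is the point: the deceased still count toward the size of the class, even though they are not scored.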


I pulled lifetime giving for 130,000 living alumni and log-transformed it before checking for a correlation with the size of graduating class. (The transformation being log of “lifetime giving plus 1.”) It turned out that lifetime giving has a strong inverse correlation with the size of an alum’s grad class, for that alum’s most recent degree. (r = -0.338)
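
A rough sketch of that check follows. The giving and class-size values are made up, and the natural log is an assumption, since the base of the log is not specified above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only, not real data.
lifetime_giving = [0.0, 25.0, 0.0, 500.0, 10000.0, 150.0]
grad_class_size = [3000, 2800, 2500, 900, 500, 1500]

# log1p(x) is log(x + 1), so non-donors map to 0 instead of an undefined log(0).
log_giving = [math.log1p(g) for g in lifetime_giving]

r = pearson_r(log_giving, grad_class_size)
```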


This is not surprising. The larger the graduating class, the younger the alum. Nothing is as strongly correlated with lifetime giving as age; therefore, much of the effect I was seeing was probably due to age. (The Pearson correlation of LTG and age was 0.395.)


Indeed, in a multiple linear regression of lifetime giving (log-transformed) on age, adding “grad-class size” as a predictor variable does not improve model fit. The two predictors are not independent of each other: For age and grad-class size, r = -0.828!


I wasn’t ready to give up on the idea, though. I considered my own graduation from university, and all the convocations I had attended in the past as an Advancement employee or a family member of a graduate. The room (or arena, as the case may be) was full of grads from a whole host of degree programs, most of whom had never met each other or attended any class in common. Enrolment growth has been far from even across faculties (or colleges or schools); the student experience in terms of class size and one-on-one access to professors probably differs greatly from program to program. At most universities, Arts or Science faculties have exploded in size, while Medicine or Law have probably not.


With that in mind, I calculated grad-class size differently, counting the size of each alum’s graduating cohort at the faculty (college) level. The correlation of this more granular count of grads with lifetime giving was not as negative (r = -0.283), but at the same time, it was less tied to age.


This time, when I created a regression of lifetime giving on age and then added grad-class size at the faculty level, both predictors were significant. Grad-class size gave a good boost to adjusted R squared.
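
To illustrate the comparison, here is a small pure-Python sketch that fits ordinary least squares with and without a second predictor and compares adjusted R squared. The data are invented and the helper names are hypothetical; a statistics package would normally do this work and would also report significance.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def adjusted_r2(predictors, y):
    """Fit OLS with an intercept and return adjusted R squared."""
    n, k = len(y), len(predictors)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)]
           for a in range(k + 1)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Invented values: age, faculty-level grad-class size, log of lifetime giving.
age = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
faculty_class_size = [900, 300, 1100, 250, 700, 200, 600, 150, 400, 100]
log_giving = [2.9, 4.9, 3.4, 6.15, 5.4, 7.5, 6.5, 8.65, 8.5, 9.6]

adj_age_only = adjusted_r2([age], log_giving)
adj_both = adjusted_r2([age, faculty_class_size], log_giving)
```

Because adjusted R squared penalizes each extra predictor, a rise from `adj_age_only` to `adj_both` suggests the new variable adds explanation beyond what age already provides.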


I seemed to be on to something, so I pushed it further. Knowing that an undergrad’s experience is very different from that of a graduate student, I added “Number of Degrees” as a variable after age, and before grad-class size. All three predictors were significant and all led to improvements in model fit.


Still on the trail of how class size might affect student experience, and alumni affinity and giving thereafter, I got more specific in my query, counting the number of graduates in each alum’s year of graduation and degree program. This variable was even less correlated with age, but despite that, it failed to provide any additional explanation for the variation in lifetime giving. There may be other forms of counts that are more predictive, but the best I found was size of grad class at the faculty/college level.


If I were asked to speculate about the underlying cause, the narrative I’d come up with is that enrolments grew dramatically not only because there were more young people, but because universities in North America were attracting students who increasingly felt that a university degree was a rite of passage required for success in the job market. The relationship of student to university was changing, from that of a close-knit club of scholars, many of whom felt immensely grateful for the opportunity, to a much larger, less cohesive population with a more transactional view of their relationship with alma mater.


That attitude (“I paid x dollars for my piece of paper and so our business here is done”), and not so much the increasing numbers of students they shared the lecture halls with, could account for drops in philanthropic support. What that means for Annual Fund is that we can’t bank on the likelihood that a majority of alumni will become nostalgic when they reach the magic age of 50 or 60 and open their wallets as a consequence. Everything’s different now.


I don’t imagine this is news to anyone who’s been paying attention. But it’s interesting to see how this reality is reflected in the data. And it’s in the data that we will be able to find the alumni for whom university was not just a transaction. Our task today is not just to identify that valuable minority, but to understand them, communicate with them intelligently, connect with their interests and passions, and engage them in meaningful interactions with the institution.


2 August 2016

Data Down Under, and the real reason we measure alumni engagement

Filed under: Alumni, Dalhousie University, engagement, Training / Professional Development — kevinmacdonell @ 4:00 pm


I’ve given presentations here and there around Canada and the U.S., but I’ve never travelled THIS far. On Aug. 24, I will present a workshop in Sydney, Australia: a one-day master class for CASE Asia-Pacific on using data to measure alumni engagement. My wife and I will be taking some time to see some of that beautiful country, leaving in just a few days.


The workshop attendees will be alumni relations professionals from institutions large and small, and in the interest of keeping the audience’s needs in mind, I hope to convince them that measuring engagement is worth doing by talking about what’s in it for them.


This will be the easy part. Figuring out how to quantify engagement will allow them to demonstrate the value of their teams’ activity to the university, using language their senior leadership understands. Scoring can also help alumni teams better target segments based on varying levels of engagement, evaluate current alumni programming, and focus on activities that yield the greatest boost in engagement.


There is a related but larger context for this discussion, however. I am not certain that everyone will be keen to hear about it.


Here’s the situation. Everything in alumni relations is changing. Alumni populations are growing, the number of donors is decreasing, and traditional engagement methods are less effective. Friend-raising and “one size fits all” approaches to engagement are increasingly seen as unsustainable wastes of resources. (A Washington, DC based consultancy, the Education Advisory Board, makes this point very well in this excerpt of a report which you can download here: The Strategic Alumni Relations Enterprise.)


I don’t know so much about the Asia-Pacific region, but in North America university leaders are questioning the very purpose and value of typical alumni relations activities. In this scenario, engagement measurement is intended for more than producing a merely informational report or having something to brag about: Engagement measurement is really a tool that enables alumni relations to better align itself with the Advancement mission.


In place of “one size fits all,” alumni relations teams are under pressure to understand how to interact with alumni at different levels of engagement. Alumni who are somewhat engaged should be targeted with relevant programs and messages to bring them to the next level, while alumni who are at the lowest levels of engagement should not have significant resources directed at them.


Alumni at high levels of engagement, however, require special and customized treatment. They’re looking for deeper and more fulfilling experiences that involve furthering the mission of the institution itself. Think of guest lecturing, student recruitment, advisory board roles, and mentorship, career development and networking for students and new grads. Low-impact activities such as pub nights and other social events are a waste of the potential of this group and will fail to move them to continue contributing their time and money.


Think of what providing these quality experiences will entail. For one, alumni relations staff will have to collaborate with their colleagues in development, as well as in other offices across campus — enrolment management, career services, and academic offices. This will be a new thing, and perhaps not an easy thing, for alumni relations teams stuck in traditional friend-raising mode and working in isolation.


But it’s exactly through these strategic partnerships that alumni relations can prove its value to the whole institution and attract additional resources even in an environment where leaders are demanding to know the ROI of everything.


Along with better integration, a key element of this evolution will be robust engagement scoring. According to research conducted by the Education Advisory Board, alumni relations does the poorest job of any office on campus in providing hard data on its real contribution to the university’s mission. Too many of us are still stuck on tracking our activities instead of the results of those activities.


It doesn’t have to be that way, if the alumni team can effectively partner with other units in Advancement. For those of us on the data, reporting, and analysis side of the house, get ready: The alumni team is coming.


13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do: a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or 10 years.
  8. Recent alumni volunteers (with no giving)
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.
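
A hierarchy like this can be sketched as an ordered list of tests, with each lost record assigned to the first tier it qualifies for. The version below is abbreviated, and every field name (`mg_prospect`, `gave_past_year`, `model_score` and so on) is hypothetical; the real categories and their order would come out of the consultation described above.

```python
# Abbreviated, hypothetical tiers: (label, test) pairs in priority order.
TIERS = [
    ("Major gift / planned giving prospect", lambda r: r.get("mg_prospect", False)),
    ("Major gift donor", lambda r: r.get("mg_donor", False)),
    ("Planned Giving expectancy", lambda r: r.get("pg_expectancy", False)),
    ("Annual Fund donor, past year", lambda r: r.get("gave_past_year", False)),
    ("Recent volunteer", lambda r: r.get("recent_volunteer", False)),
]

def assign_tier(record):
    """Return the 1-based rank of the first tier the record qualifies for."""
    for rank, (_label, qualifies) in enumerate(TIERS, start=1):
        if qualifies(record):
            return rank
    return len(TIERS) + 1  # everyone else falls into a final catch-all tier

def research_queue(lost_records):
    """Order by tier first, then by predictive model score within a tier."""
    return sorted(
        lost_records,
        key=lambda r: (assign_tier(r), -r.get("model_score", 0.0)),
    )

# Invented sample of lost records.
lost = [
    {"id": "A", "model_score": 0.9},
    {"id": "B", "gave_past_year": True, "model_score": 0.4},
    {"id": "C", "mg_prospect": True, "gave_past_year": True},
    {"id": "D", "model_score": 0.2},
]

queue = [r["id"] for r in research_queue(lost)]
```

Because a record lands in the first tier it matches, someone who is both a prospect and a donor appears only once, at their highest priority.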

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes — that’s what we want. The hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement — maybe properly so. Propensity scores could also play a much bigger role.
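
One way such a calculated score might look is sketched below. The weights and field names are invented purely for illustration; choosing them is exactly the kind of thing the consultation would have to settle.

```python
# Hypothetical weights for yes/no attributes of a lost record.
WEIGHTS = {
    "mg_prospect": 5.0,
    "gave_past_year": 3.0,
    "current_volunteer": 3.0,
    "recent_event_attendee": 1.0,
}
PROPENSITY_WEIGHT = 2.0  # multiplies a 0-to-1 propensity score (assumed scale)

def priority_score(record):
    """Sum the weights of the flags that are set, plus weighted propensity."""
    flag_points = sum(w for field, w in WEIGHTS.items() if record.get(field))
    return flag_points + PROPENSITY_WEIGHT * record.get("propensity", 0.0)

# Invented examples matching the scenario described above.
lost = [
    {"id": "prospect_no_history", "mg_prospect": True, "propensity": 0.2},
    {"id": "young_loyal_volunteer", "gave_past_year": True,
     "current_volunteer": True, "propensity": 0.9},
    {"id": "non_donor", "propensity": 0.1},
]

ranked = sorted(lost, key=priority_score, reverse=True)
ranked_ids = [r["id"] for r in ranked]
```

With these particular weights the loyal young volunteer outranks the prospect with no track record; different weights would give a different order.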

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
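
The join itself can be sketched in a few lines. This assumes the geolocation data has been downloaded and keyed by the same alumni ID; the field names and sample values are hypothetical and do not reflect any particular export format.

```python
# Hypothetical query result: alumni with no valid preferred address.
lost_alumni = [
    {"id": "1001", "total_giving": 2500.0, "pg_expectancy": False,
     "last_city": "Toronto", "last_region": "ON", "last_country": "Canada"},
    {"id": "1002", "total_giving": 0.0, "pg_expectancy": False,
     "last_city": "Vancouver", "last_region": "BC", "last_country": "Canada"},
    {"id": "1003", "total_giving": 100.0, "pg_expectancy": True,
     "last_city": "Halifax", "last_region": "NS", "last_country": "Canada"},
]

# Hypothetical downloaded geolocation data: id -> (latitude, longitude).
email_geo = {
    "1001": (21.31, -157.86),
    "1003": (42.36, -71.06),
}

def attach_coordinates(alumni, geo_by_id):
    """Keep only alumni who have coordinates, adding them to each record."""
    merged = []
    for alum in alumni:
        coords = geo_by_id.get(alum["id"])
        if coords is None:
            continue  # no email interaction recorded, so nothing to map
        latitude, longitude = coords
        merged.append({**alum, "latitude": latitude, "longitude": longitude})
    return merged

mappable = attach_coordinates(lost_alumni, email_geo)
```

This is an inner join: alumni without geolocation data simply drop out of the file that feeds the map.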

The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information.


The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:


If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.
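
As a sketch of how movers might be flagged, one could compare the coordinates of the address on file with the email-derived location and look for a large gap. This assumes the address on file has already been geocoded, which is not something described above, and the 100 km threshold is an arbitrary illustration chosen to allow for the imprecision of ISP-based locations.

```python
import math

EARTH_RADIUS_KM = 6371.0
MOVE_THRESHOLD_KM = 100.0  # hypothetical cut-off

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def likely_moved(address_on_file, geolocated, threshold_km=MOVE_THRESHOLD_KM):
    """True when the two (lat, lon) pairs are farther apart than the threshold."""
    return haversine_km(*address_on_file, *geolocated) > threshold_km

# Approximate city coordinates, for illustration only.
halifax = (44.65, -63.57)
toronto = (43.65, -79.38)

distance = haversine_km(*halifax, *toronto)
moved = likely_moved(halifax, toronto)
```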

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

16 July 2013

Alumni engagement scoring vs. predictive modelling

Filed under: Alumni, engagement, predictive modeling — kevinmacdonell @ 8:06 am

Alumni engagement scoring has an undeniable appeal. What could be simpler? Just add up how many events an alum has attended, add more points for volunteering, add more points for supporting the Annual Fund, and maybe some points for other factors that seem related to engagement, and there you have your score. If you want to get more sophisticated, you can try weighting each score input, but generally engagement scoring doesn’t involve any advanced statistics and is easily grasped.
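
To show just how simple it can be, here is a sketch of a points-based score. The inputs and point values are made up; any real scheme would pick its own.

```python
# Hypothetical point values per unit of each activity.
POINTS = {
    "events_attended": 1,
    "volunteer_roles": 3,
    "annual_fund_gifts": 2,
}

def engagement_score(alum, points=POINTS):
    """Multiply each activity count by its point value and add them up."""
    return sum(value * alum.get(field, 0) for field, value in points.items())

# Invented example record.
alum = {"events_attended": 2, "volunteer_roles": 1, "annual_fund_gifts": 3}
score = engagement_score(alum)
```

Weighting the inputs differently only means changing the numbers in `POINTS`, which is why the method is so easy to explain.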

Not so with predictive modelling, which does involve advanced stats and isn’t nearly as intuitive; often it’s not possible to really say how an input variable is related to the outcome. It’s tempting, too, to think of an engagement score as being a predictor of giving and therefore a good replacement for modelling. Actually, it should be predictive — if it isn’t, your score is not measuring the right things — but an engagement score is not the same thing as a predictive model score. They are different tools for different jobs.

Not only are engagement scoring schemes different from predictive models, their simplicity is deceptive. Engagement scoring is incomplete without some plan for acting on observed trends with targeted programming. This implies the ability to establish causal drivers of engagement, which is a tricky thing.

That’s a sequence of events — not a one-time thing. In fact, engagement scoring is like checking the temperature at regular intervals over a long period of time, looking for up and down trends not just for the group as a whole but via comparisons of important subgroups defined by age, sex, class year, college, degree program, geography or other divisions. This requires discipline: taking measurements in exactly the same way every year (or quarter, or what-have-you). If the score is fed by a survey component, you must survey constantly and consistently.
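
A minimal sketch of that kind of tracking follows, assuming one score per alum per measurement year and a single subgroup label. The field names and numbers are invented.

```python
from collections import defaultdict
from statistics import mean

# Invented measurements: one row per alum per year.
rows = [
    {"group": "Arts", "year": 2012, "score": 4},
    {"group": "Arts", "year": 2012, "score": 6},
    {"group": "Arts", "year": 2013, "score": 7},
    {"group": "Arts", "year": 2013, "score": 9},
    {"group": "Law", "year": 2012, "score": 10},
    {"group": "Law", "year": 2013, "score": 8},
]

def mean_score_by_group(measurements):
    """Average score for each (group, year) pair."""
    buckets = defaultdict(list)
    for row in measurements:
        buckets[(row["group"], row["year"])].append(row["score"])
    return {key: mean(scores) for key, scores in buckets.items()}

def year_over_year_change(means, group, year):
    """Change in a group's mean score from the previous year."""
    return means[(group, year)] - means[(group, year - 1)]

means = mean_score_by_group(rows)
arts_change = year_over_year_change(means, "Arts", 2013)
law_change = year_over_year_change(means, "Law", 2013)
```

The comparison only means something if the score is calculated in exactly the same way in both years.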

Predictive models and engagement scores have some surface similarities. They share variables in common, the output of both is a numerical score applied to every individual, and both require database work and math in order to calculate them. Beyond that, however, they are built in different ways and for different purposes. To summarize:

  • Predictive models are collections of potentially dozens of database variables weighted according to strength of correlation with a well-defined behaviour one is trying to predict (e.g., making a gift), in order to rank individuals by likelihood to engage in that behaviour. Both Alumni Relations and Development can benefit from the use of predictive models.
  • Engagement scores are collections of a very few selectively chosen database variables, either not weighted or weighted according to common sense and intuition, in order to roughly quantify the quality of “engagement”, however one wishes to define that term, for each individual. The purpose is to allow comparison of groups (faculties, age bands, geographical regions, etc.) with each other. Comparisons may be made at one point in time, but it is more useful to compare relative changes over time. The main user of scores is Alumni Relations, in order to identify segments requiring targeted programming, for example, and to assess the impact of programming on targeted segments over time.

Let’s explore key differences in more depth:

The purpose of modelling is prediction, for ranking or segmentation. The purpose of engagement scoring is comparison.

Predictive modelling scores are not usually included in reports. Used immediately in decision making, they may never be seen by more than one or two people. Engagement scores are included in reports and dashboards, and influence decision-making over a long span of time.

The target variable of a predictive model is quantifiable (e.g., giving, measurable in dollars). In engagement scoring, there is no target variable, only an output: a construct called “engagement”, which itself is not directly measurable.

Potential input variables for predictive models are numerous (100+) and vary from model to model. Input variables for engagement scores are limited to a handful of easily measured attributes (giving, event attendance, volunteering) which must remain consistent over time.

Variables for predictive models are chosen primarily using statistical methods (correlation) and only secondarily using judgment and “common sense.” For example, if the presence of a business phone number is highly correlated with being a donor, it may be included in the model. For engagement scores, variables are chosen by consensus of stakeholders, primarily according to subjective standards. For example, event attendance and giving would probably be deemed by the committee to indicate engagement, and would therefore be included in the score. Advanced statistics rarely come into play. (For more thoughts on this, read How you measure alumni engagement is up to you.)

In predictive models, giving and variables related to the activity of giving are usually excluded as variables (if ‘giving’ is what we are trying to predict). Using any aspect of the target variable as an input is bad practice in predictive modelling and is carefully avoided. You wouldn’t, for example, use attendance at a donor recognition event to predict likelihood to give. In engagement scoring, though, giving history is usually a key input, as it is common sense to believe that being a donor is an indication of engagement. (It might be excluded or reported separately if the aim is to demonstrate the causal link between engagement indicators and giving.)

Modelling variables are weighted using multiple linear regression or another statistical method that calculates the relative influence of each variable while simultaneously controlling for the influence of all other variables in the model. Engagement score variables are usually weighted according to gut feel. For example, coming to campus for Homecoming seems to carry more weight than showing up for a pub night in one’s own city, therefore we give it more weight.

The quality of a predictive model is testable, first against a validation data set, and later against actual results. But there is no right or wrong way to estimate engagement, therefore the quality of scores cannot be evaluated conclusively.
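
One possible check against a validation set, offered only as an illustration and not as the method used for any model mentioned here, is to see how much more often the highest-scored individuals turned out to be donors than the group as a whole. The scores and outcomes below are invented.

```python
def lift_in_top_fraction(scores, outcomes, fraction=0.1):
    """Donor rate among the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate

# Invented validation data: model score and whether the person gave (1) or not (0).
validation_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
validation_gave = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

lift = lift_in_top_fraction(validation_scores, validation_gave, fraction=0.2)
```

A lift well above 1 means the model is concentrating donors at the top of the ranking; a lift near 1 means it is doing no better than chance.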

The variables in a predictive model have complex relationships with each other that are difficult or impossible to explain except very generally. Usually there is no reason to explain a model in detail. The components in an engagement score, on the other hand, have plausible (although not verifiable) connections to engagement. For example, volunteering is indicative of engagement, while Name Prefix is irrelevant.

Predictive models are built for a single, time-limited purpose and then thrown away. They evolve iteratively and are ever-changing. On the other hand, once established, the method for calculating an engagement score must not change if comparisons are to be made over time. Consistency is key.

Which is all to say: alumni engagement scoring is not predictive modelling. (And neither is RFM analysis.) Only predictive modelling is predictive modelling.
