CoolData blog

3 October 2016

Grad class size: predictive of giving, but a reality check, too


The idea came up in a conversation recently: Certain decades, it seems, produced graduates that have reduced levels of alumni engagement and lower participation rates in the Annual Fund. Can we hope they will start giving when they get older, like alumni who have gone before? Or is this depressed engagement a product of their student experience — a more or less permanent condition that will keep them from ever volunteering or giving?


The answer is not perfectly clear, but what I have found with a bit of analysis can only add to the concern we all have about the end of “business as usual.”


For almost all universities, enrolments have risen dramatically over the decades since the end of the Second World War. As undergraduate class sizes ballooned, metrics such as the student-professor ratio emerged as important indicators of quality of education. It occurred to me to calculate the size of each grad-year cohort and include it as a variable in predictive models. For a student who graduated in 1930, that figure could be 500. For someone who graduated in 1995, it might be 3,000. (If you do this, remember to include now-deceased alumni in your count.) A rough generalization about the conditions under which a person received their degree, to be sure, but it was easy to query the database for this, and easy to test.
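For anyone who wants to try the cohort count outside the database, here is a minimal sketch in Python. The records, field layout and IDs are invented for illustration; the only point it makes is that deceased alumni are counted in the class size even though only living alumni are modelled.

```python
from collections import Counter

# Hypothetical records: (alum_id, grad_year, deceased). All values are invented.
alumni = [
    ("A1", 1930, True),
    ("A2", 1930, False),
    ("A3", 1995, False),
    ("A4", 1995, False),
    ("A5", 1995, True),
]

# Count everyone who graduated in each year, deceased included, so the
# figure reflects the class as it was at convocation.
class_size = Counter(year for _, year, _ in alumni)

# Attach the cohort size to living alumni only, since they are the ones modelled.
grad_class_size = {
    alum_id: class_size[year]
    for alum_id, year, deceased in alumni
    if not deceased
}

print(grad_class_size)
```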


I pulled lifetime giving for 130,000 living alumni and log-transformed it before checking for a correlation with the size of graduating class. (The transformation being log of “lifetime giving plus 1.”) It turned out that lifetime giving has a strong inverse correlation with the size of an alum’s grad class, for that alum’s most recent degree. (r = -0.338)
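The transformation and the correlation check can be sketched in a few lines of Python. The giving amounts and class sizes here are made up, and the hand-written `pearson_r` helper is only there to keep the example self-contained; any statistics package will compute the same coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented example values, one entry per alum.
lifetime_giving = [0.0, 25.0, 0.0, 500.0, 12000.0, 150.0]
grad_class_size = [3000, 2800, 2500, 1200, 500, 900]

# log(lifetime giving + 1): log1p keeps non-donors at exactly zero.
log_ltg = [math.log1p(amount) for amount in lifetime_giving]

r = pearson_r(log_ltg, grad_class_size)
print(round(r, 3))
```

Adding 1 before taking the log matters because most alumni have given nothing, and the log of zero is undefined.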


This is not surprising. The larger the graduating class, the younger the alum. Nothing is as strongly correlated with lifetime giving as age, so much of the effect I was seeing was probably due to age. (The Pearson correlation of LTG and age was 0.395.)


Indeed, in a multiple linear regression of lifetime giving (log-transformed) on age, adding “grad-class size” as a predictor variable does not improve model fit. The two predictors are not independent of each other: For age and grad-class size, r = -0.828!


I wasn’t ready to give up on the idea, though. I considered my own graduation from university, and all the convocations I had attended in the past as an Advancement employee or a family member of a graduate. The room (or arena, as the case may be) was full of grads from a whole host of degree programs, most of whom had never met each other or attended any class in common. Enrolment growth has been far from even across faculties (or colleges or schools); the student experience in terms of class size and one-on-one access to professors probably differs greatly from program to program. At most universities, Arts or Science faculties have exploded in size, while Medicine or Law have probably not.


With that in mind, I calculated grad-class size differently, counting the size of each alum’s graduating cohort at the faculty (college) level. The correlation of this more granular count of grads with lifetime giving was not as negative (r = -0.283), but at the same time, it was less tied to age.


This time, when I created a regression of lifetime giving on age and then added grad-class size at the faculty level, both predictors were significant. Grad-class size gave a good boost to adjusted R squared.
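To make the nested-model comparison concrete, here is a rough sketch that fits ordinary least squares twice (age only, then age plus faculty-level class size) and compares adjusted R squared. The ten data points are invented, and the small hand-rolled solver is an assumption made purely so the example runs without extra libraries; it is not the tool used for the analysis above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def adjusted_r2(predictor_rows, y):
    """Fit OLS with an intercept and return adjusted R squared."""
    x = [[1.0] + list(row) for row in predictor_rows]
    n, k = len(y), len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    p = k - 1  # number of predictors, excluding the intercept
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Invented data: age, faculty-level grad-class size, log of lifetime giving.
age = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
faculty_class_size = [900, 300, 1200, 250, 800, 400, 150, 600, 200, 100]
log_ltg = [0.0, 2.0, 0.0, 3.05, 2.1, 3.9, 4.95, 4.4, 5.7, 6.7]

adj_age_only = adjusted_r2([[a] for a in age], log_ltg)
adj_full = adjusted_r2(list(zip(age, faculty_class_size)), log_ltg)
print(round(adj_age_only, 3), round(adj_full, 3))
```

Adjusted R squared penalizes each extra predictor, so it only rises when the new variable explains more than chance alone would.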


I seemed to be on to something, so I pushed it farther. Knowing that an undergrad’s experience is very different from that of a graduate student, I added “Number of Degrees” as a variable after age, and before grad-class size. All three predictors were significant and all led to improvements in model fit.


Still on the trail of how class size might affect student experience, and alumni affinity and giving thereafter, I got more specific in my query, counting the number of graduates in each alum’s year of graduation and degree program. This variable was even less conflated with age, but despite that, it failed to provide any additional explanation for the variation in lifetime giving. There may be other forms of counts that are more predictive, but the best I found was size of grad class at the faculty/college level.


If I were asked to speculate about the underlying cause, the narrative I’d come up with is that enrolments grew dramatically not only because there were more young people, but because universities in North America were attracting students who increasingly felt that a university degree was a rite of passage required for success in the job market. The relationship of student to university was changing, from that of a close-knit club of scholars, many of whom felt immensely grateful for the opportunity, to a much larger, less cohesive population with a more transactional view of their relationship with alma mater.


That attitude (“I paid x dollars for my piece of paper and so our business here is done”), and not so much the increasing numbers of students they shared the lecture halls with, could account for drops in philanthropic support. What that means for Annual Fund is that we can’t bank on the likelihood that a majority of alumni will become nostalgic when they reach the magic age of 50 or 60 and open their wallets as a consequence. Everything’s different now.


I don’t imagine this is news to anyone who’s been paying attention. But it’s interesting to see how this reality is reflected in the data. And it’s in the data that we will be able to find the alumni for whom university was not just a transaction. Our task today is not just to identify that valuable minority, but to understand them, communicate with them intelligently, connect with their interests and passions, and engage them in meaningful interactions with the institution.


2 August 2016

Data Down Under, and the real reason we measure alumni engagement

Filed under: Alumni, Dalhousie University, engagement, Training / Professional Development — kevinmacdonell @ 4:00 pm


I’ve given presentations here and there around Canada and the U.S., but I’ve never travelled THIS far. On Aug. 24, I will present a workshop in Sydney, Australia — a one-day master class for CASE Asia-Pacific on using data to measure alumni engagement. My wife and I will be taking some time to see some of that beautiful country, leaving in just a few days.


The workshop attendees will be alumni relations professionals from institutions large and small, and in the interest of keeping the audience’s needs in mind, I hope to convince them that measuring engagement is worth doing by talking about what’s in it for them.


This will be the easy part. Figuring out how to quantify engagement will allow them to demonstrate the value of their teams’ activity to the university, using language their senior leadership understands. Scoring can also help alumni teams better target segments based on varying levels of engagement, evaluate current alumni programming, and focus on activities that yield the greatest boost in engagement.


There is a related but larger context for this discussion, however. I am not certain that everyone will be keen to hear about it.


Here’s the situation. Everything in alumni relations is changing. Alumni populations are growing, the number of donors is decreasing, and traditional engagement methods are less effective. Friend-raising and “one size fits all” approaches to engagement are increasingly seen as unsustainable wastes of resources. (A Washington, DC based consultancy, the Education Advisory Board, makes this point very well in this excerpt of a report which you can download here: The Strategic Alumni Relations Enterprise.)


I don’t know so much about the Asia-Pacific region, but in North America university leaders are questioning the very purpose and value of typical alumni relations activities. In this scenario, engagement measurement is intended for more than producing a merely informational report or having something to brag about: Engagement measurement is really a tool that enables alumni relations to better align itself with the Advancement mission.


In place of “one size fits all,” alumni relations teams are under pressure to understand how to interact with alumni at different levels of engagement. Alumni who are somewhat engaged should be targeted with relevant programs and messages to bring them to the next level, while alumni who are at the lowest levels of engagement should not have significant resources directed at them.


Alumni at high levels of engagement, however, require special and customized treatment. They’re looking for deeper and more fulfilling experiences that involve furthering the mission of the institution itself. Think of guest lecturing, student recruitment, advisory board roles, and mentorship, career development and networking for students and new grads. Low-impact activities such as pub nights and other social events are a waste of the potential of this group and will fail to move them to continue contributing their time and money.


Think of what providing these quality experiences will entail. For one, alumni relations staff will have to collaborate with their colleagues in development, as well as in other offices across campus — enrolment management, career services, and academic offices. This will be a new thing, and perhaps not an easy thing, for alumni relations teams stuck in traditional friend-raising mode and working in isolation.


But it’s exactly through these strategic partnerships that alumni relations can prove its value to the whole institution and attract additional resources even in an environment where leaders are demanding to know the ROI of everything.


Along with better integration, a key element of this evolution will be robust engagement scoring. According to research conducted by the Education Advisory Board, alumni relations does the poorest job of any office on campus in providing hard data on its real contribution to the university’s mission. Too many of us are still stuck on tracking our activities instead of the results of those activities.


It doesn’t have to be that way, if the alumni team can effectively partner with other units in Advancement. For those of us on the data, reporting, and analysis side of the house, get ready: The alumni team is coming.


11 May 2015

A new way to look at alumni web survey data

Filed under: Alumni, Surveying, Vendors — kevinmacdonell @ 7:38 pm

Guest post by Peter B. Wylie, with John Sammis


Click to download the PDF file of this discussion paper: A New Way to Look at Survey Data


Web-based surveys of alumni are useful for all sorts of reasons. If you go to the extra trouble of doing some analysis — or push your survey vendor to supply it — you can derive useful insights that could add huge value to your investment in surveying.


This discussion paper by Peter B. Wylie and John Sammis demonstrates a few of the insights that emerge by matching up survey data with some of the plentiful data you have on alums who respond to your survey, as well as those who don’t.


Neither alumni survey vendors nor their higher education clients are doing much work in this area. But as Peter writes, “None of us in advancement can do too much of this kind of analysis.”


Download: A New Way to Look at Survey Data



7 July 2014

Mine your donor data with this baseball-inspired analysis

I’ve got baseball analytics on my mind. I don’t know if it’s because of the onset of July or because of a recent mention of CoolData on Nate Silver’s FiveThirtyEight blog, but I have been deeply absorbed in an analysis of donor giving behaviours inspired by Silver’s book, “The Signal and the Noise.” It might give you some ideas for things to try with your own database.

Back in 2003, Silver designed a system to predict the performance of Major League Baseball players. The system, called PECOTA, attempts to understand how a player’s performance evolves as he ages. As Silver writes in his book, its forecasts were probabilistic, offering a range of possible outcomes for each player. From the previous work of others, Silver was aware that hitters reach their peak performance at age 27, on average. Page 81 of his book shows the “aging curve” for major league hitters, a parabola starting at age 18, arcing smoothly upwards through the 20s, peaking at 27, and then just as smoothly arcing downwards to age 39.

My immediate thought on reading about this was, what about donors? Can we visualize the trajectory of various types of donors (major donors, bequest donors, leadership annual fund donors) from their first ten bucks right after graduating, then on into their peak earning years? What would we learn by doing that?

In baseball, the aging curve presents a problem for teams acquiring players with proven track records. By the time they become free agents, their peak years will have passed. However, if the early exploits of a young prospect cause him to resemble one of the greats from the past, perhaps he is worth investing in. The curve, as Silver notes, is only an average. Some players will peak far earlier, and some far later, than the average age of 27. There are different types of players, and different types of curves, and every individual career is different. But lurking in all that noise, there is a signal.

Silver’s PECOTA system takes things further than that — I will return to that later — but now I want to turn to how we can visualize a sort of aging curve for our donors.

What’s the payoff? Well, to cut to the chase: It appears that some donors who go on to give in six figures (lifetime total) can be distinguished from the majority of lower-level donors at a very early age. Above-average giving ($200 or $250, say) in any one year during one’s late 20s or early 30s is a predictor of very high lifetime value. I have found that when big donors have started their giving young, they have started big. That is, “big” in relation to their similarly-aged peers – not at thousands of dollars, but at $100 or $500, depending on how old they were at the time.

Call it “precocious giving”.

Granted, it sounds a bit like plain common sense. But being analytical about it opens up the possibility of identifying a donor with high lifetime value when they’re still in their late 30s or early 40s. You can judge for yourself, but the idea appealed to me.

I’m getting ahead of myself. At the start of this, I was only interested in getting the data prepared and plotting the curve to see what it would look like. To the extent that it resembled a curve at all, I figured that “peak age” — the counterpart to baseball player performance — would be the precise age at which a donor gave the most in any given year while they were alive.


I wrote a query to pull all donors from the database (persons only), with a row for each year of giving, summing on total giving in that year. Along with Year of Gift, I pulled down Year of Birth for each donor — excluding anyone for whom we had no birthdate. I included only amounts given while the donor was living; bequests were excluded.

The next step was to calculate the age the donor was at the time of the gift. I added a column to the data, defined as the Year of Gift minus Year of Birth. That gave me a close-enough figure for age at time of giving.

As I worked on the analysis, I kept going back to the query to add things I needed, such as certain donor attributes that I wanted to examine. Here are most of the variables I ended up pulling from the database for each unique combination of Donor ID and Age at Time of Gift:

  • ID
  • Age at Time of Gift (Year of Gift minus Year of Birth)
  • Sum of Giving (total giving for that donor, at that age)
  • Donor Category Code (Alum, Friend, etc.)
  • Total Lifetime Giving (for each donor, without regard to age)
  • Deceased Indicator (are they living or dead as of today)
  • Current Age (if living, how old are they right now)
  • Year of Birth
  • Year of Gift

The result was a data set with more than 200,000 rows. Notice, of course, that a donor ID can appear on multiple rows — one for each value of Age at Gift. The key thing to remember is that I didn’t care what year giving occurred, I only wanted to know how old someone was when they gave. So in my results, a donor who gave in 1963 when she was 42 is much the same as a donor who gave in 2013 when he was the same age.
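The roll-up described above can be sketched in Python as follows. The gift rows are invented, and the names are placeholders rather than anything from the actual query. The sketch derives age at gift, sums giving per donor per age, and then counts how many distinct donors gave at each age.

```python
from collections import Counter, defaultdict

# Hypothetical gift rows: (donor_id, year_of_birth, year_of_gift, amount).
gift_rows = [
    ("D1", 1921, 1963, 50.0),
    ("D1", 1921, 1963, 25.0),
    ("D1", 1921, 1975, 500.0),
    ("D2", 1971, 2013, 100.0),
]

# Sum giving for each unique combination of donor and age at time of gift.
giving_by_donor_age = defaultdict(float)
for donor_id, birth_year, gift_year, amount in gift_rows:
    age_at_gift = gift_year - birth_year  # close-enough age; ignores birthdays
    giving_by_donor_age[(donor_id, age_at_gift)] += amount

# Number of distinct donors who gave at each age, regardless of calendar year.
donors_per_age = Counter(age for _, age in giving_by_donor_age)
peak_age = max(donors_per_age, key=donors_per_age.get)

print(dict(giving_by_donor_age))
print(donors_per_age, peak_age)
```

Here the 1963 gifts and the 2013 gift land on the same age (42), which is the whole idea: calendar year drops out and only age remains.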


Now it was time to visualize this data, and for that I used Tableau. I connected directly to the database and loaded the data into Tableau using custom SQL. ‘Age at Gift’ is numerical, so Tableau automatically listed that variable in the Measures panel. For this analysis, I wanted to treat it as a category instead, so I dragged it into the Dimensions panel. (If you’re not familiar with Tableau, don’t worry about these application-specific steps — once you get the general idea, you can replicate this using your tool of choice.)

The first (and easiest) thing to visualize was simply the number of donors at each age. Every part of the shape of this curve says something interesting, I think, but the one thing I have annotated here is the age at which the largest number of people chose to make a gift.


This chart lumps in everyone — alumni and non-alumni, living donors and deceased donors — so I wanted to go a little deeper. I would expect to see a difference between alumni and non-alumni, for example, so I put all degree and non-degree alumni into one category (Alumni), and all other donor constituents into another (Non-alumni). The curve does not change dramatically, but we can see that the number of non-alumni donors peaks later than the number of alumni donors.


There are a number of reasons for analyzing alumni and non-alumni separately, so from this point on, I decided to exclude non-alumni.

The fact that 46 seems to be an important age is interesting, but this probably says as much about the age composition of our alumni and our fundraising effort over the years as it does about donor behaviour. To get a sense of how this might be true, I divided all alumni donors into quartiles (four bins containing roughly equal numbers of alumni), by Birth Year. Alumni donors broke down this way:

  1. Born 1873 to 1944: 8,101 donors
  2. Born 1945 to 1955: 8,036 donors
  3. Born 1956 to 1966: 8,614 donors
  4. Born 1967 to 1991: 8,172 donors

Clearly these are very different cohorts! The donors in the middle two quartiles were born in a span of only a decade each, while the span of the youngest quartile is 24 years, and the span of the oldest quartile is 71 years! When I charted each age group separately, they split into distinct phases.
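If you are not working in Tableau, one way to bin donors into birth-year quartiles is sketched here. The eight birth years are invented, and using `statistics.quantiles` for the cut points is my assumption for illustration, not necessarily how the bins above were produced.

```python
import bisect
import statistics
from collections import Counter

# Invented birth years, one per donor.
birth_years = [1900, 1930, 1946, 1950, 1958, 1964, 1970, 1985]

# Three cut points that split the donors into four roughly equal groups.
cuts = statistics.quantiles(birth_years, n=4)

def birth_quartile(year):
    """Return 1 (oldest) through 4 (youngest) for a birth year."""
    return bisect.bisect_left(cuts, year) + 1

quartile_counts = Counter(birth_quartile(year) for year in birth_years)
print(cuts, dict(quartile_counts))
```

Because the bins hold equal numbers of donors rather than equal numbers of years, the spans in years can differ wildly, as they do in the breakdown above.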


This chart highlights a significant problem with visualizing the life cycle of donors: Many of the donors in the data aren’t finished their giving careers yet. When Nate Silver talks about the aging curves of baseball players, he means players whose career is behind them. How else to see their rise, peak, and eventual decline? According to the chart above, the youngest quartile peaks (in terms of number of donors) at age 26. However, most of these donors are still alive and have many years of giving ahead of them. We will turn to them to identify up-and-coming donors, but as long as we are trying to map out what a lifetime of giving looks like, we need to focus on the oldest donors.

An additional problem is that our donor database doesn’t go back as far as baseball stats do. Sure, we’ve got people in the database who were born more than 140 years ago, but our giving records are very sparse for years before the early 1970s. If a donor was very mature at that time, his apparent lack of giving history might cause us to make erroneous observations.

I decided to limit the data set to donors born between 1920 and 1944. This excludes the following donors who are likely to have incomplete giving histories:

  • Anyone who was older than 50 in 1970, when giving records really started to get tracked, and
  • Anyone who is currently younger than 70, and may have many years of giving left.

This is a bit arbitrary, but reasonable. It trims off the donors who could never have had a chance to have a lifetime of giving recorded in the data, without unduly reducing the size of my data set. I was left with only 20% of my original data, but still, that’s more than 6,000 individuals. I could have gotten fussier with this, removing anyone who died at a relatively young age, but I figured the data was good enough to provide some insights.
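The two cutoffs follow from simple arithmetic, sketched here in Python. The year 2014 is taken from the date of this post, and the constant and function names are placeholders of my own.

```python
RECORDS_START_YEAR = 1970         # when giving records really started to be tracked
ANALYSIS_YEAR = 2014              # assumption: the year this post was written
MAX_AGE_AT_RECORDS_START = 50     # older than this in 1970: early giving is missing
MIN_CURRENT_AGE = 70              # younger than this today: giving is not finished

earliest_birth_year = RECORDS_START_YEAR - MAX_AGE_AT_RECORDS_START
latest_birth_year = ANALYSIS_YEAR - MIN_CURRENT_AGE

def has_full_history(birth_year):
    """True if a donor's giving lifetime plausibly falls within the recorded years."""
    return earliest_birth_year <= birth_year <= latest_birth_year

# Invented donors: (donor_id, year_of_birth).
donors = [("D1", 1915), ("D2", 1920), ("D3", 1944), ("D4", 1960)]
trimmed = [donor_id for donor_id, birth_year in donors if has_full_history(birth_year)]

print(earliest_birth_year, latest_birth_year, trimmed)
```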

The dramatic difference made by this trimming is evident in the following two charts. Both charts show a line for the number of donors by age at time of gift, for each of three lifetime giving levels: Under $1,000 in blue, $1,000 to $10,000 in orange, and over $10,000 in purple. What this means is that all the donors represented by the purple line (for example) gave at least $10,000 cumulatively over the course of their lifetime.
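Here is a small sketch of how the three tiers and the per-age donor counts might be built. The donors and amounts are invented, and I am assuming a lifetime total of exactly $10,000 belongs in the top tier, in line with "at least $10,000" above.

```python
from collections import defaultdict

def giving_tier(lifetime_total):
    """Assign a lifetime-giving tier; $10,000 exactly falls in the top tier."""
    if lifetime_total < 1_000:
        return "under $1,000"
    if lifetime_total < 10_000:
        return "$1,000 to $10,000"
    return "$10,000 and over"

# Invented data: donor -> {age at gift: total given at that age}.
giving_by_age = {
    "D1": {45: 200.0, 60: 300.0},
    "D2": {45: 1_500.0, 46: 2_000.0},
    "D3": {45: 5_000.0, 70: 25_000.0},
}

# Count donors at each age, separately for each lifetime tier.
donors_at_age = defaultdict(int)
for donor_id, by_age in giving_by_age.items():
    tier = giving_tier(sum(by_age.values()))
    for age in by_age:
        donors_at_age[(tier, age)] += 1

print(dict(donors_at_age))
```

Note that the tier comes from the donor's whole lifetime total, so a donor sits on the same line at every age, even in years when the gift itself was small.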

The first chart is based on ALL the data, before trimming according to birth year. The second chart is based on the 6,000 or so records I was left with after trimming. The first chart seems to offer an interesting insight: The higher the lifetime value of donors, the later in life they tend to show up in great numbers. But of course this just isn’t true. Although the number of donors with lower lifetime giving peaks at earlier ages, that’s only because that whole group of donors is younger: They’re not done giving yet. (I have added ‘Median Current Age’ to the high point of each curve to illustrate this.) Remember, this chart includes everyone — it’s the “untrimmed” data:

Contrast that three-phase chart with this next one, based on “trimmed” data. The curves are more aligned, presumably because we are now looking at a better-defined cohort of donors (those born 1920 to 1944). The oldest donor is 24 years older than the youngest donor, but that’s okay: The most important concern is having good data for that range of ages. Because the tops of these curves are flatter, I have annotated more points, for the sake of interest.


These curves are pretty, but they aren’t analogous to “performance curves” for baseball players — we haven’t yet looked at how MUCH donors give, on average, at each age. However, what general observations can we make from the last chart? Some that come to my mind:

  • Regardless of what a donor finally ends up giving lifetime, there are always a few (a very few) who start giving while they are in their 20s, and a few who are still around to give when they are in their late 80s and early 90s.
  • The number of donors starts to really take off at around age 40, and there is steady growth until about age 50, when the growth in number of donors begins to slow or plateau.
  • Donors start to drop out rapidly at around age 70. This is due to mortality of course, but probably the steepness of the drop is exaggerated by my trimming of the data at the older end.


Here is where things really get interesting. The whole point of this exercise was to see if we can spot the telltale signs of a future major donor while they are still relatively young, just as a baseball scout looks for young prospects who haven’t peaked yet. Do donors signal unusual generosity even when they are still in their 20s and 30s? Let’s have a look.

I zoomed in on a very small part of the chart, to show giving activity up until age 35. Are there differences between the various levels of donors? You bet there are.

As soon as a high-lifetime-value donor starts to give, the gifts are higher, relative to same-age peers who will end up giving less. The number of donors at these early ages is minuscule, so take this with a grain of salt, but a trend seems unmistakable: Up to the age of 30, donors who will end up giving in five figures and higher give about 2.5 to 3.5 times as much per year as other donors their age who end up giving $1,000 to $10,000 lifetime. AND, they give FIVE TIMES as much per year as other donors their age who end up giving less than $1,000 lifetime.


Later on, at ages 35 and 40, donors who will finish their giving careers at the high end are giving two to three times as much per year as donors in the middle range, and 5.6 to 7 times as much per year (respectively) as donors who will finish on the lowest end.

It might be less confusing to chart each group of donors by average giving per year, rather than by number of donors. This chart shows average giving per year up until age 65. Naturally, the averages get very spiky, as donors start making large gifts.



To temper the effect of extreme values, I log-transformed the giving amounts. This made it easier to visualize how these three tiers of donors differ from each other over a lifetime of giving:


What do I see from this? These are generalizations based on averages, but potentially useful generalizations:

  • Upper-end donors start strong relative to other donors, accelerate giving after age 40, and continue to increase giving throughout their lifetimes.
  • Middle- and low-range donors start lower. They also increase their yearly giving until their late 40s, but after that, they plateau and stay at the same level for the rest of their lives.
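The averages behind a chart like the log-transformed one could be computed along these lines. The amounts are invented, the tier labels are placeholders, and using log(amount + 1) is an assumption carried over from how lifetime giving is usually transformed on this blog.

```python
import math
from collections import defaultdict

# Invented yearly totals: (lifetime tier, age at gift, amount given that year).
yearly_giving = [
    ("high", 30, 500.0),
    ("high", 30, 300.0),
    ("high", 60, 20000.0),
    ("low", 30, 50.0),
    ("low", 30, 100.0),
    ("low", 60, 100.0),
]

# Log-transform each amount first, then average within tier and age,
# so a single very large gift cannot dominate the mean.
logs = defaultdict(list)
for tier, age, amount in yearly_giving:
    logs[(tier, age)].append(math.log1p(amount))

mean_log_giving = {key: sum(values) / len(values) for key, values in logs.items()}

for key in sorted(mean_log_giving):
    print(key, round(mean_log_giving[key], 2))
```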


What’s the bottom line here? I think it’s this: Hundreds of donors were well on their way to being exceptional by the tender age of 40, and a few were signaling long before that.

Information like this would be interesting to Annual Fund as they work to identify prospects for leadership-level giving. But $10,000 in a lifetime is a little too low to make the Major Gifts folks take notice. Can we carve out the really big donors from the $10K-plus crowd? And can we also identify them before they hit 40? Have a look at this chart. For this one, I removed all the donors who gave less than $10,000 lifetime, and then I divided the high-end donors into those who gave less than $100,000 lifetime (green line) and those who gave more than $100,000 (red line).



The lines get a bit jagged, but it looks to me like the six-figure lifetime donors pull away from the five-figure donors while still in their 40s. And notice as well that they increase their giving after age 65, which is very unusual behaviour: By 65, the vast majority of donors have either long plateaued or are starting to wind down. (You can’t see this in the chart, but that post-65 group of very generous donors numbers about 50 individuals, with yearly average giving ranging around $25,000 to $50,000.)

When I drill down, I can see about a hundred donors sitting along the red line between the ages of 30 and 45, whom we might have identified as exceptional, had we known what to look for.

With the benefit of hindsight, we are now able to look at current donors who were born more recently (after 1969, say), and identify who’s sending out early signals. I have those charts, but I think you’ve seen enough, and as I have said many times in the past: My data is not your data. So while I can propose the following “rules” for identifying an up-and-comer, I don’t recommend you try applying them to your own situation without running your own analysis:

  • Gave more than $200 in one year, starting around age 28.
  • Gave more than $250 in one year, starting around age 29.
  • Gave more than $500 in one year, starting around age 32.

Does this mean I think we can ask a 32-year-old for $10,000 this year? No. It means that this 32-year-old is someone to watch out for and to keep engaged as an alum. It’s the donors over 50 or so who have exhibited these telltale patterns in their early giving that might belong in a major gift prospect portfolio.
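For illustration only, here is one way the three rules might be turned into a flag. It rests on several assumptions of mine: that each threshold applies from its starting age until the next rule takes over, that donors younger than 28 are held to the $200 threshold, and that giving after age 35 is ignored. None of this is a recommendation for your data.

```python
# (age at which the rule starts, yearly giving that must be exceeded)
RULES = [(28, 200.0), (29, 250.0), (32, 500.0)]
MAX_AGE = 35  # assumption: only giving up to this age counts as "early"

def threshold_for_age(age):
    """Return the yearly-giving threshold in force at this age, or None past the cap."""
    if age > MAX_AGE:
        return None
    threshold = RULES[0][1]  # assumption: ages under 28 use the first threshold
    for start_age, amount in RULES:
        if age >= start_age:
            threshold = amount
    return threshold

def is_up_and_comer(giving_by_age):
    """True if any early year of giving exceeded the threshold for that age."""
    for age, amount in giving_by_age.items():
        threshold = threshold_for_age(age)
        if threshold is not None and amount > threshold:
            return True
    return False

# Invented donors: {age at gift: total given at that age}.
print(is_up_and_comer({27: 250.0}))   # over $200 before age 28
print(is_up_and_comer({30: 250.0}))   # not over $250 at age 30
print(is_up_and_comer({40: 5000.0}))  # large, but not early
```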

Precocious giving certainly isn’t the only indicator of a good prospect, but along with a few other unusual traits, it is a good start. (See: Odd but true findings? Upgrading annual donors are “erratic” and “volatile”.)


Where do you go from here? That is completely up to you. I am still in the process of figuring out how to best use these insights.

Coming up with some rules of thumb, as above, is one way to proceed. Another is rolling up all of a donor’s early giving into a single score — a Precocity Score — that takes into account both how much a donor gave, and how young she was when she gave it. I experimented with a formula that gave progressively higher weights to the number of dollars given for younger ages. For example, $100 given at age 26 might be worth several times more than $200 given at age 44.

Using my data set of donors with a full life cycle of giving, I tested whether this score was predictive of lifetime value. It certainly was. However, I also found that a simple cumulative sum of a donor’s giving up to age 35 or 40 was just as predictive. There didn’t seem to be any additional benefit to giving extra weight to very early giving.
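The two approaches can be compared side by side in a sketch like this one. The weighting scheme is purely hypothetical (the weight halves every five years after age 25); it is not the formula I experimented with, just one that reproduces the "several times more" behaviour described above. The cumulative-sum alternative needs no weights at all.

```python
BASE_AGE = 25         # hypothetical: gifts at this age get a weight of 1.0
HALVING_YEARS = 5.0   # hypothetical: weight halves every five years after that

def age_weight(age):
    """Weight that shrinks as the donor's age at time of gift rises."""
    return 0.5 ** ((age - BASE_AGE) / HALVING_YEARS)

def precocity_score(giving_by_age):
    """Sum of yearly giving, with younger giving weighted more heavily."""
    return sum(amount * age_weight(age) for age, amount in giving_by_age.items())

def cumulative_giving_to_age(giving_by_age, cutoff_age):
    """Plain sum of everything given at or before the cutoff age."""
    return sum(amount for age, amount in giving_by_age.items() if age <= cutoff_age)

# Invented donor: {age at gift: total given at that age}.
donor = {26: 100.0, 44: 200.0}

print(round(100.0 * age_weight(26), 2), round(200.0 * age_weight(44), 2))
print(round(precocity_score(donor), 2), cumulative_giving_to_age(donor, 40))
```

Given that the plain sum was just as predictive in my data, the unweighted version has the advantage of being easy to explain to a fundraiser.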

I am shying away from using giving history as an input in a predictive model. I see people do this all the time, but I have always avoided the practice. My preference is to use some version of the rules above as just one more tool to use in prospect identification, distinct from other efforts such as predictive modelling.


That’s as far as I have gotten. If this discussion has given you some ideas to explore, then wonderful. I doubt I’m breaking new ground here, so if you’ve already analyzed giving-by-age, I’d be interested in hearing how you’ve applied what you’ve learned.

Incidentally, Nate Silver went on to produce “similarity scores” for pairs of hitters. Using baseball’s rich trove of data, he compared players using a nearest-neighbour analysis, which took into account a wide range of data points, from player height and weight to all the game stats that baseball is famous for. A young prospect in the minor leagues with a score that indicates a high degree of similarity with a known star might be expected to “age” in a similar way. That was the theory, anyway.

One can imagine how this might translate to the fundraising arena. If you identified groups of your best donors, with a high degree of similarity among the members of each group, you could then identify younger donors with characteristics that are similar to the members of each group. After all, major gift donors are not all alike, so why not try to fit multiple “types”?

I would guess that the relatively small size of each group would cause any signal to get drowned out in the noise. I am a little skeptical that we can parse things that finely. It would, however, be an interesting project.

A final note. The PECOTA system had some successes and for a time was an improvement on existing predictive tools. Over time, however, pure statistics were not a match for the combination of quantitative methods and the experience and knowledge of talent scouts. In the same way, identifying the best prospects for fundraising relies on the combined wisdom of data analysts, researchers and fundraisers themselves.

13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.
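
If you want to check for this pattern in your own data, one simple approach is to band the address-update counts and compare the donor rate in each band. Here is a minimal sketch in Python. The records and the band cut-offs are invented purely for illustration; they are not my data.

```python
from collections import defaultdict

# Hypothetical records: (number_of_address_updates, is_donor).
# Values are made up to illustrate the shape of the check.
records = [
    (0, False), (0, False), (1, True), (1, False), (2, True), (2, True),
    (3, True), (3, True), (6, False), (7, False), (8, True), (9, False),
]

def band(n_updates):
    """Collapse raw counts into a few bands; the cut-offs are arbitrary."""
    if n_updates == 0:
        return "0"
    if n_updates <= 3:
        return "1-3"
    return "4+"

def donor_rate_by_band(rows):
    """Share of donors within each band of address-update counts."""
    totals = defaultdict(int)
    donors = defaultdict(int)
    for n_updates, is_donor in rows:
        label = band(n_updates)
        totals[label] += 1
        donors[label] += int(is_donor)
    return {label: donors[label] / totals[label] for label in totals}

rates = donor_rate_by_band(records)
for label in ("0", "1-3", "4+"):
    print(label, round(rates[label], 2))
```

A donor rate that rises and then falls across the bands is a sign that a straight count will not work well as a linear predictor, and that a banded or capped version of the variable may be worth testing.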

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do — a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or 10 years.
  8. Recent alumni volunteers (with no giving)
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.
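
The example hierarchy above could be expressed as a single function that assigns each lost record to the first list it qualifies for. This is only a sketch: the field names, the "young alumni" score cut-off and the sample records are all assumptions of mine, and your own hierarchy will differ.

```python
# Field names here are hypothetical; map them to your own database.
YOUNG_ALUM_SCORE_CUTOFF = 0.8  # arbitrary threshold for "highest scores"

def research_tier(rec):
    """Return the first list (1 = top priority) that a lost record qualifies for."""
    if rec.get("mg_prospect") or rec.get("pg_prospect"):
        return 1
    if rec.get("mg_donor"):
        return 2
    if rec.get("pg_expectancy"):
        return 3
    if rec.get("leadership_giving"):
        return 4
    # None = never gave; 0 = gave in the past year; 1 = the year previous
    years = rec.get("years_since_last_gift")
    if years is not None:
        if years == 0:
            return 5
        if years == 1:
            return 6
        if years <= 10:
            return 7
    if rec.get("recent_volunteer"):
        return 8
    if rec.get("recent_event"):
        return 9
    if rec.get("young_alum") and rec.get("model_score", 0.0) >= YOUNG_ALUM_SCORE_CUTOFF:
        return 10
    return 11

def research_queue(lost_records):
    """Order records by tier, then by model score (highest first) within a tier."""
    return sorted(lost_records, key=lambda r: (research_tier(r), -r.get("model_score", 0.0)))

# Invented sample records
lost = [
    {"id": "A", "model_score": 0.2},
    {"id": "B", "years_since_last_gift": 0, "model_score": 0.5},
    {"id": "C", "mg_prospect": True},
    {"id": "D", "model_score": 0.9},
    {"id": "E", "recent_volunteer": True},
    {"id": "F", "young_alum": True, "model_score": 0.85},
]
print([r["id"] for r in research_queue(lost)])
```

Because the function returns at the first match, nobody appears on two lists, which takes care of the "who aren't already represented in a previous category" condition. Note that in this sketch a donor whose last gift is more than 10 years old falls through to the later lists.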

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes — that’s what we want. A hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement — maybe properly so. Propensity scores could also play a much bigger role.
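
To make the idea concrete, here is a rough sketch of what such a calculated score might look like. The flags and weights are invented for illustration; real weights would have to come out of consultation, or from a model.

```python
# Illustrative weights only; none of these values come from real data.
WEIGHTS = {
    "mg_prospect": 50,
    "recent_donor": 30,
    "current_volunteer": 25,
    "recent_event": 10,
}
PROPENSITY_WEIGHT = 40  # multiplies a propensity score scaled 0 to 1

def research_score(rec):
    """Sum the weights of every flag that is set, plus a propensity component."""
    score = sum(weight for flag, weight in WEIGHTS.items() if rec.get(flag))
    score += PROPENSITY_WEIGHT * rec.get("propensity", 0.0)
    return score

# Hypothetical records matching the example in the text
young_loyal_donor = {"recent_donor": True, "current_volunteer": True, "propensity": 0.9}
unengaged_prospect = {"mg_prospect": True, "propensity": 0.1}

print(research_score(young_loyal_donor))   # 30 + 25 + 36
print(research_score(unengaged_prospect))  # 50 + 4
```

With these made-up weights the young donor-volunteer outranks the unengaged prospect. Change the weights and the order flips, which is exactly the conversation to have with your colleagues.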

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
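
For anyone who would rather script the join than do it by hand, a minimal sketch follows. The field names and the sample rows are assumptions; I am not reproducing the actual layout of a MailChimp export here, so adjust to whatever your download contains.

```python
# Hypothetical rows: lost alumni pulled from the database ...
lost_alumni = [
    {"id": "1001", "total_giving": 2500.0, "pg_expectancy": False, "last_city": "Toronto"},
    {"id": "1002", "total_giving": 0.0, "pg_expectancy": False, "last_city": "Halifax"},
    {"id": "1003", "total_giving": 150.0, "pg_expectancy": True, "last_city": "Vancouver"},
]
# ... and coordinates from the email platform, keyed on the same ID.
geo_rows = [
    {"id": "1001", "latitude": 18.47, "longitude": -69.90},
    {"id": "1003", "latitude": 47.61, "longitude": -122.33},
    {"id": "9999", "latitude": 44.65, "longitude": -63.57},  # not a lost record
]

def join_coordinates(alumni, geo):
    """Inner join on ID: keep only lost alumni who have coordinates."""
    coords = {row["id"]: (row["latitude"], row["longitude"]) for row in geo}
    joined = []
    for alum in alumni:
        if alum["id"] in coords:
            lat, lon = coords[alum["id"]]
            joined.append({**alum, "latitude": lat, "longitude": lon})
    return joined

mapped = join_coordinates(lost_alumni, geo_rows)
print([row["id"] for row in mapped])
```

This is an inner join, so alumni without coordinates simply drop out of the map file. They are still lost, of course, and still belong on the regular research list.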

The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information — click on the image to get a readable size.


The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:


If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.
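
Building the link itself is just string work. Here is a small sketch in Python that produces a mailto link with an optional subject line. The address and subject are placeholders, and how you attach the link to a tooltip depends on your reporting tool.

```python
from urllib.parse import quote

def mailto_link(email, subject=""):
    """Build a mailto: link, percent-encoding anything unsafe."""
    link = "mailto:" + quote(email, safe="@")
    if subject:
        link += "?subject=" + quote(subject)
    return link

print(mailto_link("jane.doe@example.com", "Keeping in touch"))
```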

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

16 July 2013

Alumni engagement scoring vs. predictive modelling

Filed under: Alumni, engagement, predictive modeling — kevinmacdonell @ 8:06 am

Alumni engagement scoring has an undeniable appeal. What could be simpler? Just add up how many events an alum has attended, add more points for volunteering, add more points for supporting the Annual Fund, and maybe some points for other factors that seem related to engagement, and there you have your score. If you want to get more sophisticated, you can try weighting each score input, but generally engagement scoring doesn’t involve any advanced statistics and is easily grasped.
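
To show just how simple it can be, here is a bare-bones sketch of an additive score. The inputs and point values are invented; choosing them is the stakeholders' job, not the code's.

```python
# Invented point values per unit of each activity
POINTS = {
    "events_attended": 1,
    "volunteer_roles": 3,
    "annual_fund_gift_years": 2,
}

def engagement_score(alum):
    """Add up points for each activity; missing activities count as zero."""
    return sum(points * alum.get(activity, 0) for activity, points in POINTS.items())

# Hypothetical alum: four events, one volunteer role, three years of giving
alum = {"events_attended": 4, "volunteer_roles": 1, "annual_fund_gift_years": 3}
print(engagement_score(alum))  # 4*1 + 1*3 + 3*2
```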

Not so with predictive modelling, which does involve advanced stats and isn’t nearly as intuitive; often it’s not possible to really say how an input variable is related to the outcome. It’s tempting, too, to think of an engagement score as being a predictor of giving and therefore a good replacement for modelling. Actually, it should be predictive — if it isn’t, your score is not measuring the right things — but an engagement score is not the same thing as a predictive model score. They are different tools for different jobs.

Not only are engagement scoring schemes different from predictive models, their simplicity is deceptive. Engagement scoring is incomplete without some plan for acting on observed trends with targeted programming. This implies the ability to establish causal drivers of engagement, which is a tricky thing.

That’s a sequence of events — not a one-time thing. In fact, engagement scoring is like checking the temperature at regular intervals over a long period of time, looking for up and down trends not just for the group as a whole but via comparisons of important subgroups defined by age, sex, class year, college, degree program, geography or other divisions. This requires discipline: taking measurements in exactly the same way every year (or quarter, or what-have-you). If the score is fed by a survey component, you must survey constantly and consistently.
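
In practice, "taking the temperature" means computing the average score for each subgroup at each measurement point and then looking at the change. A minimal sketch, using invented scores for two hypothetical faculties over two years:

```python
from collections import defaultdict
from statistics import mean

# (year, subgroup, engagement score) with made-up values
scores = [
    (2012, "Arts", 4), (2012, "Arts", 6), (2012, "Science", 3), (2012, "Science", 5),
    (2013, "Arts", 5), (2013, "Arts", 9), (2013, "Science", 3), (2013, "Science", 3),
]

def mean_by_group_year(rows):
    """Average score for every (subgroup, year) combination."""
    buckets = defaultdict(list)
    for year, group, score in rows:
        buckets[(group, year)].append(score)
    return {key: mean(values) for key, values in buckets.items()}

def change(means, group, start_year, end_year):
    """Difference in a subgroup's average score between two measurement points."""
    return means[(group, end_year)] - means[(group, start_year)]

means = mean_by_group_year(scores)
for group in ("Arts", "Science"):
    print(group, change(means, group, 2012, 2013))
```

The comparison is only meaningful if the score was calculated the same way in both years, which is the discipline part.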

Predictive models and engagement scores have some surface similarities. They share variables in common, the output of both is a numerical score applied to every individual, and both require database work and math in order to calculate them. Beyond that, however, they are built in different ways and for different purposes. To summarize:

  • Predictive models are collections of potentially dozens of database variables weighted according to strength of correlation with a well-defined behaviour one is trying to predict (e.g., making a gift), in order to rank individuals by likelihood to engage in that behaviour. Both Alumni Relations and Development can benefit from the use of predictive models.
  • Engagement scores are collections of a very few selectively-chosen database variables, either not weighted or weighted according to common sense and intuition, in order to roughly quantify the quality of “engagement”, however one wishes to define that term, for each individual. The purpose is to allow comparison of groups (faculties, age bands, geographical regions, etc.) with each other. Comparisons may be made at one point in time, but it is more useful to compare relative changes over time. The main user of scores is Alumni Relations, in order to identify segments requiring targeted programming, for example, and to assess the impact of programming on targeted segments over time.

Let’s explore key differences in more depth:

The purpose of modelling is prediction, for ranking or segmentation. The purpose of engagement scoring is comparison.

Predictive modelling scores are not usually included in reports. Used immediately in decision making, they may never be seen by more than one or two people. Engagement scores are included in reports and dashboards, and influence decision-making over a long span of time.

The target variable of a predictive model is quantifiable (e.g., giving, measurable in dollars). In engagement scoring, there is no target variable, only an output – a construct called “engagement”, which itself is not directly measurable.

Potential input variables for predictive models are numerous (100+) and vary from model to model. Input variables for engagement scores are limited to a handful of easily measured attributes (giving, event attendance, volunteering) which must remain consistent over time.

Variables for predictive models are chosen primarily using statistical methods (correlation) and only secondarily using judgment and “common sense.” For example, if the presence of a business phone number is highly correlated with being a donor, it may be included in the model. For engagement scores, variables are chosen by consensus of stakeholders, primarily according to subjective standards. For example, event attendance and giving would probably be deemed by the committee to indicate engagement, and would therefore be included in the score. Advanced statistics rarely come into play. (For more thoughts on this, read How you measure alumni engagement is up to you.)
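
As a small illustration of the statistical side, here is a sketch that computes a Pearson correlation between a 0/1 indicator (business phone present) and a 0/1 outcome (is a donor). The eight records are invented, and in a real project you would screen many candidate variables this way, not just one.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up indicator variables for eight alumni
has_business_phone = [1, 1, 1, 0, 0, 0, 1, 0]
is_donor = [1, 1, 0, 0, 0, 0, 1, 1]

r = pearson(has_business_phone, is_donor)
print(round(r, 3))
```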

In predictive models, giving and variables related to the activity of giving are usually excluded as variables (if ‘giving’ is what we are trying to predict). Using any aspect of the target variable as an input is bad practice in predictive modelling and is carefully avoided. You wouldn’t, for example, use attendance at a donor recognition event to predict likelihood to give. In engagement scoring, though, giving history is usually a key input, as it is common sense to believe that being a donor is an indication of engagement. (It might be excluded or reported separately if the aim is to demonstrate the causal link between engagement indicators and giving.)

Modelling variables are weighted using multiple linear regression or other statistical method which calculates the relative influence of each variable while simultaneously controlling for the influence of all other variables in the model. Engagement score variables are usually weighted according to gut feel. For example, coming to campus for Homecoming seems to carry more weight than showing up for a pub night in one’s own city, therefore we give it more weight.

The quality of a predictive model is testable, first against a validation data set, and later against actual results. But there is no right or wrong way to estimate engagement, therefore the quality of scores cannot be evaluated conclusively.
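
One common way to test a model against a validation set (my choice of metric here, not the only one) is to rank the held-out records by score and compare the donor rate among the top scorers with the overall rate. A minimal sketch, with invented scores and outcomes:

```python
def top_fraction_lift(scored, fraction=0.1):
    """Donor rate in the top-scoring fraction divided by the overall donor rate.

    scored: list of (model_score, gave) pairs from a held-out validation set,
    where gave is 1 for a donor and 0 otherwise.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    overall_rate = sum(gave for _, gave in ranked) / len(ranked)
    if overall_rate == 0:
        raise ValueError("no donors in the validation set")
    top_rate = sum(gave for _, gave in ranked[:n_top]) / n_top
    return top_rate / overall_rate

# Made-up validation records: the two donors happen to have the two highest scores
validation = [
    (0.95, 1), (0.90, 1), (0.70, 0), (0.65, 0), (0.50, 0),
    (0.40, 0), (0.30, 0), (0.20, 0), (0.10, 0), (0.05, 0),
]
print(top_fraction_lift(validation, fraction=0.2))
```

A lift well above 1 means the top of the ranking is richer in donors than the file as a whole; a lift near 1 means the model is doing no better than picking names at random.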

The variables in a predictive model have complex relationships with each other that are difficult or impossible to explain except very generally. Usually there is no reason to explain a model in detail. The components in an engagement score, on the other hand, have plausible (although not verifiable) connections to engagement. For example, volunteering is indicative of engagement, while Name Prefix is irrelevant.

Predictive models are built for a single, time-limited purpose and then thrown away. They evolve iteratively and are ever-changing. On the other hand, once established, the method for calculating an engagement score must not change if comparisons are to be made over time. Consistency is key.

Which is all to say: alumni engagement scoring is not predictive modelling. (And neither is RFM analysis.) Only predictive modelling is predictive modelling.
