There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.
I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way though the pile. First-in, first-out. And then on to the next raft of returned mail.
But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.
All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.
Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do — a byproduct, perhaps, of people working in silos, pursuing goals (eg., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.
Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.
The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:
Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.
There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.
That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes — that’s what we want. The hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement — maybe properly so. Propensity scores could also play a much bigger role.
However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.
And now for the twist …
Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.
That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.
(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)
What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.
In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.
I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information — click on the image to get a readable size.
The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.
These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:
If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.
(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)
Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the lcoation to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.
This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.
MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.
Alumni engagement scoring has an undeniable appeal. What could be simpler? Just add up how many events an alum has attended, add more points for volunteering, add more points for supporting the Annual Fund, and maybe some points for other factors that seem related to engagement, and there you have your score. If you want to get more sophisticated, you can try weighting each score input, but generally engagement scoring doesn’t involve any advanced statistics and is easily grasped.
Not so with predictive modelling, which does involve advanced stats and isn’t nearly as intuitive; often it’s not possible to really say how an input variable is related to the outcome. It’s tempting, too, to think of an engagement score as being a predictor of giving and therefore a good replacement for modelling. Actually, it should be predictive — if it isn’t, your score is not measuring the right things — but an engagement score is not the same thing as a predictive model score. They are different tools for different jobs.
Not only are engagement scoring schemes different from predictive models, their simplicity is deceptive. Engagement scoring is incomplete without some plan for acting on observed trends with targeted programming. This implies the ability to establish causal drivers of engagement, which is a tricky thing.
That’s a sequence of events — not a one-time thing. In fact, engagement scoring is like checking the temperature at regular intervals over a long period of time, looking for up and down trends not just for the group as a whole but via comparisons of important subgroups defined by age, sex, class year, college, degree program, geography or other divisions. This requires discipline: taking measurements in exactly the same way every year (or quarter, or what-have-you). If the score is fed by a survey component, you must survey constantly and consistently.
Predictive models and engagement scores have some surface similarities. They share variables in common, the output of both is a numerical score applied to every individual, and both require database work and math in order to calculate them. Beyond that, however, they are built in different ways and for different purposes. To summarize:
Let’s explore key differences in more depth:
The purpose of modelling is prediction, for ranking or segmentation. The purpose of engagement scoring is comparison.
Predictive modelling scores are not usually included in reports. Used immediately in decision making, they may never be seen by more than one or two people. Engagement scores are included in reports and dashboards, and influence decision-making over a long span of time.
The target variable of a predictive model is quantifiable (eg. giving, measurable in dollars). In engagement scoring, there is no target variable, only an output – a construct called “engagement”, which itself is not directly measurable.
Potential input variables for predictive models are numerous (100+) and vary from model to model. Input variables for engagement scores are limited to a handful of easily measured attributes (giving, event attendance, volunteering) which must remain consistent over time.
Variables for predictive models are chosen primarily using statistical methods (correlation) and only secondarily using judgment and “common sense.” For example, if the presence of a business phone number is highly correlated with being a donor, it may be included in the model. For engagement scores, variables are chosen by consensus of stakeholders, primarily according to subjective standards. For example, event attendance and giving would probably be deemed by the committee to indicate engagement, and would therefore be included in the score. Advanced statistics rarely come into play. (For more thoughts on this, read How you measure alumni engagement is up to you.)
In predictive models, giving and variables related to the activity of giving are usually excluded as variables (if ‘giving’ is what we are trying to predict). Using any aspect of the target variable as an input is bad practice in predictive modelling and is carefully avoided. You wouldn’t, for example, use attendance at a donor recognition event to predict likelihood to give. In engagement scoring, though, giving history is usually a key input, as it is common sense to believe that being a donor is an indication of engagement. (It might be excluded or reported separately if the aim is to demonstrate the causal link between engagement indicators and giving.)
Modelling variables are weighted using multiple linear regression or other statistical method which calculates the relative influence of each variable while simultaneously controlling for the influence of all other variables in the model. Engagement score variables are usually weighted according to gut feel. For example, coming to campus for Homecoming seems to carry more weight than showing up for a pub night in one’s own city, therefore we give it more weight.
The quality of a predictive model is testable, first against a validation data set, and later against actual results. But there is no right or wrong way to estimate engagement, therefore the quality of scores cannot be evaluated conclusively.
The variables in a predictive model have complex relationships with each other that are difficult or impossible to explain except very generally. Usually there is no reason to explain a model in detail. The components in an engagement score, on the other hand, have plausible (although not verifiable) connections to engagement. For example, volunteering is indicative of engagement, while Name Prefix is irrelevant.
Predictive models are built for a single, time-limited purpose and then thrown away. They evolve iteratively and are ever-changing. On the other hand, once established, the method for calculating an engagement score must not change if comparisons are to be made over time. Consistency is key.
Which is all to say: alumni engagement scoring is not predictive modelling. (And neither is RFM analysis.) Only predictive modelling is predictive modelling.