CoolData blog

13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do. That is a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or 10 years.
  8. Recent alumni volunteers (with no giving)
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.
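A hierarchy like this can be expressed as an ordered set of rules, where each lost record joins the first list it qualifies for. The following is only a rough Python sketch of that idea: every field name (`mg_prospect`, `years_since_gift`, `score` and so on) is hypothetical, the rules are an approximation of the example lists above, and your own consultation will produce a different order.

```python
# Hypothetical field names throughout; map them to your own database columns.

def gave_within(record, years):
    """True if the record's last gift was made within the given number of years."""
    since = record.get("years_since_gift")  # None means never gave
    return since is not None and since <= years

# (list number, rule). Order matters: a record joins the first list it qualifies for.
RULES = [
    (1, lambda r: r.get("mg_prospect", False)),
    (2, lambda r: r.get("mg_donor", False)),
    (3, lambda r: r.get("pg_expectancy", False)),
    (4, lambda r: r.get("leadership", False)),
    (5, lambda r: gave_within(r, 1)),
    (6, lambda r: gave_within(r, 2)),
    (7, lambda r: gave_within(r, 10)),
    (8, lambda r: r.get("recent_volunteer", False)),
    (9, lambda r: r.get("recent_event", False)),
    (10, lambda r: r.get("young_alum", False)),
]
LAST_LIST = 11  # all other non-donor alumni


def assign_list(record):
    """Return the number of the first list whose rule the record satisfies."""
    for number, rule in RULES:
        if rule(record):
            return number
    return LAST_LIST


def research_queue(lost_records):
    """Order lost records by list number, then by model score (highest first)."""
    return sorted(lost_records, key=lambda r: (assign_list(r), -r.get("score", 0.0)))
```

Because the rules are just data, reordering the hierarchy after a lively discussion means moving a line, not rewriting the report.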

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes, that’s what we want. A hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement, and maybe properly so. Propensity scores could also play a much bigger role.
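To make the idea of a calculated score concrete, here is one way it might look as a simple weighted sum. This is a sketch under loud assumptions: the flags, the weights and the 0-to-1 propensity scale are all invented for illustration, and real weights would have to come out of consultation, not out of my head.

```python
# Hypothetical weights; real values would come from consultation with
# fundraisers and alumni officers, not from this sketch.
WEIGHTS = {
    "mg_prospect": 5.0,
    "pg_expectancy": 4.0,
    "loyal_af_donor": 3.0,
    "current_volunteer": 3.0,
    "recent_event": 1.0,
}
PROPENSITY_WEIGHT = 2.0  # multiplies a model score assumed to lie in [0, 1]


def research_score(record):
    """Sum the weights of every flag the record carries, plus weighted propensity."""
    score = sum(weight for flag, weight in WEIGHTS.items() if record.get(flag))
    score += PROPENSITY_WEIGHT * record.get("propensity", 0.0)
    return score


def ranked(lost_records):
    """Single research list, most valuable lost record first."""
    return sorted(lost_records, key=research_score, reverse=True)
```

With these made-up weights, a loyal young donor who volunteers and has a propensity of 0.8 scores 7.6, ahead of a major gift prospect with no other engagement and a propensity of 0.1, who scores 5.2.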

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
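The join itself is simple enough to sketch in a few lines of Python. I am not reproducing MailChimp's actual export layout here: the column names (`ID`, `Latitude`, `Longitude` and the rest) and the sample rows are made up, so treat this as an outline of the merge and nothing more.

```python
import csv
import io


def join_geo(lost_rows, geo_rows, key="ID"):
    """Attach latitude/longitude to lost-alumni rows that have a match on `key`."""
    geo_by_id = {row[key]: row for row in geo_rows}
    joined = []
    for row in lost_rows:
        geo = geo_by_id.get(row[key])
        if geo is None or not geo.get("Latitude") or not geo.get("Longitude"):
            continue  # no geolocation data collected for this person
        joined.append({
            **row,
            "Latitude": float(geo["Latitude"]),
            "Longitude": float(geo["Longitude"]),
        })
    return joined


# Made-up sample data standing in for the database query and the email export.
lost_csv = """ID,TotalGiving,PlannedGiving,LastCity,LastRegion,LastCountry
1001,2500,No,Toronto,ON,Canada
1002,0,No,Vancouver,BC,Canada
1003,150,Yes,Halifax,NS,Canada
"""
geo_csv = """ID,Latitude,Longitude
1001,40.71,-74.01
1003,,
"""

lost = list(csv.DictReader(io.StringIO(lost_csv)))
geo = list(csv.DictReader(io.StringIO(geo_csv)))
mapped = join_geo(lost, geo)  # only rows with usable coordinates survive
```

Records without coordinates are dropped here because they cannot be plotted; you might prefer to keep them in a separate list for conventional research.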

The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information.

[Image: map_alums, the Tableau map dashboard of lost alumni]

The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:

[Image: manhattan, tooltip detail for the four alumni]

If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.
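In Tableau this would be set up inside the tool itself, but if you were preparing the link as a column in the data file beforehand, it might be built something like this. The function name and the optional subject line are my own illustration.

```python
from urllib.parse import quote


def mailto_link(email, subject=""):
    """Build a mailto: URL, percent-encoding anything unsafe in a link."""
    link = "mailto:" + quote(email, safe="@")
    if subject:
        link += "?subject=" + quote(subject)
    return link
```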

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.
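For that last idea, one possible approach is to compare the email geolocation with the coordinates of the address on file and flag anyone whose two locations are far apart. The sketch below uses the standard haversine great-circle formula. It assumes you have already geocoded the address on file (which is a separate job), and the 200 km threshold is an arbitrary placeholder, chosen loosely because ISP-based locations are only approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def probably_moved(address_latlon, email_latlon, threshold_km=200.0):
    """Flag a record when the email location is far from the address on file."""
    return haversine_km(*address_latlon, *email_latlon) > threshold_km
```

Someone with a Toronto address whose email is consistently opened around New York would be flagged, since the two points are several hundred kilometres apart.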

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

22 June 2010

Making “Email present” predictive again

Filed under: Alumni, Predictor variables — kevinmacdonell @ 5:27 am

Gmail, Yahoo, Hotmail - their presence as an alum's preferred email address is a negative predictor of giving. (Creative Commons license.)

“Email present” (0/1) seems to be breathing its last as a good predictor of giving. Even if you do find some positive correlation between having an address and giving, it probably pales in comparison to other contact-related variables such as the presence of home phone or business phone.

Is there anything we can do to save it? I think so.

The key is in how people use email. I don’t have hard data on this, but it’s my impression that most people have two addresses, and many have multiple addresses. One might be their work email. Another might be their personal “home” address, shared with other members of the household. And frequently there will be a third address, also personal, but more “public” than the home email and reserved for messages that the recipient considers relatively unimportant.

If it seems easier than ever to collect email addresses, that’s because it is. Only instead of getting a work address, you’re more likely to get a personal address, and it will probably be of the third kind: A spam account, which the recipient may or may not be checking. This is the account that a person will use to sign up for things online, or to enter contests, or to subscribe to various things — anytime one expects to receive followup or advertising messages that would be unwelcome in a workplace account. These are typically Gmail or Hotmail accounts; because they have practically unlimited quota, you’ll never have a bounce-back due to a full inbox. Your database will steadily accumulate a trove of useless contact information.

Like alumni with a business phone in your database, alumni who share their business or work email address are keen to hear from you. This is probably less true for alumni who would rather receive mail at their “home” address. And it’s least true for alumni who shunt your messages to a low-priority, free account.

But unlike phone numbers, email addresses don’t have a code in your database to indicate the context (business, home, seasonal, mobile). It may be impossible to tell. However, it’s not hard to screen out the low-value accounts. After all, the field is dominated by a handful of likely suspects, two of which I’ve already named.

Step one is to pull all the valid email addresses from your database, plus a column for Lifetime Giving. Paste this data into an Excel spreadsheet. Insert a new column to the right of the Email Address column, and give it a label called “Domain.” In this column, you’re going to capture everything to the right of the @ symbol, i.e. the domain name. The formula will look something like this:

=RIGHT(A2,LEN(A2)-FIND("@",A2,1))

Copy the formula for the whole length of your spreadsheet, and inspect the results to ensure you’ve isolated just the domain names. The rest of your analysis is easiest to do in stats software. Copy your columns, including Lifetime Giving, into the stats package and create a sorted frequency table to see which domains are most common in your database.
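If you would rather skip Excel altogether, the same two steps (isolate the domain, then count) might look like this in Python. This is a sketch, not what I actually ran. Unlike the Excel formula, which splits at the first @, it splits at the last one, which makes no difference for a normal address, and it lower-cases the result so that capitalization variants are counted together.

```python
from collections import Counter


def email_domain(address):
    """Return everything to the right of the last '@', lower-cased; None if absent."""
    if not address or "@" not in address:
        return None
    return address.rsplit("@", 1)[1].strip().lower()


def domain_counts(addresses):
    """Sorted frequency table of domains, most common first."""
    domains = (email_domain(a) for a in addresses)
    return Counter(d for d in domains if d).most_common()
```

For example, `domain_counts(["a@x.com", "b@X.com", "c@y.ca"])` returns `[("x.com", 2), ("y.ca", 1)]`.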

You’ll probably discover that fewer than half a dozen companies account for three-quarters of addresses. Do a little recoding of the variable to gather together variations on the same domain — yahoo.ca and yahoo.com, differences in capitalization, and so on. Recode all remaining addresses as “Other”, and missing addresses as “None”. Then check how each domain category compares as far as giving is concerned.
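The recoding and the comparison of giving could be sketched as follows. The `GROUPS` mapping is a hypothetical starting point that covers only the generic providers named in this post; in practice you would add your own most common domains as their own categories before lumping the remainder into "Other".

```python
from collections import defaultdict

# Hypothetical mapping of domain variants to one label; extend to suit your data.
GROUPS = {
    "yahoo.com": "Yahoo",
    "yahoo.ca": "Yahoo",
    "hotmail.com": "Hotmail",
    "gmail.com": "Gmail",
}


def recode(domain):
    """Collapse a raw domain into a category; missing becomes 'None'."""
    if not domain:
        return "None"
    return GROUPS.get(domain.strip().lower(), "Other")


def mean_giving_by_category(rows):
    """rows: iterable of (domain, lifetime_giving). Non-donors count as zero.

    Returns (category, average giving) pairs, highest average first.
    """
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of giving, count]
    for domain, giving in rows:
        bucket = totals[recode(domain)]
        bucket[0] += giving or 0.0
        bucket[1] += 1
    means = {cat: total / n for cat, (total, n) in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```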

The following are the seven most common email domains in our database, plus ‘None’ and ‘Other’, sorted in descending order by average lifetime giving. (Averages include non-donors.)

Notice how those familiar generic email domains sit right at the bottom; alumni with no email at all have average giving that is five times that of Hotmail account holders! At the more generous end of the scale are domains which are popular for home email accounts in our city, hinting at a geographic influence, but also pointing to their superiority over the generics.

But it is the Other category that I think is most interesting. Once we’ve screened out a large portion of the generic and home email addresses, we’re left with a segment that is a much richer vein for business and employment-related addresses. These are most likely to be active accounts, probably with strict quotas, that get checked every day by people who actually read our messages.

Have a look at this. Here are Pearson’s r values for the strength of correlation between three different email-related variables and Lifetime Giving (log-transformed). ‘Top email domains’ (0/1) consists of Other and a couple of the more generous domains above. ‘Bottom email domains’ (0/1) consists of Yahoo, Gmail and Hotmail addresses. ‘Email address present’ (0/1) is just that: Is there or is there not an email (any email) present.

It used to be that I would go with ‘Email address present’ as my predictor variable, but look how it pales in comparison with the other two! In place of a very weak predictor variable we now have one strong positive predictor and one strong negative predictor.
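For anyone who wants to try this comparison on their own data, here is a rough sketch of the three 0/1 variables and the correlation with log-transformed giving. Two assumptions to flag: I use log(1 + x) as the transform so that non-donors with zero giving can be included, and the default `top` set contains only "Other", because which of the more generous domains belong in it will depend on your own data.

```python
import math


def email_indicators(category,
                     top=frozenset({"Other"}),
                     bottom=frozenset({"Yahoo", "Gmail", "Hotmail"})):
    """Turn a recoded domain category into the three 0/1 variables."""
    return {
        "top": int(category in top),
        "bottom": int(category in bottom),
        "present": int(category != "None"),
    }


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def indicator_vs_log_giving(flags, giving):
    """Correlate a 0/1 indicator with log(1 + lifetime giving)."""
    return pearson_r(flags, [math.log1p(g) for g in giving])
```

Note that `pearson_r` will fail with a division by zero if either variable is constant, for instance if every record in the file has an email address.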

It’s always hard to say what will become of a correlation when everything is put together in an actual model, but so far I’m finding that both of the new variables are holding up very well in multiple regression, independently maintaining very low p-values. Email has been rescued from oblivion!

P.S.: Just became aware of a site called tempalias. This is a free service that provides temporary, throwaway email addresses, for people who want to sign up for an online service or community without providing a real email address. The user can set a maximum number of days or messages for which the tempalias will be valid, after which it is automatically deleted. Mail is forwarded to a person’s real address for as long as he or she figures it needs to be, and no longer. Do you have any of these addresses in your database? If so, they’re serving pretty much the same purpose as a lot of the Yahoo, Gmail, and Hotmail addresses you also have!

P.P.S.: Just this past December (2010) I watched a presentation given by a vendor, a predictive analytics software company, which broke down ‘email’ in exactly this way as an example of an analysis of a predictor variable. Yahoo, Hotmail and Gmail addresses were found to be negatively predictive in the data set they were using. After this independent example, I would not be surprised if this held true for many other data sets as well.
