CoolData blog

13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.
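One way to see this kind of up-then-down relationship is to tabulate the donor rate at each count of address updates, rather than relying on a single correlation coefficient. Here is a minimal Python sketch, assuming each record has been reduced to a hypothetical `(address_updates, is_donor)` pair. The function name, the cap of 6 and the sample data are all invented for illustration.

```python
from collections import defaultdict


def donor_rate_by_update_count(records, cap=6):
    """Return {update_count: donor_rate}, lumping counts >= cap together."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [donors, alumni]
    for updates, is_donor in records:
        bucket = min(updates, cap)
        totals[bucket][0] += int(is_donor)
        totals[bucket][1] += 1
    return {bucket: donors / alumni
            for bucket, (donors, alumni) in sorted(totals.items())}


# Invented sample: (number of address updates, is a donor?)
sample = [(0, False), (0, False), (1, True), (1, False), (7, False), (9, False)]
rates = donor_rate_by_update_count(sample)
print(rates)
```

If the donor rate climbs over the first few buckets and then falls in the capped bucket, that matches the pattern described above, and the variable is probably better entered into a model in binned form than as a raw count.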

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do. That is a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or 10 years.
  8. Recent alumni volunteers (with no giving)
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.
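A hierarchy like this can be expressed as an ordered set of tests, where each lost record lands in the first list whose test it passes. The Python sketch below is a simplified illustration, not a description of any real system: the field names, the reduced set of tiers and the sample records are all hypothetical.

```python
from typing import NamedTuple, Optional


class LostRecord(NamedTuple):
    record_id: str
    mg_prospect: bool = False
    mg_donor: bool = False
    pg_expectancy: bool = False
    years_since_gift: Optional[int] = None  # whole years; None = no gift on file
    model_score: float = 0.0


# Ordered (label, test) pairs. The first test that passes decides the list.
TIERS = [
    ("Major gift / planned giving prospect", lambda r: r.mg_prospect),
    ("Major gift donor", lambda r: r.mg_donor),
    ("Planned Giving expectancy", lambda r: r.pg_expectancy),
    ("Annual Fund donor, past year",
     lambda r: r.years_since_gift is not None and r.years_since_gift < 1),
    ("Annual Fund donor, year previous",
     lambda r: r.years_since_gift is not None and r.years_since_gift < 2),
    ("Other Annual Fund donor, past 10 years",
     lambda r: r.years_since_gift is not None and r.years_since_gift < 10),
    ("All other alumni, by model score", lambda r: True),
]


def tier_of(record):
    """Index of the first tier whose test the record passes."""
    for index, (_label, test) in enumerate(TIERS):
        if test(record):
            return index
    return len(TIERS)  # not reached: the last test always passes


def research_queue(records):
    """Sort by tier first, then by descending model score within a tier."""
    return sorted(records, key=lambda r: (tier_of(r), -r.model_score))


lost = [
    LostRecord("A1", model_score=0.9),
    LostRecord("B2", years_since_gift=0),
    LostRecord("C3", mg_prospect=True),
    LostRecord("D4", model_score=0.4),
    LostRecord("E5", pg_expectancy=True, years_since_gift=0),
]
queue = [r.record_id for r in research_queue(lost)]
print(queue)
```

Because the tiers are a plain list, reordering them after consultation means moving a line, not rewriting the logic.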

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes, that’s what we want. A hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement, and maybe properly so. Propensity scores could also play a much bigger role.

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.
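To make the weighted-score idea concrete, here is a rough Python sketch. The flags, the weights and the propensity multiplier are all made up for illustration; real values would have to come out of the consultation described earlier.

```python
# Hypothetical weights: each flag that is true adds its weight to the score.
WEIGHTS = {
    "mg_prospect": 5.0,
    "gave_last_year": 3.0,
    "current_volunteer": 2.5,
    "attended_recent_event": 1.0,
}
PROPENSITY_WEIGHT = 4.0  # multiplies a 0-to-1 propensity score


def research_score(record):
    """Sum the weights of the flags that are set, plus weighted propensity."""
    flag_points = sum(w for flag, w in WEIGHTS.items() if record.get(flag))
    return flag_points + PROPENSITY_WEIGHT * record.get("propensity", 0.0)


loyal_young_donor = {"gave_last_year": True, "current_volunteer": True,
                     "propensity": 0.8}
unengaged_prospect = {"mg_prospect": True, "propensity": 0.2}

ranked = sorted(
    [("prospect", unengaged_prospect), ("donor", loyal_young_donor)],
    key=lambda pair: research_score(pair[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

With these invented weights the engaged young donor outranks the unengaged prospect, which is the kind of outcome an exclusive hierarchy cannot produce.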

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
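The join itself is simple. A minimal Python sketch follows, assuming both extracts have already been loaded as lists of dictionaries and that the email export carries the alumni ID alongside the coordinates. The column names (`id`, `latitude`, `longitude`) and the sample rows are hypothetical, not MailChimp's actual export layout.

```python
def attach_coordinates(lost_alumni, geo_rows):
    """Inner-join lost alumni to geolocation rows on the shared ID."""
    coords_by_id = {
        row["id"]: (float(row["latitude"]), float(row["longitude"]))
        for row in geo_rows
        if row.get("latitude") and row.get("longitude")  # skip blank coordinates
    }
    merged = []
    for alum in lost_alumni:
        if alum["id"] in coords_by_id:
            lat, lon = coords_by_id[alum["id"]]
            merged.append({**alum, "latitude": lat, "longitude": lon})
    return merged


# Invented sample data
lost_alumni = [
    {"id": "1001", "total_giving": 250.0, "last_city": "Toronto"},
    {"id": "1002", "total_giving": 0.0, "last_city": "Vancouver"},
    {"id": "1003", "total_giving": 75.0, "last_city": "Halifax"},
]
geo_rows = [
    {"id": "1001", "latitude": "40.71", "longitude": "-74.01"},
    {"id": "1002", "latitude": "", "longitude": ""},  # never opened an email
]
mappable = attach_coordinates(lost_alumni, geo_rows)
print(mappable)
```

Only alumni with coordinates survive the join, since a record without a latitude and longitude cannot be placed on the map.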

The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information — click on the image to get a readable size.

[Image: map_alums, the Tableau map dashboard]

The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:

[Image: manhattan, the zoomed-in map with tooltip]

If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.
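For that last idea, one possible approach is to compare the coordinates of the address on file with the coordinates from the email platform, and flag records where the two are far apart. The Python sketch below uses the standard haversine great-circle formula. It assumes the address on file has already been geocoded (for example, to a city centroid), and the 200 km threshold is an arbitrary value chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def probably_moved(file_coords, email_coords, threshold_km=200.0):
    """True when the email geolocation is far from the address on file."""
    return haversine_km(*file_coords, *email_coords) > threshold_km


toronto = (43.65, -79.38)   # approximate city coordinates
new_york = (40.71, -74.01)
print(round(haversine_km(*toronto, *new_york)))
print(probably_moved(toronto, new_york))
```

Given how coarse ISP-based geolocation can be, a generous threshold seems wise. The aim is to catch a move to another city or country, not a move across town.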

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

4 November 2013

Census Zip Code data versus internal data as predictors of alumni giving

Guest post by Peter Wylie and John Sammis

Thanks to data available via the 2010 US Census, for any educational institution that provides us zip codes for the alums in its advancement database, we can compute such things as the median income and the median house value of the zip code in which the alum lives.

Now, we tend to focus on internal data rather than external data. For a very long time the two of us have been harping on something that may be getting a bit tiresome: the overemphasis on finding outside wealth data in major giving, and the underemphasis on looking at internal data. Our problem has been that we’ve never had a solid way to systematically compare these two sources of data as they relate to the prediction of giving in higher education.

John Sammis has done a yeoman’s job of finding a very reasonably priced source for this Census data as well as building some add-ons to our statistical software package – add-ons that allow us to manipulate the data in interesting ways. All this has happened within the last six months or so, and I’ve been having a ball playing around with this data, getting John’s opinions on what I’ve done, and then playing with the data some more.

The data for this piece come from four private, small to medium sized higher education institutions in the eastern half of the United States. We’ll show you a smidgeon of some of the things we’ve uncovered. We hope you’ll find it interesting, and we hope you’ll decide to do some playing of your own.

Download the full, printer-friendly PDF of our study here (free, no registration required): Census ZIP data Wylie & Sammis.

4 January 2012

Look inside first

Filed under: External data — kevinmacdonell @ 8:41 am

During a panel discussion on the second day of last October’s DRIVE conference, one of the panel members mentioned that it’s possible to learn which of your constituents have “liked” your fan page on Facebook. The mechanics of it went over my head; I don’t recall if it involved developing an application or scraping data some other way. Anyway, the discussion veered in the direction that doing this was crossing some sort of line.

I’m not sure about that. On one hand, you need to remember your responsibility to donors to operate in the least wasteful way possible. Getting smarter about identifying people who feel an affinity with your institution or cause is part of that.

On the other hand, it’s hard to justify making it a priority to scrape data from external sources if you’re doing a lousy job of using your much more valuable internal data.

Let’s get the internal part right first, and explore the ethics of casting a wider net later.

23 September 2011

Who needs analytics vendors?

Filed under: External data, Vendors — kevinmacdonell @ 6:08 am

I’ve written a guest post for Andrew Urban’s blog, Return on Mission. Andrew is the author of a great little book called “The Nonprofit Buyer,” which is subtitled: “Strategies for Success from a Nonprofit Technology Sales Veteran.” It’s all about helping nonprofits make better choices when it comes to dealing with vendors of technology products and services. You can find out more on Andrew’s blog.

I’m pleased he’s asked me to contribute to Return on Mission, where I write on a topic I haven’t addressed on my own blog. Readers of CoolData know that my focus is the in-house analytics capability of nonprofits and higher-education institutions. So what do I think about analytics and analytic services purchased from vendors?

Well, if you want to find out, you’ll have to follow the link: Knowledgeable Purchasers — 4 Easy Rules

9 July 2010

How to infer age, when all you have is a name

Filed under: Coolness, External data, Non-university settings, Predictor variables — kevinmacdonell @ 6:02 am

I rarely post on a Friday, let alone a Friday in the middle of summer, but today’s cool idea is somewhat half-baked. Its very flakiness suits the day and the weather. Actually, I think it has potential, but I’m interested to know what others think.

For those of us in higher-ed fundraising, ‘age’ or ‘class year’ is a key predictor variable. Not everyone has this information in their databases, however. What if you could sort of impute a “best guess” age, based on a piece of data that you do have: First name?

Names go in and out of fashion. You may have played around with this cool tool for visualizing baby-name trends. My own first name, Kevin, peaked in popularity in the 1970s and has been on a downward slide ever since (chart here). I was born in 1969, so that’s pretty close. My father’s name, Leo, has not been popular since the 1920s (he was born in 1930), but is having a slight comeback in recent years (chart here).

As for female names, my mother’s name, Yvonne, never ranked in the top 1,000 in any time period covered by this visualization tool, so I’ll use my niece’s name: Katelyn. She was born in 2005. This chart shows that two common spellings of her name peaked around that year. (The axis labeling is a bit wonky — you’ll have to hover your cursor over the display to get a good read on the timing of the peak.)

You can’t look up every first name one by one, obviously, so you’ll need a data set from another source that relates relative frequencies of names with age data. That sort of thing might be available in census data. But knowing somebody with access to a higher-ed database might be the easiest way.

I’ve performed a query on our database, pulling on just three fields: ID (to ensure I have unique records), First Name, and Age — for more than 87,000 alumni. (Other databases will have only Class Year — we’re fortunate in that we’ve got birth dates for nearly every living alum.) Here are a few sample rows, with ID number blanked out:

From here, it’s a pretty simple matter to copy the data into stats software (or Excel) to compute counts and median ages for each first name. Amazingly, just six first names account for 10% of all living, contactable alumni! (In order: John, David, Michael, Robert, James, and Jennifer.)

On the other hand, a lot of first names are unique in the database, or nearly so. To simplify things a bit, I calculated median ages only for names represented five or more times in the database. These 1,380 first names capture the vast majority of alumni.
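If you would rather script this step than do it in Excel, a minimal Python sketch might look like the following. It assumes the query result has been reduced to `(first_name, age)` pairs. The name normalization (trim and title-case) is my own assumption, and the sample rows are invented.

```python
from collections import defaultdict
from statistics import median


def median_age_by_name(rows, min_count=5):
    """Return {name: (count, median_age)} for names seen at least min_count times."""
    ages = defaultdict(list)
    for first_name, age in rows:
        ages[first_name.strip().title()].append(age)
    return {
        name: (len(values), median(values))
        for name, values in ages.items()
        if len(values) >= min_count
    }


# Invented sample rows: (first name, age)
rows = [
    ("Kevin", 41), ("kevin", 43), ("Kevin ", 45), ("KEVIN", 40), ("Kevin", 50),
    ("Leo", 78), ("Leo", 30),
]
name_ages = median_age_by_name(rows)
print(name_ages)
```

The `min_count` cutoff plays the same role as the five-or-more rule above: a median built on one or two people is not much of an estimate.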

The ten “oldest” names in the database are listed in the chart below, in descending order by median age. Have a look at these venerable handles. Of these, only Max has staged a rebound in recent years (according to the Baby Names visualizer).

And here are the ten “youngest names,” in ascending order by median age. It’s an interesting coincidence that the very youngest name is Katelyn, the name of my five-year-old niece. One or two (such as Jake) were popular many years ago, and at least one has flipped gender from male to female (Whitney). Most of the others are new on the scene.

The real test is, do these median ages actually provide reasonable estimates of age for people who aren’t in the database?

I’m not in the database (as an alum). There are 371 Kevins in the database, and their median age is 43. I turned 41 in May, so that’s very good.

My father is also not an alum. The 26 Leos in the database have a median age of 50, which is unfortunately 30 years too young. Let’s call that one a ‘miss’.

My mother’s ‘predicted’ age is off by half that — 15 years — that’s not too bad.

Here’s how my three siblings’ names fare: Angela (predicted 36, actual 39 — very good), Paul (predicted 48, actual 38 — fair), and Francis (predicted 60, actual 36 — poor). Clearly there’s an issue with Francis, which according to the Baby Names chart tool was popular many decades ago but not when my brother was named. In other words, results for individuals may vary!

So let’s say you’re a non-profit without access to age data for your database constituents. How does this help you? Well it doesn’t — not directly. You will need to find a data partner at a university who will prepare a file for you, just as I’ve done above. When you import the data into your model, you can match up records by first name and voila, you’ve got a variable that gives you a rough estimate of age. (Sometimes very rough — but it’s better than nothing.)
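The matching step could be as simple as a dictionary lookup. Here is a small Python sketch that uses the median ages quoted in this post as the lookup table. The constituent records and field names are hypothetical, and a name that is missing from the file is left as `None` rather than guessed at.

```python
# Median ages by first name, as quoted in this post.
NAME_AGES = {"Kevin": 43, "Leo": 50, "Angela": 36, "Paul": 48, "Francis": 60}


def add_estimated_age(constituents, name_ages):
    """Return copies of the records with an 'est_age' field added."""
    enriched = []
    for person in constituents:
        key = person["first_name"].strip().title()
        estimate = name_ages.get(key)  # None when the name is not in the file
        enriched.append({**person, "est_age": estimate})
    return enriched


# Hypothetical constituents from a database with no age data
constituents = [
    {"id": "N-1", "first_name": "kevin"},
    {"id": "N-2", "first_name": "Francis"},
    {"id": "N-3", "first_name": "Zebulon"},
]
enriched = add_estimated_age(constituents, NAME_AGES)
print(enriched)
```

Keeping the unmatched records as missing values lets the model treat "no estimate" differently from a real estimate.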

This is only an idea of mine. I don’t know if anyone has actually done this, so I’d be interested to hear from others. Here are a few additional thoughts:

  • There shouldn’t be any privacy concern — all you want is a list of first names and their median ages, NOT IDs or last names — but be sure to get all necessary approvals.
  • To anticipate your question, no, I won’t provide you my own file. I think you’d be much better off getting a names file from a university in your own city or region, which will provide a more accurate reflection of the ethnic flavour of your constituency.
  • I used “First name”, but of course universities collect first, middle and last names, and the formal first name might not be the preferred moniker. If the university database has a “Preferred first name” field that is fully populated, that might be a better option for matching with your first-name field.
  • Again, there might be more accessible sources of name- or age-related data out there. This idea just sounded fun to try!

25 May 2010

Is the Do Not Call List bogus?

Filed under: Annual Giving, External data, Phonathon, Predictor variables — kevinmacdonell @ 9:41 am

Last week I told you how I obtained a list of phone numbers from Canada’s Do Not Call List (two million phone numbers!). I matched these up with phone numbers from an alumni database in order to create a potential new predictor variable for my models. Today I reveal my rather unexpected findings.

To recap: In 2008, Canada introduced the National Do Not Call List (DNCL), which gives consumers a choice about whether to receive telemarketing calls. Anyone can add their phone numbers to the list, and telemarketing companies are forced to avoid calling those numbers. Canadian registered charities, including universities soliciting donations via calling programs, are exempt from the DNC list. However, any organization may access the list — which we did, for the purpose of research. Similar registries exist in the U.S. and around the world.
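Matching phone numbers across two sources mostly comes down to getting both into the same format first. The Python sketch below is one way to do it, assuming ten-digit North American numbers. The function names are my own and the sample numbers are fictional.

```python
import re


def normalize_phone(raw):
    """Reduce a phone number to 10 digits, or None if it cannot be."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


def flag_dnc(alumni_phones, dnc_numbers):
    """Return ({alum_id: on_list}, share_on_list) for alumni with usable phones."""
    dnc = {n for n in map(normalize_phone, dnc_numbers) if n}
    flags = {}
    for alum_id, raw in alumni_phones.items():
        phone = normalize_phone(raw)
        if phone:  # skip alumni with no usable home phone
            flags[alum_id] = phone in dnc
    share = sum(flags.values()) / len(flags) if flags else 0.0
    return flags, share


# Fictional sample data
alumni_phones = {"A": "(902) 555-0142", "B": "902.555.0199", "C": "555-0100"}
dnc_numbers = ["1-902-555-0142", "9025550123"]
flags, share = flag_dnc(alumni_phones, dnc_numbers)
print(flags, share)
```

The resulting true/false flag is the candidate predictor variable. The share is computed only over alumni who have a usable phone number.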

The results of my little experiment looked odd right from the beginning. When I matched up phone numbers, I discovered that a whopping 42% of living alumni with a home phone number in the area codes of interest had in fact signed up for the Do Not Call List. That seemed awfully high to me — but, oh well, I certainly didn’t lack for comparative data. Any differences between the DNC group and all other alumni were bound to be significant.

Or not! Check out these findings:

  • The two groups (DNC / not DNC) hardly differed in their age distribution. The very oldest and the very youngest alumni registered at the lowest rates (37.6% and 38.9%, respectively), but participation in the List was nearly equal across all age levels.
  • Alumni who signed up for the DNC list were slightly more likely to be donors. (Counter-intuitive, I thought.)
  • When I narrowed the definition of ‘giving’ to gifts received recently via the calling program, I found no difference in giving between the DNC and the non-DNC group. I had expected that people who object to being called by telemarketers would also give less in response to a call from alma mater, and I was very surprised with this result. Average pledge and rate of participation were almost exactly equivalent across both groups.
  • The number of alumni who were coded ‘do not solicit by phone’ was about equal for both groups, DNC and non-DNC.
  • The number of alumni who asked not to be solicited by affinity partners (credit card, insurance, etc.) was also about equal for both groups.

The problem was not that the results were unexpected; unexpected is almost always interesting. No, the problem was that the results were impossible to interpret. The intersection of the DNC list with the alumni database was distinguished by an almost total lack of pattern or tendency. There were three possible conclusions to draw from this, one of which must be correct:

  1. The two data sets were completely unrelated due to some undiagnosed error in the analysis.
  2. The two data sets were related, but alumni draw a complete distinction between telemarketers and our student callers. They want off the calling lists of marketers, but this has nothing to do with their attitude toward alma mater and its fundraising efforts. If true, this would be good news indeed. But somehow I doubt it!
  3. The DNC list is a random data set. The near-total lack of distinguishing features strongly suggests that the DNC list is just a random sampling of the Canadian population. In other words, the list has been diluted by the mass uploading of phone numbers, despite security measures in place to prevent that from happening. If numbers are being added to the list without householders’ knowledge, the data do not represent people’s attitudes and intentions and are therefore worthless for the purpose of analysis.

Regardless of what the answer is, one thing is certain: We must never allow the DNC list to be applied to charities and nonprofits without a fight. This (possibly bogus) list will cut indiscriminately across a broad cross-section of anyone’s donor base, and a ban on calling would seriously harm any phone-based fundraising effort. Fortunately there does not seem to be any intention to extend the reach of the DNC list at present.

Getting back to the matter of finding new predictors: Every once in a while I get it in my head that the potential in our database is tapped out as far as new predictors go. There HAVE to be other sources of data on our constituents that will provide amazing new insights into their behaviour. Sometimes going outside the database is worthwhile (survey data, for example) and sometimes it just isn’t.

The lesson might be: Unless the data you covet relates directly to your constituents’ relationship with (or attitude towards) your institution, it may not be worth a great deal of time or money to acquire it.

Postscript: I’ve just had an opportunity to run the same lists of phone numbers against another and much larger university database. Once again, the binary variable “On the Do Not Call List” behaved like a randomly-generated number. I found that almost a third of the alumni population with phone numbers in the database is supposedly on this list, but the tiny difference in giving behaviours between the DNC and not-DNC groups was not statistically significant.
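For readers who want to run this kind of check themselves, one common choice for comparing donor rates between two groups is a two-proportion z-test. I am not claiming this is the exact test used above. The Python sketch below is a generic illustration, and the counts in the example are invented.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Invented counts: donors among DNC alumni vs. donors among non-DNC alumni
z, p_value = two_proportion_z(310, 1000, 300, 1000)
print(round(z, 2), round(p_value, 3))
```

A large p-value here means the gap in donor rates is well within what chance alone would produce, which is the situation described in the postscript.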
