CoolData blog

13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do — a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or ten years.
  8. Recent alumni volunteers (with no giving).
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.
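For the sake of illustration, here is one way such a tiered queue might be sketched in code. The field names and thresholds below are invented; your own tiers would come out of the consultation described above, and the logic could just as easily live in a database query or report.

```python
# Hypothetical sketch: assign each lost record to the first tier it
# matches, then sort the research queue by tier. All field names and
# cutoffs here are assumptions for illustration only.

TIERS = [
    ("managed prospect", lambda r: r.get("prospect_status") == "active"),
    ("major gift donor", lambda r: r.get("lifetime_giving", 0) >= 25000),
    ("PG expectancy",    lambda r: r.get("pg_expectancy", False)),
    ("recent AF donor",  lambda r: r.get("last_gift_fy", 0) >= 2013),
    ("all other alumni", lambda r: True),
]

def tier_rank(record):
    """Return the index of the first tier this record matches."""
    for rank, (_name, test) in enumerate(TIERS):
        if test(record):
            return rank

def research_queue(lost_records):
    # Within the catch-all tier, fall back to a propensity model score,
    # highest first — mirroring list 11 above.
    return sorted(lost_records,
                  key=lambda r: (tier_rank(r), -r.get("propensity", 0)))
```

Research would then simply work the queue from the top, with no need to decide each morning which list to pull from.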

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes — that’s what we want. A hierarchy like the one above implies exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement — maybe properly so. Propensity scores could also play a much bigger role.
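As a thought experiment, a weighted score along those lines might look like this. Every weight below is pure invention for illustration; real weights would be negotiated the same way the hierarchy was.

```python
# Hypothetical research-priority score. The weights are assumptions,
# not recommendations — they exist only to show the shape of the idea.
def research_priority(r):
    score = 0.0
    score += 50 if r.get("managed_prospect") else 0     # MG/PG prospect
    score += 10 * r.get("consecutive_giving_years", 0)  # loyalty
    score += 15 if r.get("current_volunteer") else 0    # engagement
    score += 20 * r.get("propensity", 0)                # model score, 0 to 1
    return score
```

Under these made-up weights, a five-year consecutive donor who volunteers scores 65, outranking a managed prospect with no other engagement at 50 — exactly the kind of reordering a strict hierarchy can’t produce.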

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
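The join itself is trivial. In plain Python it might look something like this — the field names (“id”, “lat”, “lng”) are assumptions for illustration, and MailChimp’s actual export labels may differ.

```python
# Sketch: attach MailChimp lat-long coordinates to lost-alumni records,
# using the ID as the common key. Field names are assumed, not MailChimp's.
def merge_geo(lost_alumni, mailchimp_rows):
    coords = {row["id"]: (row["lat"], row["lng"]) for row in mailchimp_rows}
    merged = []
    for alum in lost_alumni:
        if alum["id"] in coords:  # only about half will have coordinates
            lat, lng = coords[alum["id"]]
            merged.append({**alum, "lat": lat, "lng": lng})
    return merged
```

The resulting file, one row per lost alum with coordinates, is exactly what Tableau wants for a point map.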

The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information — click on the image to get a readable size.


The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:


If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also increase the proportion of correct new addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

27 June 2013

Time management for data analysts

Filed under: Best practices, Training / Professional Development — kevinmacdonell @ 5:32 am

Does it seem you never have enough time to get your work done? You’ve got a long list of projects, more than a few of which are labeled Top Priority — as if multiple projects could simultaneously be “top priority” — along with your own analysis projects which too often get pushed aside. We aren’t going to create more time for ourselves, and there’s only so much we are empowered to say “no” to. So we need a different strategy.

The world does not need another blog post about how to be more productive, or a new system to fiddle with instead of doing real work. However, I’ve learned a few things about how to manage my own time and tasks (I have done my share of reading and fiddling), and perhaps some of what works for me will be helpful to analysts … and to prospect researchers, alumni magazine feature writers, or anyone else whose work requires extended periods of focus.

First and foremost, I’ve learned that “managing time” isn’t an effective approach. Time isn’t under your control, therefore you can’t manage it. What IS under your control (somewhat) is your attention. If you can manage your attention on a single task for a few stretches of time every day, you will be far more productive. You need to identify unambiguously what it is you should be working on right now from among an array of competing priorities, and you need to be mentally OK with everything you’re not doing, so that you can focus.

My “system” is hardly revolutionary but it is an uncomplicated way to hit a few nails on the head: prioritization and project management, focus and “flow”, motivation, and accountability and activity tracking. Again, it’s not about managing your time, it’s about managing your projects first so that you can choose wisely, and then managing your attention so you can focus on that choice.

Here is an Excel template you can use to get started: Download Projects & Calendar – As promised, it’s nothing special. There are two main elements: One is a simple list of projects, with various ways to prioritize them, and the other is a drop-dead simple calendar with four periods or chunks of time per day, each focused on a single project.

Regarding the first tab: A “project” is anything that involves more than one step and is likely to take longer than 60 minutes to complete. This could include anything from a small analysis that answers a single question, to a big, hairy project that takes months. The latter is probably better chunked into a series of smaller projects, but the important thing is that simple tasks don’t belong here — put those on a to-do list. Whenever a new project emerges — someone asks a complicated question that needs an answer or has a business problem to solve — add it to the projects list, at least as a placeholder so it isn’t forgotten.

You’ll notice that some columns have colour highlighting. I’ll deal with those later. The uncoloured columns are:

Item: The name of the project. It would be helpful if this matched how the project is named elsewhere, such as your electronic or paper file folders or saved-email folders.

Description: Brief summary of what the project is supposed to accomplish, or other information of note.

Area: The unit the project is intended to benefit. (Alumni Office, Donor Relations, Development, etc.)

Requester: If applicable, the person most interested in the project’s results. For my own research tasks, I use “Self”.

Complete By: Sometimes this is a hard deadline; usually it’s wishful thinking. This field is necessary but not very useful in the short term.

Status/Next Action: The very next thing to be done on the project. Aside from the project name itself, this is THE single most important piece of information on the whole sheet. It’s so important, I’m going to discuss it in a new paragraph.

Every project MUST have a Next Action. Every next action should be as specific as possible, even if it seems trivial. Not “Start work on the Planned Giving study,” but rather, “Find my folder of notes from the Planned Giving meeting.” Having a small and well-defined task that can be done right now is a big aid to execution. Compare that to thinking about the project as a whole — a massive, walled fortress without a gate — which just creates anxiety and paralysis. Like the proverbial journey, executing one well-defined step after another gets the job done eventually.

A certain lack of focus might be welcome at the very beginning of an analysis project, when some aimless doodling around with pencil and paper or a few abortive attempts at pulling sample data might help spark some creative ideas. With highly exploratory projects things might be fuzzy for a long time. But sooner or later if a project is going to get done it’s going to have an execution stage, which might not be as much fun as the exploratory stage. Then it’s all about focus. You will need the encouragement of a doable Next Action to pull you along. A project without a next action is just a vague idea.

When a project is first added to the list as a placeholder until more details become available, the next action may be unclear. Therefore the Next Action is getting clarity on the next action, but be specific. That means, “Email Jane about what she wants the central questions in the analysis to be,” not “Get clarity.”

(The column is also labeled “Status.” If a project is on hold, that can be indicated here.)

Every Next Action also needs a Next Action Date. This may be your own intended do-by date, an externally-set deadline, or some reasonable amount of time to wait if the task is one you’ve delegated to someone else or you have requested more information. Whatever the case, the Next Action Date is more important than the overall (and mostly fictitious) project completion date. That’s why the Next Action Date is conditionally formatted for easy reference, and the Completion Date is not. The former is specific and actionable, the latter is just a container for multiple next actions and is not itself something that can be “done”. (I will say more about conditional formatting shortly.)

When you are done with a project for the day, your last move before going on to something else is to decide on and record what the very next action will be when you return to that project. This will minimize the time you waste in switching from one task to another, and you’ll be better able to just get to work. Not having a clear reentry point for a project has often sidetracked me into procrastinating with busy-work that feels productive but isn’t.

The workbook holds a tab called Completed Projects. When you’re done with a project, you can either delete the row, or add it to this tab. The extra trouble of copying the row over might be worth it if you need to report on activity or produce a list of the last year’s accomplishments. As well, you can bet that some projects that are supposedly complete (but not under your control) will come up again like a meal of bad shellfish. It’s helpful to be able to look up the date you “completed” something, in order to find the files, emails and documentation you created at the time. (By the way, if you don’t document anything, you deserve everything bad that comes to you. Seriously.) If the project was complex, a lot of valuable time can be saved if you can effectively trace your steps and pick up from where you left off.

I mentioned that several columns are conditionally formatted to display varying colour intensities which will allow you to assess priorities at a glance. We’re all familiar with the distinction between “important” and “urgent”. At any time we will have jobs that must get done today but are not important in the long run. Important work, on the other hand, might someday change the whole game yet is rarely “urgent” today. It has a speculative nature to it and it may not be evident why it makes sense to clear the decks for it. This is one reason for trying to set aside some time for speculative, experimental projects — you just never know.

The Priority Rating column is where I try to balance the two (urgent vs. important), using a scale of 1 to 10, with 1 being the top priority. I don’t bother trying to ensure that only one project is a ‘1’, only one is a ‘2’, etc. — I rate each project in isolation based on a sense of how in-my-face I feel it has to be, and of course that changes all the time.

Other columns use similar flagging:

Urgent: The project must be worked on now. The cell turns red if the value is “Y”. Although it may seem that everything is urgent, reserve this for emergencies and hard deadlines that are looming. It’s not unusual for me to have something flagged Urgent, yet it has a very low priority rating … which tells you how important I think a lot of “urgent stuff” is.

Percent Complete: A rough estimate of how far along you think you are in a project. The closer to zero, the darker the cell is. Consult these cells on days when you feel it’s time to move the yardsticks on some neglected projects.

Next Action Date: As already mentioned, this is the intended date or deadline for the very next action to be taken to move the project forward. The earlier in time the Next Action Date is, the darker the cell.

Date Added: I’m still considering whether I need this column, so it doesn’t appear in my sample file. This is the date a project made it onto the list. Conditional formatting would highlight the oldest items, which would reveal the projects that have been languishing the longest. If a project has been on your list for six months and it’s 0% done, then it’s not a project — it’s an idea, and it belongs somewhere else rather than cluttering today’s view, which should be all about action. You could move it to an On Hold tab or an external list. Or just delete it. If it’s worth doing, it’ll come back.
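If you ever wanted to reproduce that Next Action Date shading outside Excel, the underlying idea is just a proximity-to-darkness mapping. A minimal sketch, with an arbitrary two-week horizon of my own choosing:

```python
from datetime import date

# Sketch of the Next Action Date conditional formatting: 1.0 = darkest
# (due today or overdue), 0.0 = lightest (the horizon or further away).
# The 14-day horizon is an arbitrary assumption for illustration.
def urgency_shade(next_action, today, horizon_days=14):
    days_out = (next_action - today).days
    return min(1.0, max(0.0, 1 - days_out / horizon_days))
```

Excel’s built-in colour scales do the same thing declaratively; the point is only that “earlier date, darker cell” is a simple linear ramp.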

Here’s a far-away look at the first tab of my projects list. At a glance you can see how your eye is drawn to projects needing attention, as variously defined by priority, urgency, completeness, and proximity of the next deadline. There is no need to filter or sort rows, although you could do so if you wanted.


The other main element in this workbook is a simple calendar, actually a series of calendars. Each day contains four blocks of time, with breaks in between. You’ll notice that there are no time indications. The time blocks are intended to be roughly 90 minutes, but they can be shorter or longer, depending on how long a period of time you can actually stay focused on a task.

If you’re like me, that period is normally about five minutes, and for that reason we need a bit of gentle discipline. I tell myself that I am about to begin a “sprint” of work. I commit wholly to a single project, and I clear the deck for just that project, based on the knowledge that there is a time limit to how long I will work to the exclusion of all distractions until I can goof off with Twitter or what have you. I have made a bargain with myself: Okay, FINE, I will dive into that THING I’ve been avoiding, but don’t bother me again for a week!

The funny thing is, that project I’ve been avoiding will often begin to engage me after I’ve invested enough time. The best data analysis work happens when you are in a state of “flow,” characterized by total absorption in a challenging task that is a match for your skills. If you have to learn new techniques or skills in order to meet that challenge, the work might actually feel like it is rewarding you with an opportunity to grow a bit.

Flow requires blocks of uninterrupted time. There may not be much you can do about people popping by your work station to chat or to ask for things, but you can control your self-interruptions, which I’ve found are far more disruptive. I’m going to assume you’ve already shut off all alerts for email and instant messaging apps on your computer. I would go a step farther and shut down your email client altogether while you’re working through one of your 90-minute sprints, and silence your phone.

If shutting off email and phone is not a realistic option for you, ask yourself why. If you’re in a highly reactive mode, responding to numerous small requests, then regardless of what your job title is, you may not be an analyst. If the majority of your time is spent looking up stuff, producing lists, and updating and serving reports, then you need to consider an automation project or a better BI infrastructure that will allow you more time for creative, analytical work. Just saying.

On the other hand, I’ve always been irritated by the productivity gurus who say you should avoid checking email at the start of the work day, or limit email checking to only twice a day. This advice cannot apply to anyone working in the real world. Sure, you can lose the first hour of the day getting sucked into email, but a single urgent message from on high can shuffle priorities for the day, and you’d better be aware of it. A good morning strategy would be to first open your projects file, identify what your first time block contains, review the first action to take, and get your materials ready to work. THEN you can quickly consult your email for any disruptive missives (don’t read everything!) before shutting down your client and setting off to do what you set out to do. You don’t necessarily have to tackle your first time block as soon as you sit down; you just need to ensure that you fit two time blocks into your morning.

Other time block tips:

  • While you’re in the midst of your time block, keep a pad of paper handy (or a Notepad file open) to record any stray thoughts about unrelated things that occur to you, or any new tasks or ideas that threaten to derail you. You may end up getting derailed, if the new thing is important or interesting enough, but if not, jotting a note can prevent you from having to fire up your email again or make a phone call, or whatever, and save the interruption for when you’ve reached a better stopping point.
  • Try to exert some control over when meetings are scheduled. For meetings that are an hour or longer, avoid scheduling them so that they knock a hole right in the centre of either the morning or the afternoon, leaving you with blocks of time before and after that are too short to allow you to really get into your project.
  • Keep it fluid, and ignore the boundaries of time blocks when you’re in “flow” and time passes without your being conscious of it. If you’re totally absorbed in a project that you’ve been dreading or avoiding previously, then by all means press on. Just remember to take a break.
  • When you come to the end of a block, take a moment to formulate the next action to take on that project before closing off.
  • If you happen to be called away on something urgent when you’re in the middle of a time block, try to record the next action as a placeholder. Task-switching is expensive, both in time and in mental energy. Always be thinking of leaving a door open, even if the next action seems obvious at the time. You will forget.

I usually fill projects into time blocks only a few days in advance. The extra two weeks are there in case I want to do more long-term planning. The more important the project, the more time blocks it gets, and the more likely I am to schedule it for the first time block in the morning. Note that this tool isn’t used to schedule your meetings — that’s a separate thing and you probably already have something for that. It would be nice if meetings and project focusing could happen in the same view, but to me they are different things.

At the end of a week, I move the tab for the current calendar to the end of the row, rename it to show the date range it represents, and replace it with next week’s calendar, renaming and copying tabs as needed to prepare for the week to come. I am not sure if saving old calendars serves a purpose — it might make more sense to total up the estimated number of hours invested in the project that week, keeping a running total by project on the first tab — but like everything this is a work in progress.

Your Excel file might be saved on a shared drive and made accessible to anyone who needs to know what you’re working on. In that case, I suggest adding a password, one that allows users to open the file for reading, but prevents them from saving any changes.

And finally … this workbook thing is just a suggestion. Use a system or tool that works for you. What I’ve outlined here is partly inspired by books such as David Allen’s “Getting Things Done: The Art of Stress-Free Productivity” (which is also a whole system that goes by the same name), and Mihaly Csikszentmihalyi’s “Flow: The Psychology of Optimal Experience,” as well as a host of blog posts and media stories about creativity and productivity the details of which I’ve long forgotten but which have influenced the way I go about doing work.

Your employer might mandate the use of a particular tool for time and/or project management; use it if you have to, or if it serves your needs. More likely than not, though, it won’t help you manage the most limited resource of all: your attention. Find your own way to marshal that resource, and your time and projects will take care of themselves.

18 April 2013

A response to ‘What do we do about Phonathon?’

I had a thoughtful response to my blog post from earlier this week (What do we do about Phonathon?) from Paul Fleming, Database Manager at Walnut Hill School for the Arts in Natick, Massachusetts, about half an hour from downtown Boston. With Paul’s permission, I will quote from his email, and then offer my comments afterward:

I just wanted to share with you some of my experiences with Phonathon. I am the database manager of a 5-person Development department at a wonderful boarding high school called the Walnut Hill School for the Arts. Since we are a very small office, I have also been able to take on the role of the organizer of our Phonathon. It’s only been natural for me to combine the two to find analysis about the worth of this event, and I’m happy to say, for our own school, this event is amazingly worthwhile.

First of all, as far as cost vs. gain, this is one of the cheapest appeals we have. Our Phonathon callers are volunteer students who are making calls either because they have a strong interest in helping their school, or they want to be fed pizza instead of dining hall food (pizza: our biggest expense). This year we called 4 nights in the fall and 4 nights in the spring. So while it is an amazing source of stress during that week, there aren’t a ton of man-hours put into this event other than that. We still mail letters to a large portion of our alumni base a few times a year. Many of these alumni are long-shots who would not give in response to a mass appeal, but our team feels that the importance of the touch point outweighs the short-term inefficiencies that are inherent in this type of outreach.

Secondly, I have taken the time to prioritize each of the people who are selected to receive phone calls. As you stated in your article, I use things like recency and frequency of gifts, as well as other factors such as event participation or whether we have other details about their personal life (job info, etc). We do call a great deal of lapsed or nondonors, but if we find ourselves spread too thin, we make sure to use our time appropriately to maximize effectiveness with the time we have. Our school has roughly 4,400 living alumni, and we graduate about 100 wonderful, talented students a year. This season we were able to attempt phone calls to about 1,200 alumni in our 4 nights of calling. The higher-priority people received up to 3 phone calls, and the lower-priority people received just 1-2.

Lastly, I was lucky enough to start working at my job in a year in which there was no Phonathon. This gave me an amazing opportunity to test the idea that our missing donors would give through other avenues if they had no other way to do so. We did a great deal of mass appeals, indirect appeals (alumni magazine and e-newsletters), and as many personalized emails and phone calls as we could handle in our 5-person team. Here are the most basic of our findings:

In FY11 (our only non-Phonathon year), 12% of our donors were repeat donors. We reached about 11% participation, our lowest ever. In FY12 (the year Phonathon returned):

  • 27% of our donors were new/recovered donors, a 14% increase from the previous year.
  • We reached 14% overall alumni participation.
  • Of the 27% of donors who were considered new/recovered, 44% gave through Phonathon.
  • The total amount of donors we had gained from FY11 to FY12 was about the same number of people who gave through the Phonathon.
  • In FY13 (still in progress, so we'll see how this actually plays out), 35% of the previously-recovered donors who gave again gave in response to less work-intensive mass mailing appeals, showing that some of these Phonathon donors can, in fact, be converted and (hopefully) cultivated long-term.

In general, I think your article was right on point. Large universities with a for-pay, ongoing Phonathon program should take a look and see whether their efforts should be spent elsewhere. I just wanted to share with you my successes here and the ways in which our school has been able to maintain a legitimate, cost-effective way to increase our participation rate and maintain the quality of our alumni database.

Paul’s description of his program reminds me there are plenty of institutions out there who don’t have big, automated, and data-intensive calling programs gobbling up money. What really gets my attention is that Walnut Hill uses alumni affinity factors (event attendance, employment info) to prioritize calling to get the job done on a tight schedule and with a minimum of expense. This small-scale data mining effort is an example for the rest of us who have a lot of inefficiency in our programs due to a lack of focus.

The first predictive models I ever created were for a relatively small university Phonathon that was run with printed prospect cards and manual dialing — a very successful program, I might add. For those of you at smaller institutions wondering if data mining is possible only with massive databases, the answer is NO.

And finally, how wonderful it is that Walnut Hill can quantify exactly what Phonathon contributes in terms of new donors, and new donors who convert to mail-responsive renewals.


20 September 2012

When less data is more, in predictive modelling

When I started doing predictive modelling, I was keenly interested in picking the best and coolest predictor variables. As my understanding deepened, I turned my attention to how to define the dependent variable in order to really get at what I was trying to predict. More recently, however, I’ve been thinking about refining or limiting the population of constituents to be scored, and how that can help the model.

What difference does it make who gets a propensity score? Up until maybe a year ago, I wasn’t too concerned. Sure, probably no 22-year-old graduate had ever entered a planned giving agreement, but I didn’t see any harm in applying a score to all our alumni, even our youngest.

Lately, I’m not so sure. Using the example of a planned gift propensity model, the problem is this: Young alumni don’t just get a score; they also influence how the model is trained. If all your current expectancies were at least 50 before they decided to make a bequest, and half your alumni are under 30 years old, then one of the major distinctions your model will make is based on age. ANY alum over 50 is going to score well, regardless of whether he or she has any affinity to the institution, simply because 100% of your target is in that age group.

The model is doing the right thing by giving higher scores to older alumni. If ages in the sample range from 21 to 100+, then age as a variable will undoubtedly contribute to a large chunk of the model’s ability to “explain” the target. But this hardly tells us anything we didn’t already know. We KNOW that alumni don’t make bequest arrangements at age 22, so why include them in the model?

It’s not just the fact that their having a score is irrelevant. I’m concerned about allowing good predictor variables to interact with ‘Age’ in a way that compromises their effectiveness. Variables are being moderated by ‘Age’, without the benefit of improving the model in a way that we get what we want out of it.

Note that we don’t have to explicitly enter ‘Age’ as a variable in the model for young alumni to influence the outcome in undesirable ways. Here’s an example, using event attendance as a predictor:

Let’s say a lot of very young alumni and some very elderly constituents attend their class reunions. The older alumni who attend reunions are probably more likely than their non-attending classmates to enter into planned giving agreements — for my institution, that is definitely the case. On the other hand, the young alumni who attend reunions are probably no more or less likely than their non-attending peers to consider planned giving — no one that age is a serious prospect. What happens to ‘event attendance’ as a predictor in which the dependent variable is ‘Current planned giving expectancy’? … Because a lot of young alumni who are not members of the target variable attended events, the attribute of being an event attendee will be associated with NOT being a planned giving expectancy. Or at the very least, it will considerably dilute the positive association between predictor and target found among older alumni.

I confirmed this recently using some partly made-up data. The data file started out as real alumni data and included age, a flag for who is a current expectancy, and a flag for ‘event attendee’. I massaged it a bit by artificially bumping up the number of alumni under the age of 50 who were coded as having attended an event, to create a scenario in which an institution’s events are equally popular with young and old alike. In a simple regression model with the entire alumni file included in the sample, ‘event attendance’ was weakly associated with being a planned giving expectancy. When I limited the sample to alumni 50 years of age and older, however, the R squared statistic doubled. (That is, event attendance was about twice as effective at explaining the target.) Conversely, when I limited the sample to under-50s, R squared was nearly zero.
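
The experiment above can be sketched with fully synthetic data. This is a minimal simulation, not the author's actual file: event attendance is made equally popular at every age, expectancies occur only among the 50-plus (more often for attendees), and all the rates are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic alumni file: ages 21-80, events equally popular at every age,
# planned giving expectancies occurring only among the 50-plus.
age = rng.integers(21, 81, size=n)
attended = rng.random(n) < 0.20   # 20% attend events, regardless of age
p_expect = np.where(age >= 50, np.where(attended, 0.10, 0.02), 0.0)
expectancy = rng.random(n) < p_expect

def r_squared(x, y):
    # Squared Pearson correlation: the R^2 of a one-predictor regression.
    return np.corrcoef(x, y)[0, 1] ** 2

r2_all = r_squared(attended, expectancy)                 # whole alumni file
mask = age >= 50
r2_50plus = r_squared(attended[mask], expectancy[mask])  # 50 and older only

print(f"R^2, all ages:  {r2_all:.4f}")
print(f"R^2, 50 and up: {r2_50plus:.4f}")
```

Restricting the sample removes the young attendees whose zeros dilute the correlation, so the 50-plus R squared comes out roughly double the full-sample figure, which is the effect described above.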

True, I had to tamper with the data in order to get this result. But even had I not, there would still have been many under-50 event attendees, and their presence in the file would still have reduced the observed correlation between event attendance and planned giving propensity, to no useful end.

You probably already know that it’s best not to lump deceased constituents in with living ones, or non-alumni along with alumni, or corporations and foundations along with persons. They are completely distinct entities. But depending on what you’re trying to predict, your population can fruitfully be split along other, more subtle distinctions. Here are a few:

  • For donor acquisition models, in which the target value is “newly-acquired donor”, exclude all renewed donors. You strictly want to have only newly-acquired donors and never-donors in your model. Your good prospects for conversion are the never-donors who most resemble the newly-acquired donors. Renewed donors don’t serve any purpose in such a model and will muddy the waters considerably.
  • Conversely, remove never-donors from models that predict major giving and leadership-level annual giving. Those higher-level donors tend not to emerge out of thin air: They have giving histories.
  • Looking at ‘Age’ again … making distinctions based on age applies to major-gift propensity models just as it does to planned giving propensity: Very young people do not make large gifts. Look at your data to find out at what age donors were when they first gave $1,000, say. This will help inform what your cutoff should be.
  • When building models specifically for Phonathon, whether donor-acquisition or contact likelihood, remove constituents who are coded Do Not Call or who do not have a valid phone number in the database, or who are unlikely to be called (international alumni, perhaps).
  • Exclude international alumni from event attendance or volunteering likelihood models, if you never offer involvement opportunities outside your own country or continent.
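
As a sketch of how such exclusions might be applied in practice, here is the Phonathon example from the list. The column names are hypothetical, not a real database schema:

```python
import pandas as pd

# A tiny hypothetical constituent file.
alumni = pd.DataFrame({
    "id":          [1, 2, 3, 4, 5],
    "do_not_call": [False, True, False, False, False],
    "phone":       ["555-0101", "555-0102", None, "555-0104", "555-0105"],
    "country":     ["CA", "CA", "CA", "US", "FR"],
})

# Population for a Phonathon contact-likelihood model: exclude anyone who
# can't or won't be called (Do Not Call, no valid phone, or international,
# assuming a domestic-only calling program).
callable_pool = alumni[
    ~alumni["do_not_call"]
    & alumni["phone"].notna()
    & alumni["country"].isin(["CA", "US"])
]
print(callable_pool["id"].tolist())   # -> [1, 4]
```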

Those are just examples. As for general principles, I think both of the following conditions must be met in order for you to gain from excluding a group of constituents from your model. By a “group” I mean any collection of individuals who share a certain trait. Choose to exclude IF:

  1. Nearly 100% of constituents with the trait fall outside the target behaviour (that is, the behaviour you are trying to predict); AND,
  2. Having a score for people with that trait is irrelevant (that is, their scores will not result in any action being taken with them, even if a score is very low or very high).

You would apply the “rules” like this … You’re building a model to predict who is most likely to answer the phone, for use by Phonathon, and you’re wondering what to do with a bunch of alumni who are coded Do Not Call. Well, it stands to reason that 1) people with this trait will have little or no phone contact history in the database (the target behaviour), and 2) people with this trait won’t be called, even if they have a very high contact-likelihood score. The verdict is “exclude.”
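
The two rules can be encoded as a simple check. The 99% threshold below is an assumed reading of "nearly 100%", not a figure from the post:

```python
def should_exclude(pct_outside_target, score_actionable):
    """Exclude a group only if BOTH conditions hold: nearly all of its
    members fall outside the target behaviour, AND their scores would
    never drive any action."""
    return pct_outside_target >= 0.99 and not score_actionable

# Do Not Call alumni in a phone-contact model: almost none have phone
# contact history, and they will never be called regardless of score.
print(should_exclude(0.995, score_actionable=False))   # verdict: exclude
```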

It’s not often you’ll hear me say that less (data) is more. Fewer cases in your data file will in fact tend to depress your model’s R squared. But your ultimate goal is not to maximize R squared — it’s to produce a model that does what you want. Fitting the data is a good thing, but only when you have the right data.

6 June 2012

How you measure alumni engagement is up to you

Filed under: Alumni, Best practices, Vendors — Tags: , , , — kevinmacdonell @ 8:02 am

There’s been some back-and-forth on one of the listservs about the “correct” way to measure and score alumni engagement. An emphasis on scientific rigor is being pressed for by one vendor who claims to specialize in rigor. The emphasis is misplaced.

No doubt there are sophisticated ways of measuring engagement that I know nothing about, but the question I can’t get beyond is, how do you define “engagement”? How do you make it measurable so that one method applies everywhere? I think that’s a challenging proposition, one that limits any claim to “correctness” of method. This is the main reason that I avoid writing about measuring engagement — it sounds analytical, but inevitably it rests on some messy, intuitive assumptions.

The closest I’ve ever seen anyone come is Engagement Analysis Inc., a firm based here in Canada. They have a carefully chosen set of engagement-related survey questions which are held constant from school to school. The questions are grouped in various categories or “drivers” of engagement according to how closely related (statistically) the responses tend to be to each other. Although I have issues with alumni surveys and the dangers involved in interpreting the results, I found EA’s approach fascinating in terms of gathering and comparing data on alumni attitudes.

(Disclaimer: My former employer was once a client of this firm’s but I have no other association with them. Other vendors do similar and very fine work, of course. I can think of a few, but haven’t actually worked with them, so I will not offer an opinion.)

Some vendors may claim to be scientific or analytically correct, but the only requirements for quantifying engagement are that your method be reasonable and, if you are benchmarking against other schools, consistent from school to school. If you do want to benchmark externally, engage a vendor to do it right, because it's not easily done.

But if you want to benchmark against yourself (that is, over time), don’t be intimidated by anyone telling you your method isn’t good enough. Just do your own thing. Survey if you like, but call first upon the real, measurable activities that your alumni participate in. There is no single right way, so find out what others have done. One institution will give more weight to reunion attendance than to showing up for a pub night, while another will weigh all event attendance equally. Another will ditch event attendance altogether in favour of volunteer activity, or some other indicator.

Can anyone say definitively that any of these approaches are wrong? I don’t think so — they may be just right for the school doing the measuring. Many schools (mine included) assign fairly arbitrary weights to engagement indicators based on intuition and experience. I can’t find fault with that, simply because “engagement” is not a quantity. It’s not directly measurable, so we have to use proxies which ARE measurable. Other schools measure the degree of association (correlation) between certain activities and alumni giving, and base their weights on that, which is smart. But it’s all the same to me in the end, because ‘giving’ is just another proxy for the freely interpretable quality of “engagement.”
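
A minimal sketch of the arbitrary-weights approach described above; the indicator names and weights here are invented for illustration, since the whole point is that every school chooses its own:

```python
import pandas as pd

# Hypothetical engagement proxies and intuition-based weights.
weights = {
    "attended_reunion":     3,   # weighted more heavily than a pub night
    "attended_event":       1,
    "volunteered":          4,
    "updated_contact_info": 2,
}

alumni = pd.DataFrame({
    "id":                   [1, 2, 3],
    "attended_reunion":     [1, 0, 0],
    "attended_event":       [2, 1, 0],   # counts of events attended
    "volunteered":          [0, 1, 0],
    "updated_contact_info": [1, 1, 0],
})

# Weighted sum of the proxies = the engagement score.
alumni["engagement_score"] = sum(alumni[col] * w for col, w in weights.items())
print(alumni[["id", "engagement_score"]])
```

Tracked consistently from year to year, even a rough score like this supports the within-institution comparisons the post recommends.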

Think of devising a “love score” to rank people’s marriages in terms of the strength of the pair bond. A hundred analysts would head off in a hundred different directions at Step 1: Defining “love”. That doesn’t mean the exercise is useless or uninteresting, it just means that certain claims have to be taken with a grain of salt.

We all have plenty of leeway to choose the proxies that work for us, and I've seen a number of good examples from various schools. I can't say one is better than another. If you do a good job measuring the proxies from one year to the next, you should be able to learn something from the relative rises and falls in engagement scores over time and compared between different groups of alumni.

Are there more rigorous approaches? Yes, probably. Should that stop you from doing your own thing? Never!

26 January 2012

More mistakes I’ve made

Filed under: Best practices, Peter Wylie, Pitfalls, Validation — Tags: , , , — kevinmacdonell @ 1:38 pm

A while back I wrote a couple of posts about mistakes I’ve made in data mining and predictive modelling. (See Four mistakes I have made and When your predictive model sucks.) Today I’m pleased to point out a brand new one.

The last days of work leading up to Christmas had me evaluating my new-donor acquisition models to see how well they’ve been working. Unfortunately, they were not working well. I had hoped — I had expected — to see newly-acquired donors clustered in the upper ranges of the decile scores I had created. Instead they were scattered all along the whole range. A solicitation conducted at random would have performed nearly as well.

Our mailing was restricted by score (roughly the top two deciles only), but our phone solicitation was broader, so donors came from the whole range of deciles:

Very disappointing. To tell the truth, I had seen this before: A model that does well predicting overall participation, but which fails to identify which non-donors are most likely to convert. I am well past the point of being impressed by a model that tells me what everyone already knows, i.e. that loyal donors are most likely to give again. I want to have confidence that acquisition mail dollars are spent wisely.

So it was back to the drawing board. I considered whether my model was suffering from overfit, whether I had too many variables, too much random noise, or multicollinearity. I studied and rejected one possibility after another. After so much effort, I came rather close to concluding that new-donor acquisition is not just difficult — it might be darn near impossible.

Dire possibility indeed. If you can’t predict conversion, then why bother with any of this?

It was during a phone conversation with Peter Wylie that things suddenly became clear. He asked me one question: How did I define my dependent variable? I checked, and found that my DV was named “Recent Donors.” That’s all it took to find where I had gone wrong.

As the name of the DV suggested, it turned out that the model was trained on a binary variable that flagged anyone who had made a gift in the past two years. The problem was that this included everybody: long-time donors and newly-acquired donors alike. The model was highly influenced by the regular donors, and the new donors were lost in the shuffle.

It was a classic case of failing to properly define the question. If my goal was to identify the patterns and characteristics of newly-acquired donors, then I should have limited my DV strictly to non-donors who had recently converted to donors!

So I rebuilt the model, using the same data file and variables I had used to build the original model. This time, however, I pared the sample down to alumni who had never given a cent before fiscal 2009. They were the only alumni I needed to have scores for. Then I redefined my dependent variable so that non-donors who converted, i.e., who made a gift in either fiscal 2009 or 2010, were coded ‘1’, and all others were coded ‘0’. (I used two years of giving data instead of just one in order to have a little more data available for defining the DV.) Finally, I output a new set of decile scores from a binary logistic regression.
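
The corrected sample and DV definition might be sketched like this, using hypothetical field names rather than the author's actual database:

```python
import pandas as pd

# Hypothetical giving history; None means the alum has never given.
alumni = pd.DataFrame({
    "id":            [1, 2, 3, 4],
    "first_gift_fy": [2001, None, 2009, None],
})

# Restrict the sample to alumni with no giving before fiscal 2009 --
# never-donors plus recent converts are the only cases that matter.
sample = alumni[
    alumni["first_gift_fy"].isna() | (alumni["first_gift_fy"] >= 2009)
].copy()

# DV: converted in FY2009 or FY2010 = 1, still a non-donor = 0.
sample["converted"] = sample["first_gift_fy"].isin([2009, 2010]).astype(int)
print(sample[["id", "converted"]].to_dict("records"))
```

The long-time donor (id 1) is dropped from the sample entirely, so the regression can only learn what distinguishes converts from never-donors.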

A test of the new scores showed that the new model was a vast improvement over the original. How did I test this? Recall that I reused the same data file from the original model. Therefore, it contained no giving data from the current fiscal year; the model was innocent of any knowledge of the future. Compare this breakdown of new donors with the one above:

Much better. Not fan-flippin-tastic, but better.

My error was a basic one — I’ve even cautioned about it in previous posts. Maybe I’m stupid, or maybe I’m just human. But like anyone who works with data, I can figure out when I’m wrong. That’s a huge advantage.

  • Be skeptical about the quality of your work.
  • Evaluate the results of your decisions.
  • Admit your mistakes.
  • Document your mistakes and learn from them.
  • Stay humble.