It was only a matter of time. Over the weekend, a longtime friend dumped a bucket of ice water over his head and posted the video to Facebook. He challenged three friends — me included — to take the Ice Bucket Challenge in support of ALS research. I passed on the cold shower, but this morning I did make a gift to ALS Canada, a cause I wouldn’t have supported had it not been for my friend Paul and the brilliant campaign he participated in.*
Universities and other charities are, of course, watching closely and asking themselves how they can replicate this phenomenon. Fine … I am skeptical that central planning and a modest budget can give birth to such a massive juggernaut of socially-responsible contagion … but I wish them luck.
While we can admire our colleagues’ amazing work and good fortune, I am not sure we should envy them. In the coming year, ALS charities will be facing a huge donor-retention issue. Imagine gaining between 1.5 and 2 million new donors in the span of a few months. Now, I have no knowledge of what ALS fundraisers really intend to do with their hordes of newly-acquired donors. Maybe retention is not a goal. But it is a sure thing that the world will move on to some other craze. Retaining even a tiny fraction of these donors could make the difference between the Ice Bucket Challenge being just a one-time, non-repeatable anomaly and its becoming a foundation for long-term support that permanently changes the game for ALS research.
Perhaps the ice bucket challenge can be turned into an annual event that becomes as established as the walks, runs and other participatory events that other medical-research charities have. Who knows.
What is certain is that the majority of new donors will not give again. Equally certain is that it would be irresponsibly wasteful for charities to spread their retention budget evenly over all new donors.
Which brings me to predictive modeling. Some portion of new donors WILL give again. Maybe something about the challenge touched them more deeply than the temporary fun of the ice bucket dare. Maybe they learned something about the disease. Maybe they know someone affected by ALS. There is no direct way to know. But I would be willing to bet that higher levels of engagement can be found in patterns in the data.
What factors might be predictors of longer-term engagement? It is not possible to say without some analysis, but sources of information might include:
Shreds of ambiguous clues scattered here and there, admittedly, but that is what a good predictive model detects and amplifies. If it were up to me, I would also have asked on the giving page whether the donor had done the ice bucket thing. A year from now, my friend Paul is going to clearly remember the shock of pouring ice water over his head, plus the positive response he got on Facebook, and this will bring to mind his gift and the need to give again. My choosing not to do so might be associated with a lower level of commitment, and thus a lower likelihood of renewing. Just a theory.**
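To make the idea concrete, here is a minimal sketch, in Python, of the kind of scoring exercise I have in mind: fit a simple logistic regression on whatever clues were captured at the time of the gift, and use it to rank new donors by their likelihood of giving a second time. The file and column names (gift_amount, gave_online, mentioned_ice_bucket and so on) are placeholders, not anyone’s actual data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# One row per newly acquired donor; file and column names are hypothetical.
donors = pd.read_csv("new_donors.csv")

features = ["gift_amount", "gave_online", "mentioned_ice_bucket", "knows_someone_with_als"]
X = donors[features]
y = donors["made_second_gift"]  # 0/1: gave again within a year of the first gift

# Hold out some donors so the model is judged on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score the holdout and check how well the model ranks likely repeat donors
scores = model.predict_proba(X_test)[:, 1]
print("Holdout AUC:", roc_auc_score(y_test, scores))
```

In practice the interesting output is not the AUC but the ranked list of scores, which tells you where a finite retention budget should go first.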
Data-informed segmentation aimed at getting a second gift from newly-acquired donors is not quite as sexy as being an internet meme. However, unlike riding the uncontrollable wave of a social media sensation, retention is something that charities might actually be able to plan for.
* I would like to see this phenomenon raise all boats for medical charities, therefore I also gave to Doctors Without Borders Canada and the Molly Appeal for Medical Research. Check them out.
** Update: I am told that actually, this question IS asked. I didn’t see it on the Canadian site, but maybe I just missed it. Great!
POSTSCRIPT
I was quoted on this topic in a story in the September 4th online edition of the Chronicle of Philanthropy. Link (subscribers only): After Windfall, ALS Group Grapples With 2.4-Million Donor Dilemma
No, this is not the last time I’ll write about Phonathon, but after today I promise to give it a rest and talk about something else. I just wanted to round out my post on the waste I see happening in donor acquisition via phone programs with some recent findings of mine. Your mileage may vary, or “YMMV” as they say on the listservs, so as usual don’t just accept what I say. I suggest questions that you might ask of your own data — nothing more.
I’ve been doing a thorough analysis of our acquisition efforts this past year. (The technical term for this is a WTHH analysis … as in “What The Heck Happened??”) I found that getting high phone contact rates seemed to be linked with making a sufficient number of call attempts per prospect. For us, any fewer than three attempts per prospect is too few to acquire new donors in any great number. In general, contact rates improve with call attempt numbers above three, and after that, the more the better.
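For what it’s worth, the tally behind a finding like that is nothing exotic. Here is a rough sketch, assuming a call file with one row per prospect and hypothetical columns for the calling segment, the number of attempts made, and whether the prospect was ever reached:

```python
import pandas as pd

# Hypothetical call file: one row per prospect, with their calling segment,
# the number of call attempts made, and whether they were ever reached.
calls = pd.read_csv("phonathon_call_file.csv")

by_segment = calls.groupby("segment").agg(
    prospects=("prospect_id", "count"),
    attempts_per_prospect=("attempts", "mean"),
    contact_rate=("contacted", "mean"),
)
print(by_segment.sort_values("attempts_per_prospect"))
# Segments averaging fewer than about three attempts per prospect should stand
# out with noticeably lower contact rates.
```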
“Whoa!” I hear you protest. “Didn’t you just say in your first post that it makes no sense to have a set number of call attempts for all prospects?”
You’re right — I did. It doesn’t make sense to have a limit. But it might make sense to have a minimum.
To get anything from an acquisition segment, more calling is better. However, by “call more” I don’t mean call more people. I mean make more calls per prospect. The RIGHT prospects. Call the right people, and eventually many or most of them will pick up the phone. Call the wrong people, and you can ring them up 20, 30, 50 times and you won’t make a dent. That’s why I think there’s no reason to set a maximum number of call attempts. If you’re calling the right people, then just keep calling.
What’s new here is that three attempts looks like a solid minimum. This is higher than what I see some people reporting on the listservs, and well beyond the capacity of many programs as they are currently run — the ones that call every single person with a phone number in the database. To attain the required amount of per-prospect effort, those schools would have to increase phone capacity (more students, more nights), or load fewer prospects. The latter option is the only one that makes sense.
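The sizing arithmetic is simple enough to sketch with made-up numbers. The figures below are purely illustrative; plug in your own capacity.

```python
# Back-of-the-envelope sizing with hypothetical numbers.
callers_per_night = 20        # assumed number of student callers per shift
attempts_per_caller = 40      # attempts one caller can make in a shift
nights = 30                   # calling nights in the season

capacity = callers_per_night * attempts_per_caller * nights  # total attempts
min_attempts = 3              # minimum attempts per prospect

max_prospects = capacity // min_attempts
print(f"Total attempt capacity: {capacity:,}")
print(f"Prospects you can load at {min_attempts}+ attempts each: {max_prospects:,}")
```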
Reducing the number of people we call in order to acquire new donors means using a predictive model, or at least some basic data mining and scoring, to figure out who is most likely to pick up the phone. I’ve built models that do that for two years now, and after evaluating their performance I can say that they work okay. Not super fantastic, but okay. I can live with okay … in the past five years our program has made close to one million call attempts. Even a marginal improvement in focus at that scale of activity makes a significant difference.
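To put “marginal improvement” in perspective, here is the back-of-the-envelope version. The contact rates below are assumptions for illustration, not actual program figures.

```python
# Illustrative numbers only.
attempts = 1_000_000           # roughly five years of call attempts
baseline_contact_rate = 0.10   # hypothetical: calling everyone with a phone number
modelled_contact_rate = 0.12   # hypothetical: a modest lift from model-based focus

extra_contacts = attempts * (modelled_contact_rate - baseline_contact_rate)
print(f"Additional live contacts over five years: {extra_contacts:,.0f}")
# Even a two-point lift works out to about 20,000 more conversations.
```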
You don’t need to hack your acquisition segment in half today. I’m not saying that. To get new donors you still need lots and lots of prospects. Maybe someday you’ll be calling only a fraction of the people you once did, but there’s no reason you can’t take a gradual approach to getting more focused in the meantime. Trim things down a bit in the first year, evaluate the results, and fold what you learned into trimming a bit more the next year.
I had a thoughtful response to my blog post from earlier this week (What do we do about Phonathon?) from Paul Fleming, Database Manager at Walnut Hill School for the Arts in Natick, Massachusetts, about half an hour from downtown Boston. With Paul’s permission, I will quote from his email, and then offer my comments afterward:
I just wanted to share with you some of my experiences with Phonathon. I am the database manager of a 5-person Development department at a wonderful boarding high school called the Walnut Hill School for the Arts. Since we are a very small office, I have also been able to take on the role of the organizer of our Phonathon. It’s only been natural for me to combine the two to find analysis about the worth of this event, and I’m happy to say, for our own school, this event is amazingly worthwhile.
First of all, as far as cost vs. gain, this is one of the cheapest appeals we have. Our Phonathon callers are volunteer students who are making calls either because they have a strong interest in helping their school, or they want to be fed pizza instead of dining hall food (pizza: our biggest expense). This year we called 4 nights in the fall and 4 nights in the spring. So while it is an amazing source of stress during that week, there aren’t a ton of man-hours put into this event other than that. We still mail letters to a large portion of our alumni base a few times a year. Many of these alumni are long-shots who would not give in response to a mass appeal, but our team feels that the importance of the touch point outweighs the short-term inefficiencies that are inherent in this type of outreach.
Secondly, I have taken the time to prioritize each of the people who are selected to receive phone calls. As you stated in your article, I use things like recency and frequency of gifts, as well as other factors such as event participation or whether we have other details about their personal life (job info, etc). We do call a great deal of lapsed or nondonors, but if we find ourselves spread too thin, we make sure to use our time appropriately to maximize effectiveness with the time we have. Our school has roughly 4,400 living alumni, and we graduate about 100 wonderful, talented students a year. This season we were able to attempt phone calls to about 1,200 alumni in our 4 nights of calling. The higher-priority people received up to 3 phone calls, and the lower-priority people received just 1-2.
Lastly, I was lucky enough to start working at my job in a year in which there was no Phonathon. This gave me an amazing opportunity to test the idea that our missing donors would give through other avenues if they had no other way to do so. We did a great deal of mass appeals, indirect appeals (alumni magazine and e-newsletters), and as many personalized emails and phone calls as we could handle in our 5-person team. Here are the most basic of our findings:
In FY11 (our only non-Phonathon year), 12% of our donors were repeat donors. We reached about 11% participation, our lowest ever. In FY12 (the year Phonathon returned):
- 27% of our donors were new/recovered donors, a 14% increase from the previous year.
- We reached 14% overall alumni participation.
- Of the 27% of donors who were considered new/recovered, 44% gave through Phonathon.
- The total amount of donors we had gained from FY11 to FY12 was about the same number of people who gave through the Phonathon.
- In FY13 (still in progress, so we’ll see how this actually plays out), 35% of the previously-recovered donors who gave again gave in response to less work-intensive mass mailing appeals, showing that some of these Phonathon donors can, in fact, be converted and (hopefully) cultivated long-term.
In general, I think your article was right on point. Large universities with a for-pay, ongoing Phonathon program should take a look and see whether their efforts should be spent elsewhere. I just wanted to share with you my successes here and the ways in which our school has been able to maintain a legitimate, cost-effective way to increase our participation rate and maintain the quality of our alumni database.
Paul’s description of his program reminds me there are plenty of institutions out there who don’t have big, automated, and data-intensive calling programs gobbling up money. What really gets my attention is that Walnut Hill uses alumni affinity factors (event attendance, employment info) to prioritize calling to get the job done on a tight schedule and with a minimum of expense. This small-scale data mining effort is an example for the rest of us who have a lot of inefficiency in our programs due to a lack of focus.
The first predictive models I ever created were for a relatively small university Phonathon that was run with printed prospect cards and manual dialing — a very successful program, I might add. For those of you at smaller institutions wondering if data mining is possible only with massive databases, the answer is NO.
And finally, how wonderful it is that Walnut Hill can quantify exactly what Phonathon contributes in terms of new donors, and new donors who convert to mail-responsive renewals.
Bravo!
Data prep aside, it really isn’t that hard to produce a model to predict giving, once you know how. The simplest of models can be expected to give good results. Take one step beyond, however, and things get tricky. Your model may indeed predict giving, but it may NOT necessarily predict conversion — that is, conversion from non-donor to donor status.
What’s this, you ask? This CoolData guy is always saying that donor acquisition is where predictive modeling really shines, so why is he backpedaling today?
Well, I still DO believe that predictive modeling gives you insight into your deep non-donor pool and helps you decide who to focus your efforts on. But there’s a catch: You may be led astray if you fail to properly define the question you’re trying to answer.
By way of example, I will show you a model that appeared valid on the surface but ultimately failed. Then I will explain what I did wrong — and how you can avoid making the same mistakes.
Last summer I had the pleasure of visiting with fundraising staff at a university in another province and showing them what data mining was doing for us. Their Annual Giving manager had a data file pulled from Raiser’s Edge, all ready to analyze, and we did so, in real time, during the course of a day-long workshop.
The model we created was a demo only — done very quickly, without much attention paid to how it would be used — and in fact the resulting score set was not used for anything. But we did have this score set, and I was reasonably sure that the higher scorers would be the better donors, and that a little followup analysis would put the icing on the cake.
So about a year later, I offered to show how the alumni who had given since my visit broke down by the score we had prepared. My hosts sent me the new giving data, and off I went.
All seemed well at first. Have a look at these two charts. The high-scoring alumni (by score decile) gave the most in total dollars, and they also had the highest rate of participation in the annual fund.
No surprises there; I’ve seen this again and again. Then I got overconfident. The small university I did this work for had new-donor acquisition as one of its key goals for the Annual Fund, so I asked them to identify which donors had been newly acquired in the past year, in order to show how they broke down by score. I expected the model would do a good job of predicting their participation as well.
There were 300 new donors. Their chart looked like this:
Quite a different story, isn’t it? I expected new donors would be clustered in the top scores, but that’s not what happened. Had my hosts used our demo model to get more focused for the purpose of acquisition, they would have been digging in the wrong places. This model would have been useless — even harmful.
What happened?
It appears that the model was good at finding EXISTING donors, but not POTENTIAL donors. This suggests to me that certain predictor variables that we used must have been proxies for “Is a donor”. (For example, maybe we used event attendance data that seemed predictive, but the event was a donor-recognition dinner — that’s a proxy, or stand-in, for being a donor — and not usable as a predictor.)
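A quick way to catch that kind of proxy before it poisons a model is to cross-tabulate each candidate predictor against current donor status. A minimal sketch, with hypothetical file and column names:

```python
import pandas as pd

# Hypothetical model file: one row per alum, with a candidate predictor and
# current donor status.
alumni = pd.read_csv("alumni_model_file.csv")

# Share of donors among attendees vs. non-attendees of the event in question
xtab = pd.crosstab(alumni["attended_recognition_dinner"],
                   alumni["is_donor"], normalize="index")
print(xtab)
# If nearly every attendee is already a donor, the variable is a proxy for
# giving itself: drop it, or rebuild it from events that aren't gated on giving.
```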
That’s a lesson in understanding the data you’re using: mistakes can creep in quite easily when a model is thrown together too quickly. Other factors that are probably implicated in this failure include:
Too general a model – 1: The model was not specifically an Annual Giving model. It included any kind of giving in the outcome variable (the predicted value), including major gifts (if I recall correctly). In that type of model, ‘Age’ is given a lot of weight, and younger alumni (who might make up the bulk of new donors) tend to receive depressed scores. In fact, about 60 of those new donors (almost 20%) were from the Class of 2009, which at that time was the most recent graduating class. The university really focused on getting their support during the phonathon, but this model wouldn’t have been much help in targeting them.
Too general a model – 2: If predicting acquisition really was an over-arching goal, then the model question should have been defined specifically for that purpose. The model should have been trained differently — perhaps a 0/1 variable, indicating recent conversion to participation in the Fund. This requires more work in preparing a single variable — Y, the outcome variable — but it is central to the success of the model.
All eggs in one basket: With a trickier predicted value to train on, the situation called for trying binary logistic regression as well as multiple linear regression — and then testing to see which one did a better job scoring a holdout sample of new donors.
No holdout sample: Which brings me to the final error I made that day — I didn’t have a holdout sample to test the validity of the model. I skipped that step for the sake of simplicity, but in practice you should think about validation right from the start. (A rough sketch of that workflow follows below.)
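Here is that sketch, with hypothetical file and column names: define a 0/1 conversion outcome, set the holdout aside before any training happens, fit both a logistic and a linear model, and keep whichever one ranks the holdout converts better.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical model file: one row per alum, with a 0/1 flag for recent
# conversion from non-donor to donor, plus a few candidate predictors.
alumni = pd.read_csv("alumni_model_file.csv")

y = alumni["converted_last_fy"]          # the carefully defined outcome variable
X = alumni[["class_year", "event_count", "has_email", "has_business_phone"]]

# Set the holdout aside before any training happens
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
ols = LinearRegression().fit(X_train, y_train)

# Compare how well each model's scores rank the holdout converts
print("Logistic AUC:", roc_auc_score(y_hold, logit.predict_proba(X_hold)[:, 1]))
print("Linear AUC:  ", roc_auc_score(y_hold, ols.predict(X_hold)))
```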
Is there anything I did right? Well, I did conduct the test on recent giving that alerted me to the fact that this model did a poor job on prediction for acquisition. This testing, which occurs after the fact, is not the same as validation, which simply gives some reassurance that your model will work in the future. But it is equally important, as it may highlight issues you are not aware of and need to address in future iterations of the model.
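That after-the-fact test is easy to sketch as well, again with hypothetical file and column names: join last year’s scores to this year’s new donors and see how they distribute across score deciles.

```python
import pandas as pd

# Hypothetical files: last year's score set (id, score) and the IDs of donors
# newly acquired since the scores were produced.
scored = pd.read_csv("last_years_scores.csv")
new_donors = pd.read_csv("new_donors_since_scoring.csv")

scored["decile"] = pd.qcut(scored["score"], 10, labels=range(1, 11))
scored["new_donor"] = scored["id"].isin(new_donors["id"]).astype(int)

# Count of new donors in each score decile; a useful acquisition model should
# concentrate them near the top.
print(scored.groupby("decile", observed=True)["new_donor"].sum())
```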
In summary, to avoid model suckage you must: know your data in order to maximize the independence of your predictors; define your dependent variable carefully to answer the specific question you’re trying to answer; use different models and test them against each other; and, finally, use a holdout sample or some other validation method.