Guest post by Peter B. Wylie and John Sammis
It was only a matter of time. Over the weekend, a longtime friend dumped a bucket of ice water over his head and posted the video to Facebook. He challenged three friends — me included — to take the Ice Bucket Challenge in support of ALS research. I passed on the cold shower, but this morning I did make a gift to ALS Canada, a cause I wouldn’t have supported had it not been for my friend Paul and the brilliant campaign he participated in.*
Universities and other charities are, of course, watching closely and asking themselves how they can replicate this phenomenon. Fine … I am skeptical that central planning and a modest budget can give birth to such a massive juggernaut of socially-responsible contagion … but I wish them luck.
While we can admire our colleagues’ amazing work and good fortune, I am not sure we should envy them. In the coming year, ALS charities will be facing a huge donor-retention issue. Imagine gaining between 1.5 and 2 million new donors in the span of a few months. Now, I have no knowledge of what ALS fundraisers really intend to do with their hordes of newly-acquired donors. Maybe retention is not a goal. But it is a sure thing that the world will move on to some other craze. Retaining a tiny fraction of these donors could make the difference between the ice bucket challenge being just a one-time, non-repeatable anomaly and turning it into a foundation for long-term support that permanently changes the game for ALS research.
Perhaps the ice bucket challenge can be turned into an annual event that becomes as established as the walks, runs and other participatory events that other medical-research charities have. Who knows.
What is certain is that the majority of new donors will not give again. Just as certain is that it would be irresponsibly wasteful for charities to spread their retention budget equally over all new donors.
Which brings me to predictive modeling. Some portion of new donors WILL give again. Maybe something about the challenge touched them more deeply than the temporary fun of the ice bucket dare. Maybe they learned something about the disease. Maybe they know someone affected by ALS. There is no direct way to know. But I would be willing to bet that higher levels of engagement can be found in patterns in the data.
What factors might be predictors of longer-term engagement? It is not possible to say without some analysis, but sources of information might include:
Shreds of ambiguous clues scattered here and there, admittedly, but that is what a good predictive model detects and amplifies. If it were up to me, I would also have asked on the giving page whether the donor had done the ice bucket thing. A year from now, my friend Paul is going to clearly remember the shock of pouring ice water over his head, plus the positive response he got on Facebook, and this will bring to mind his gift and the need to give again. My choosing not to do so might be associated with a lower level of commitment, and thus a lower likelihood of renewing. Just a theory.**
Data-informed segmentation aimed at getting a second gift from newly-acquired donors is not quite as sexy as being an internet meme. However, unlike riding the uncontrollable wave of a social media sensation, retention is something that charities might actually be able to plan for.
** Update: I am told that actually, this question IS asked. I didn’t see it on the Canadian site, but maybe I just missed it. Great!
I was quoted on this topic in a story in the September 4th online edition of the Chronicle of Philanthropy. Link (subscribers only): After Windfall, ALS Group Grapples With 2.4-Million Donor Dilemma
No, this is not the last time I’ll write about Phonathon, but after today I promise to give it a rest and talk about something else. I just wanted to round out my post on the waste I see happening in donor acquisition via phone programs with some recent findings of mine. Your mileage may vary, or “YMMV” as they say on the listservs, so as usual don’t just accept what I say. I suggest questions that you might ask of your own data — nothing more.
I’ve been doing a thorough analysis of our acquisition efforts this past year. (The technical term for this is a WTHH analysis … as in “What The Heck Happened??”) I found that getting high phone contact rates seemed to be linked with making a sufficient number of call attempts per prospect. For us, any fewer than three attempts per prospect is too few to acquire new donors in any great number. In general, contact rates improve with call attempt numbers above three, and after that, the more the better.
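If you want to run the same check on your own calling data, the analysis is simple to sketch. Here is a minimal, hypothetical version; the column layout (one row per call attempt, with a prospect ID and a result code) and the result values are assumptions, not a description of any real system's export:

```python
# Sketch: contact rate as a function of number of call attempts.
# Input layout and result codes are hypothetical assumptions.
from collections import defaultdict

def contact_rate_by_attempts(call_log):
    """call_log: list of (prospect_id, result) tuples, one row per attempt,
    where result is "contact" or "no_answer"."""
    attempts = defaultdict(int)
    contacted = set()
    for prospect_id, result in call_log:
        attempts[prospect_id] += 1
        if result == "contact":
            contacted.add(prospect_id)
    # Group prospects by how many times they were attempted, then compute
    # the share of each group that was ever reached.
    by_n = defaultdict(lambda: [0, 0])  # n_attempts -> [prospects, contacts]
    for pid, n in attempts.items():
        by_n[n][0] += 1
        by_n[n][1] += int(pid in contacted)
    return {n: reached / total for n, (total, reached) in sorted(by_n.items())}
```

If contact rates climb steadily with attempt counts in your own output, that is the pattern described above; if they flatten early, your minimum may be lower than three.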
“Whoa!” I hear you protest. “Didn’t you just say in your first post that it makes no sense to have a set number of call attempts for all prospects?”
You’re right — I did. It doesn’t make sense to have a limit. But it might make sense to have a minimum.
To get anything from an acquisition segment, more calling is better. However, by “call more” I don’t mean call more people. I mean make more calls per prospect. The RIGHT prospects. Call the right people, and eventually many or most of them will pick up the phone. Call the wrong people, and you can ring them up 20, 30, 50 times and you won’t make a dent. That’s why I think there’s no reason to set a maximum number of call attempts. If you’re calling the right people, then just keep calling.
What’s new here is that three attempts looks like a solid minimum. This is higher than what I see some people reporting on the listservs, and well beyond the capacity of many programs as they are currently run — the ones that call every single person with a phone number in the database. To attain the required amount of per-prospect effort, those schools would have to increase phone capacity (more students, more nights), or load fewer prospects. The latter option is the only one that makes sense.
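The capacity arithmetic behind "load fewer prospects" is back-of-envelope stuff. Every number below is hypothetical, just to show the shape of the calculation for your own program:

```python
# Back-of-envelope capacity check (all numbers are hypothetical):
# if each prospect needs at least 3 attempts, how many prospects can we load?
callers_per_night = 10
nights = 40
hours_per_night = 3
attempts_per_caller_hour = 25

total_attempts = callers_per_night * nights * hours_per_night * attempts_per_caller_hour
min_attempts_per_prospect = 3
max_prospects = total_attempts // min_attempts_per_prospect
print(max_prospects)  # 10000
```

If your database holds 40,000 callable records but your program can only deliver 30,000 attempts a year, the math says you can properly work only a quarter of the file, which is exactly why the selection has to get smarter.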
Reducing the number of people we’re trying to reach to acquire as new donors means using a predictive model or at least some basic data mining and scoring to figure out who is most likely to pick up the phone. I’ve built models that do that for two years now, and after evaluating their performance I can say that they work okay. Not super fantastic, but okay. I can live with okay … in the past five years our program has made close to one million call attempts. Even a marginal improvement in focus at that scale of activity makes a significant difference.
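"Okay, not super fantastic" is something you can put a number on. One simple yardstick is lift: of all the donors you eventually acquired, what share sat in the top-scored half of the pool, compared to the 50% a random half would capture? A minimal sketch, with made-up scores and outcomes:

```python
# Illustrative lift check (synthetic data, not real results): does calling
# the top-scored half of the pool find more donors than a random half would?
def lift_of_top_half(prospects):
    """prospects: list of (score, became_donor) pairs."""
    ranked = sorted(prospects, key=lambda p: p[0], reverse=True)
    half = len(ranked) // 2
    top_donors = sum(d for _, d in ranked[:half])
    all_donors = sum(d for _, d in ranked)
    # Lift = share of donors captured in the top half, vs the 50%
    # that an unranked half of the pool would capture on average.
    return (top_donors / all_donors) / 0.5

prospects = [(9, 1), (8, 1), (7, 0), (4, 1), (3, 0), (1, 0)]
```

A lift of 1.0 means the score is doing nothing; even a modest 1.3 means a third more donors per attempt, which at a million attempts is anything but marginal.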
You don’t need to hack your acquisition segment in half today. I’m not saying that. To get new donors you still need lots and lots of prospects. Maybe someday you’ll be calling only a fraction of the people you once did, but there’s no reason you can’t take a gradual approach to getting more focused in the meantime. Trim things down a bit in the first year, evaluate the results, and fold what you learned into trimming a bit more the next year.
I had a thoughtful response to my blog post from earlier this week (What do we do about Phonathon?) from Paul Fleming, Database Manager at Walnut Hill School for the Arts in Natick, Massachusetts, about half an hour from downtown Boston. With Paul’s permission, I will quote from his email, and then offer my comments afterward:
I just wanted to share with you some of my experiences with Phonathon. I am the database manager of a 5-person Development department at a wonderful boarding high school called the Walnut Hill School for the Arts. Since we are a very small office, I have also been able to take on the role of the organizer of our Phonathon. It’s only been natural for me to combine the two to find analysis about the worth of this event, and I’m happy to say, for our own school, this event is amazingly worthwhile.
First of all, as far as cost vs. gain, this is one of the cheapest appeals we have. Our Phonathon callers are volunteer students who are making calls either because they have a strong interest in helping their school, or they want to be fed pizza instead of dining hall food (pizza: our biggest expense). This year we called 4 nights in the fall and 4 nights in the spring. So while it is an amazing source of stress during that week, there aren’t a ton of man-hours put into this event other than that. We still mail letters to a large portion of our alumni base a few times a year. Many of these alumni are long-shots who would not give in response to a mass appeal, but our team feels that the importance of the touch point outweighs the short-term inefficiencies that are inherent in this type of outreach.
Secondly, I have taken the time to prioritize each of the people who are selected to receive phone calls. As you stated in your article, I use things like recency and frequency of gifts, as well as other factors such as event participation or whether we have other details about their personal life (job info, etc). We do call a great deal of lapsed or nondonors, but if we find ourselves spread too thin, we make sure to use our time appropriately to maximize effectiveness with the time we have. Our school has roughly 4,400 living alumni, and we graduate about 100 wonderful, talented students a year. This season we were able to attempt phone calls to about 1,200 alumni in our 4 nights of calling. The higher-priority people received up to 3 phone calls, and the lower-priority people received just 1-2.
Lastly, I was lucky enough to start working at my job in a year in which there was no Phonathon. This gave me an amazing opportunity to test the idea that our missing donors would give through other avenues if they had no other way to do so. We did a great deal of mass appeals, indirect appeals (alumni magazine and e-newsletters), and as many personalized emails and phone calls as we could handle in our 5-person team. Here are the most basic of our findings:
In FY11 (our only non-Phonathon year), 12% of our donors were repeat donors. We reached about 11% participation, our lowest ever. In FY12 (the year Phonathon returned):
- 27% of our donors were new/recovered donors, a 14% increase from the previous year.
- We reached 14% overall alumni participation.
- Of the 27% of donors who were considered new/recovered, 44% gave through Phonathon.
- The total amount of donors we had gained from FY11 to FY12 was about the same number of people who gave through the Phonathon.
- In FY13 (still in progress, so we’ll see how this actually plays out), 35% of the previously-recovered donors who gave again gave in response to less work-intensive mass mailing appeals, showing that some of these Phonathon donors can, in fact, be converted and (hopefully) cultivated long-term.
In general, I think your article was right on point. Large universities with a for-pay, ongoing Phonathon program should take a look and see whether their efforts should be spent elsewhere. I just wanted to share with you my successes here and the ways in which our school has been able to maintain a legitimate, cost-effective way to increase our participation rate and maintain the quality of our alumni database.
Paul’s description of his program reminds me there are plenty of institutions out there who don’t have big, automated, and data-intensive calling programs gobbling up money. What really gets my attention is that Walnut Hill uses alumni affinity factors (event attendance, employment info) to prioritize calling to get the job done on a tight schedule and with a minimum of expense. This small-scale data mining effort is an example for the rest of us who have a lot of inefficiency in our programs due to a lack of focus.
The first predictive models I ever created were for a relatively small university Phonathon that was run with printed prospect cards and manual dialing — a very successful program, I might add. For those of you at smaller institutions wondering if data mining is possible only with massive databases, the answer is NO.
And finally, how wonderful it is that Walnut Hill can quantify exactly what Phonathon contributes in terms of new donors, and new donors who convert to mail-responsive renewals.
Data prep aside, it really isn’t that hard to produce a model to predict giving, once you know how. The simplest of models can be expected to give good results. Take one step beyond, however, and things get tricky. Your model may indeed predict giving, but it may NOT necessarily predict conversion — that is, conversion from non-donor to donor status.
What’s this, you ask? This CoolData guy is always saying that donor acquisition is where predictive modeling really shines, so why is he backpedaling today?
Well, I still DO believe that predictive modeling gives you insight into your deep non-donor pool and helps you decide who to focus your efforts on. But there’s a catch: You may be led astray if you fail to properly define the question you’re trying to answer.
By way of example, I will show you a model that appeared valid on the surface, but ultimately failed. And then I will explain what I did wrong — and how you can avoid making the same mistakes.
Last summer I had the pleasure of visiting with fundraising staff at a university in another province and showing them what data mining was doing for us. Their Annual Giving manager had a data file pulled from Raiser’s Edge, all ready to analyze, and we did so, in real time, during the course of a day-long workshop.
The model we created was a demo only — done very quickly, without much attention paid to how it would be used — and in fact the resulting score set was not used for anything. But we did have this score set, and I was reasonably sure that the higher scorers would be the better donors, and that a little followup analysis would put the icing on the cake.
So about a year after my visit, I offered to show how the alumni who had given since my visit broke down by the score we had prepared. My hosts sent me the new giving data, and off I went.
All seemed well at first. Have a look at these two charts. The high-scoring alumni (by score decile) gave the most in total dollars, and they also had the highest rate of participation in the annual fund.
No surprises there; I’ve seen this again and again. Then I got over-confident. The small university I did this work for had new-donor acquisition as one of its key goals for the Annual Fund, so I asked them to identify which donors were newly-acquired in the past year, so I could show how they broke down by score. I expected the model would perform well for predicting their participation as well.
There were 321 new donors. Their chart looked like this:
Quite a different story, isn’t it? I expected new donors would be clustered in the top scores, but that’s not what happened. Had my hosts used our demo model to get more focused for the purpose of acquisition, they would have been digging in the wrong places. This model would have been useless — even harmful.
It appears that the model was good at finding EXISTING donors, but not POTENTIAL donors. This suggests to me that certain predictor variables that we used must have been proxies for “Is a donor”. (For example, maybe we used event attendance data that seemed predictive, but the event was a donor-recognition dinner — that’s a proxy, or stand-in, for being a donor — and not usable as a predictor.)
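You can screen for this kind of proxy before the model is built. A crude but useful check: any candidate predictor that agrees almost perfectly with the donor flag in your training data deserves suspicion. This is a sketch with invented field names, not a recipe from any statistics package:

```python
# A quick screen for "proxy" predictors (field names are hypothetical):
# a binary predictor that matches donor status on nearly every record is
# probably a stand-in for "is a donor", not a real predictor.
def proxy_suspects(rows, predictors, donor_flag="is_donor", threshold=0.95):
    """rows: list of dicts, one per alum. Flags binary predictors whose
    value agrees with the donor flag on at least `threshold` of records."""
    suspects = []
    for p in predictors:
        matches = sum(1 for r in rows if bool(r[p]) == bool(r[donor_flag]))
        if matches / len(rows) >= threshold:
            suspects.append(p)
    return suspects
```

In the donor-recognition-dinner example above, attendance would agree with donor status on essentially every record and get flagged, while a genuinely independent variable like having an email address on file would not.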
The lesson is to understand the data you’re using, because mistakes can creep in quite easily when one throws a model together too quickly. Other factors that are probably implicated in this failure include:
Too general a model – 1: The model was not specifically an Annual Giving model. It included any kind of giving in the outcome variable (the predicted value), including major gifts (if I recall correctly). In that type of model, ‘Age’ is given a lot of weight, and younger alumni (who might make up the bulk of new donors) tend to receive depressed scores. In fact, about 60 of those 321 new donors (almost 20%) were Class of 2009, which at that time was the most recent graduating class. The university really focused on getting their support during the phonathon, but this model wouldn’t have been much help in targeting them.
Too general a model – 2: If predicting acquisition really was an over-arching goal, then the model question should have been defined specifically for that purpose. The model should have been trained differently — perhaps a 0/1 variable, indicating recent conversion to participation in the Fund. This requires more work in preparing a single variable — Y, the outcome variable — but it is central to the success of the model.
All eggs in one basket: With a trickier predicted value to train on, the situation called for trying binary logistic regression as well as multiple linear regression — and then testing to see which one did a better job scoring a holdout sample of new donors.
No holdout sample: Which brings me to the final error I made that day — I didn’t have a holdout sample to test the validity of the model. I skipped that step for the sake of simplicity, but in practice you should think about validation right from the start.
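The discipline those last two points describe can be sketched in a few lines: set the holdout sample aside first, train on the rest, and then judge each candidate model by how well it ranks the holdout's actual new donors. This is pure illustration in stdlib Python; the data layout and the "top decile capture" yardstick are my assumptions, and the model-fitting step itself is left out:

```python
# Sketch of the validation workflow: split off a holdout FIRST, then
# compare competing models by how they rank the holdout's real converters.
import random

def holdout_split(records, holdout_frac=0.25, seed=42):
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_frac)
    return shuffled[cut:], shuffled[:cut]  # (training set, holdout set)

def top_decile_capture(scored_holdout):
    """scored_holdout: list of (model_score, converted_0_or_1) pairs.
    Returns the share of all converters found in the top-scoring 10%."""
    ranked = sorted(scored_holdout, key=lambda r: r[0], reverse=True)
    top = ranked[:max(1, len(ranked) // 10)]
    converters = sum(c for _, c in scored_holdout)
    return sum(c for _, c in top) / converters if converters else 0.0
```

Fit the logistic model and the linear model on the training set only, score the holdout with each, and whichever concentrates more actual converters in its top deciles is the one to trust.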
Is there anything I did right? Well, I did conduct the test on recent giving that alerted me to the fact that this model did a poor job on prediction for acquisition. This testing, which occurs after the fact, is not the same as validation, which simply gives some reassurance that your model will work in the future. But it is equally important, as it may highlight issues you are not aware of and need to address in future iterations of the model.
In summary, to avoid model suckage you must: know your data in order to maximize the independence of your predictors; define your dependent variable carefully to answer the specific question you’re trying to answer; use different models and test them against each other, and finally, use a holdout sample or some other validation method.
When I was a prospect researcher working in Major Gifts and doing a little predictive modeling on the side, I was innocent of the intricacies of Annual Giving. I produced the models, and waited for the magic to happen. Hey, I wondered, how hard can it be? I tell you who’s most likely to give, and you get to work on that. Done.
Today, I’m responsible for running a phonathon program. Now I’M the guy who’s supposed to apply predictive scores to the Annual Fund, without messing everything up. What looked simple from the outside now looks like Rubik’s Cube. And I never solved Rubik’s Cube. (Had the book, but not the patience.)
In the same way, I have found that books and other resources on phonathon management are just no help when it comes to propensity-based segmentation. There seem to be no readily-available prototypes.
So let’s change that. Today I’m going to share a summary of the segmentation we hope to implement this fall. It’s not as detailed as our actual plan, but should give enough information to inform the development of your own plan. I’ve had to tear the Rubik’s Cube apart, as I used to do as a kid, and reassemble from scratch.
Click on this link to open a Word doc: “Phone segments”. Each row in this document is a separate segment. The segments are grouped into “blocks,” according to how many call attempts will be applied to each segment. Notice that the first column is “Decile Score”. That’s right, the first level of selection is going to be the propensity-of-giving predictive score I created for Phonathon.
It would seem that shifting from “traditional” segmentation is just as easy as that, but in fact making this change took a lot of hard thinking and consultation with others who have experience with phonathon. (*** See credits at end.)
Why was it so hard? Read on!
The first thing we have to ask ourselves is, why do we segment at all? The primary goals, as I see them, are:

- Prioritization: making sure the alumni most likely to give get the most calling effort, and
- Messaging: tailoring what the caller says to each alum's history and affiliation.
There are other reasons, including being able to track performance of various groups over time, but these are the most important at this planning stage.
By “prioritization,” I mean that alumni with the greatest propensity to give should be given special consideration in order to maximize response. Alumni who are most likely to give should:

- be called earliest in the campaign, and
- receive more call attempts than lower-ranked prospects.
The other goal, “messaging,” is simple enough to understand: We tailor our message to alumni based on what sort of group they are in. Alumni fall into a few groups based on their past donor history (LYBUNT, SYBUNT, never donor), which largely determines our solicitation goal for them (Leadership, Renewal, Acquisition). Alumni are also segmented by faculty, a common practice when alumni are believed to feel greater affinity for their faculty than they do the university as a whole. There may also be special segments created for other characteristics (young alumni, for example), or for special fundraising projects that need to be treated separately.
The “message” goal is often placed at the centre of phonathon segmentation — at the expense of optimizing treatment of the best prospects, in my view. In many programs, a rigid structure of calling by faculty commonly prevails, exhausting one message-defined pool (e.g. “Law, Donors”) before moving on to the next. There are benefits to working one homogeneous calling pool at a time — callers can more quickly become familiar with the message (and objections to it) if it stays consistent through the night. However, overall gains in the program might be realized by taking a more propensity-driven approach.
Predictive modeling for propensity to give is the “new thing” that allows us to bring prioritization to the fore. Traditionally, propensity to give has been determined mainly by previous giving history, which is based on reasonable assumptions: Alumni who have given recently are most likely to give again. This approach works for donors, but is not helpful for segmenting the non-donor pool for acquisition. Predictive modeling is a marked improvement over giving history alone for segmenting donors as well: a never-donor who has a high likelihood of giving is far more valuable to the institution than a donor who is very unlikely to renew. Only predictive modeling can give us the insight into the unknown to allow us to decide who is the better prospect.
The issue: Layers of complexity
We need to somehow incorporate the scores from the predictive model into segmentation. But simply creating an additional level of segmentation will create an unreasonable amount of complexity: “Score decile 10, Law, donors”, “Score decile 9, Medicine, non-donors”, etc. etc. The number of segments would become unmanageable and many of them would be too small, especially when additionally broken up by time zone.
I considered keeping the traditional segments (Faculty and donor status) and simply ordering the individual prospects within each segment using a very granular score. This would require us to make a judgment call about when we should drop a segment and move on to the next one. The risk in doing so is that in leaving it to our judgment, we will either drop the segment too early, leaving money on the table, or call too deep into the segment before moving on. Calling alumni with a decile score of 7 before making at least one attempt to ALL the 10s runs counter to the goal of prioritizing on best prospects.
So, what should we do?
The proposed new strategy going forward will draw a distinction between Prioritization and Messaging. Calling segments will be based on a combination of Propensity Score and Donor Status. More of the work involved in the “message” component (based on Faculty and past giving designations) will be managed at the point of the call, via the automated calling system and the caller him/herself.
The intention is to move messaging out of segmentation and into a combination of our automated dialing system’s conditional scripting features and the judgment of the caller. The callers will continue to be shown specific degree information, with customized scripts based on this information. The main difference from the caller’s point of view is that he or she will be speaking with alumni of numerous degree types on any given night, instead of just one or two.
Our system offers the ability to compose scripts that contain conditional statements, so that the message the caller presents changes on the fly in response to the particulars of the prospect being called (e.g. degree and faculty, designation of last gift, and so on). This feature works automatically and requires no effort from callers, except to the extent that there are more talking points to absorb simultaneously.
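The conditional-scripting idea is easier to picture with a toy example. This is not our dialer's actual scripting language; it is a purely illustrative sketch in Python, with invented field names and wording, of the kind of branching the system applies:

```python
# Purely illustrative: the kind of conditional logic an automated dialer's
# scripting feature applies. Field names and script wording are invented.
def opening_ask(prospect):
    """Choose a renewal ask or an acquisition ask based on giving history."""
    if prospect.get("last_gift_designation"):
        return (f"Last year you generously supported "
                f"{prospect['last_gift_designation']}. Would you consider "
                f"renewing that gift tonight?")
    return (f"As a graduate of {prospect['faculty']}, would you consider "
            f"a first gift to support current students?")
```

The point is that the branch on gift designation happens per prospect, automatically, so the calling pool no longer has to be pre-sorted by message.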
The caller’s prospect information screen offers data on a prospect’s past giving. When historical gift designations disagree with a prospect’s faculty, the caller will need to shift gears slightly and ask the prospect if he or she wishes to renew their giving to that designation, rather than the default (faculty of preferred degree).
Shifting this aspect from segmentation to the point of the call is intended to remove a layer of complexity from segmentation, thereby making room for propensity to give. See?
‘Faculty’ will be removed as a primary concern in segmentation, by collapsing all the specific faculties into two overarching groups: Undergraduate degrees and graduate/professional degrees. This grouping preserves one of the fundamental differences between prospects (their stage of life while a student) while preventing the creation of an excessive number of tiny segments.
Have a look at the segments document again. The general hierarchy for segmentation will be Score Decile (ten possible levels), then Donor Status (two levels, Donor and Non-Donor), then Graduate-Professional/Undergraduate (two levels). Therefore the number of possible segments is 10 x 2 x 2 = 40. In practice there will be more than 40, but this number will be manageable. As well, although we will call as many prospects as we possibly can, it is not imperative that we call the very lowest deciles, where the probability of finding donors is extremely low. Having leftover segments at the end of the winter term is likely, but not cause for concern.
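If it helps to see the 10 x 2 x 2 hierarchy written out, it can be enumerated directly. The segment labels here are mine, not our actual naming convention:

```python
# Enumerate the 10 x 2 x 2 segment hierarchy described above
# (labels are illustrative, not our production naming convention).
from itertools import product

deciles = range(10, 0, -1)                       # best scores called first
donor_status = ["Donor", "Non-Donor"]
degree_group = ["Grad-Professional", "Undergrad"]

segments = [f"Decile {d} / {s} / {g}"
            for d, s, g in product(deciles, donor_status, degree_group)]
print(len(segments))  # 40
print(segments[0])    # Decile 10 / Donor / Grad-Professional
```

Because the decile is the outermost level, the list comes out already in calling-priority order: every Decile 10 segment precedes every Decile 9 segment, and so on down.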
This is only a general structure — some segments may be split or collapsed depending on how large or small they are. As well, I will break out other segments for New Grads and any special projects we are running this year. And call attempt limits may require rejigging throughout the season, based on actual response.
New Grad calling is limited to Acquisition, for the purpose of caller training and familiarization. I prefer Renewal calling to be handled by experienced callers, therefore new-grad renewals are included in other segments.
Other issues that may require attention in your segmentation include double-alumni households (do both spouses receive a call, and if so, when?), and creating another segment to capture alumni who did not receive a propensity score because they were not contactable at time of model creation.
Calling pools are mixed with regard to faculty, so the message will vary from call to call. Callers won’t know from the outset who they will be speaking with (Law, Medicine, etc.), and will require multiple scripts to cover multiple prospect types. Training and familiarization with the job will take longer.
The changes will require a little more attentiveness on the part of call centre employees. The script will auto-populate the alum’s preferred faculty. However, the caller must be prepared to modify the conversation on the fly based on other information available, i.e. designation of last (or largest) gift. The downside is that callers may take more time to become proficient. However, the need to pay attention to context may help to keep callers more engaged with their work, as opposed to mindlessly reading the same script over and over all night.
Another potential issue is that some faculties are at a disadvantage because they have fewer high-scoring alumni. The extent to which this might be a problem can only be determined by looking at the data to see how each faculty’s alumni are distributed by score decile. Some redistribution among segments may be necessary if any one faculty is found to be at a severe disadvantage. Note that it cuts both ways, though: In the traditional segmentation, entire faculties were probably placed at a disadvantage because they had lower priority — based on nothing more than the need to order them in some kind of sequence.
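Checking whether a faculty is starved of high scorers is a simple tabulation once you have scores on file. A sketch with invented field names, one row per alum:

```python
# Is any faculty short of high-scoring alumni? (field names hypothetical)
from collections import Counter

def faculty_by_decile(alumni):
    """alumni: list of (faculty, decile) pairs.
    Returns counts per (faculty, decile) cell, like a crosstab."""
    return Counter(alumni)

def share_in_top_deciles(alumni, faculty, cutoff=8):
    """Share of a faculty's alumni scoring at or above `cutoff`."""
    deciles = [d for f, d in alumni if f == faculty]
    return sum(1 for d in deciles if d >= cutoff) / len(deciles)
```

A faculty whose share in the top deciles is far below the institutional average is the one that may need some redistribution among segments before calling starts.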
As I say, this is all new, and untested. How large or small the proposed segments will be remains undetermined. How well the segmentation will work is not known. I am interested to hear how others have dealt with the issue of applying their predictive models to phonathon.
*** CREDITS: I owe thanks to Chris Steeves, Development Officer for the Faculty of Management at Dalhousie University, and a former Annual Giving Officer responsible for Phonathon, for his enthusiastic support and for certain ideas (such as collapsing faculties into ‘graduate’ and ‘undergraduate’ categories) which are a key part of this segmentation plan. Also, thanks to Marni Tuttle, Associate Director, Advancement Services at Dalhousie University, for her ideas and thorough review of this post in its various drafts.