In 2007 and 2008 we used predictive models in Annual Giving to segment the entire alumni population into deciles according to propensity to give. Both years, our annual giving coordinator noticed that alumni in the highest deciles (9 and 10) seemed to hang up on callers with unexpected frequency.
An analysis of the alumni who hung up on callers bore out her observation. Hang-ups, rudeness and other “red flags” are recorded in our database as text comments rather than validated codes, so a little text mining was required to identify the IDs of alumni who exhibited these behaviours.
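If you wanted to do something similar in, say, Python with pandas, the flagging step might look like this rough sketch. The file name, column names and keyword patterns are all made up for illustration; the point is just keyword-matching the comment text and rolling it up to one indicator per ID:

```python
import re

import pandas as pd

# Hypothetical export of free-form call-result comments: one row per
# comment, with the constituent ID and the comment text.
comments = pd.read_csv("call_comments.csv")  # assumed columns: id, comment_text

# Crude keyword patterns for the two "red flag" behaviours.
hangup_pat = r"hung\s*up|hang[\s-]?up"
rude_pat = r"\brude\b|\bswore\b|\babusive\b"

text = comments["comment_text"].fillna("")
comments["hung_up"] = text.str.contains(hangup_pat, case=False, regex=True).astype(int)
comments["was_rude"] = text.str.contains(rude_pat, case=False, regex=True).astype(int)

# Roll up to one indicator per constituent: ever hung up, ever rude.
flags = comments.groupby("id")[["hung_up", "was_rude"]].max()
```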
(In a previous post, I described a very manual yet simple method of extracting potential predictor variables from the kind of free-form text found in database comment fields or survey responses. Today, I’m using text-mined variables not for predicting giving, but for comparing two whole models with each other. More on that in a bit.)
When I mined the comments, I discovered that fully half of the people who hung up had a score of 6 or higher. The model was failing to weed out people who were not receptive to phone solicitation. Of course, our higher scorers were giving more than lower scorers overall … but could we do better?
The answer was yes.
The models created in 2007 and 2008 were aimed at predicting giving at any level (from annual giving to major giving), via any channel (phone, mail, etc.), and based on past giving made at any time (i.e., lifetime giving rather than recent giving).
In short, these were very general models, not Annual Giving models. Our high-scoring hanger-uppers were donors: Many of them gave quite generously, in fact. They just didn’t give via the calling program. Most gave on their own, or in response to a mail solicitation. (For whatever reason, they had not been added to our do-not-call list, so they continued to receive unwanted calls.)
They did deserve to be high scorers – but not for the calling program.
In 2009 I took a different approach to defining the predicted value (a.k.a. the dependent variable), as sketched in the code example after this list:
- Instead of predicting for any type of giving, I narrowed our focus to gifts made to Annual Giving.
- Instead of gifts via any type of solicitation in Annual Giving, I counted only donations made in response to a phone call.
- Instead of using Lifetime Giving as our predicted value, I limited it to the past six fiscal years of giving.
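Here is what that narrowed definition might look like as a pandas filter over a gift-detail export. Again, the file name, column names and code values are hypothetical stand-ins, not our actual database schema:

```python
import pandas as pd

# Hypothetical gift-detail export: one row per gift.
gifts = pd.read_csv("gifts.csv")  # assumed columns: id, amount, fund_type, channel, fiscal_year

CURRENT_FY = 2009

mask = (
    (gifts["fund_type"] == "Annual Giving")      # gifts to Annual Giving only
    & (gifts["channel"] == "Phone")              # ...made in response to a phone call
    & (gifts["fiscal_year"] > CURRENT_FY - 6)    # ...within the past six fiscal years
)

# Total qualifying giving per constituent becomes the predicted value.
dv = gifts.loc[mask].groupby("id")["amount"].sum().rename("phone_af_giving_6yr")
```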
How did our hanger-uppers score now, with the new model? The results were dramatically different. To test the improvement, I had two text-mined indicator variables to work with: one for all IDs that had ever hung up on a caller, and another for anyone who had ever been rude to a caller. Neither variable had been used as a predictor in my models, so they were perfect for an independent test of the new model’s ability to target the right people.
To compare the two old models with the new one, I simply looked at how the alumni responsible for unpleasant encounters were distributed by score decile.
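In code, that comparison is just a cross-tab of the flagged IDs against each model’s deciles. A minimal sketch, assuming a score file with each constituent’s decile (1 through 10) under both models, plus the `flags` table built earlier (all names hypothetical):

```python
import pandas as pd

# Hypothetical score file: one row per constituent, with the decile
# assigned by each model.
scores = pd.read_csv("model_scores.csv", index_col="id")  # columns: old_decile, new_decile
df = scores.join(flags, how="left").fillna({"hung_up": 0, "was_rude": 0})

hangers = df[df["hung_up"] == 1]

# Proportion of hanger-uppers falling in each score decile, old vs. new model.
dist = pd.DataFrame({
    "old": hangers["old_decile"].value_counts(normalize=True),
    "new": hangers["new_decile"].value_counts(normalize=True),
}).fillna(0).sort_index()
print(dist)

# Share of hanger-uppers scoring 6 or higher under each model.
print((hangers["old_decile"] >= 6).mean())
print((hangers["new_decile"] >= 6).mean())
```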
Have a look at how these two charts compare. The one labeled ‘Old decile’ shows how ‘hanger-uppers’ scored in the older model (2008). As I said earlier, a lot of them were high scorers. (I’m not saying how many – notice I’ve removed the Y axis scale – I want to show you the distribution, not the actual numbers. The vertical scale differs from one chart to the other.)
The chart at right shows the same people, as they were scored in the new, phonathon-specific model (2009). In the new model, only 34% of hanger-uppers score 6 or higher – compared with 50% in the old model. As well, almost a third of them are clustered in the very lowest decile. Not perfect, but a big improvement.
Now, how about “rudeness”? Here are two more charts, same idea: the breakdown for the old model is on the left, and the one for the new model is on the right. Again, the (hidden) vertical scale is different: if they were shown on the same scale, the bar for the first decile in the chart on the left would actually be half the height of the bar for the first decile in the chart on the right.
In the old model, people who were difficult on the phone were as likely to score high as score low. In the new model, however, they tend to be very low scorers. Again, a lot of them are lumped together in the lowest decile.
Remember: Neither of these variables was used as a predictor in the new model!
I don’t see myself ever going back to creating a model that isn’t specific to the task at hand, whether it’s Phonathon, event-attendance likelihood, Planned Giving potential, or what have you. For Phonathon, getting smarter and more targeted means that fewer donors who are averse to being contacted by phone will be called, with the result that student callers will face fewer unpleasant encounters and have a better experience on the job. It just makes sense.
Very interesting! I have created general models like the one you used first. Although I got promising results when compared to a random sample, I didn’t look at hang-ups. I will definitely keep this in mind in the future.
Great blog!
Comment by Annie — 9 February 2010 @ 4:02 pm
Annie, I shouldn’t give the impression that I’m dismissing the value of general models. The models I created in the first couple of years were hugely predictive and very valuable for segmentation. Because there’s so much more data in a general model than in a more specifically targeted model, the fit will be better. Unfortunately, I sensed that the hang-ups and so on made it all too easy for others to dismiss the modeling effort, so I was keen to find ways to improve the method and remove that barrier to acceptance. But again, that is NOT to say that general models are no good. How far should one go? That depends on where one is at, how much time is available for this stuff, and how personally interested/motivated one is to do it. Taking a simple and general approach to predictive modeling is a far better option than not doing it at all. Thank you for reading!
Comment by kevinmacdonell — 9 February 2010 @ 4:22 pm
P.S. For a whole week the draft of this post was headlined “Fear and loathing in the calling room,” but for some reason I chickened out on the day of posting and went with the boring headline.
Comment by kevinmacdonell — 9 February 2010 @ 4:23 pm
Wow – timely post. We are just beginning to have a discussion with an external vendor for this service, and I am curious about how they categorize call outcomes. Unfortunately we have never had a phone-a-thon before, so we cannot create a model to predict likelihood to give through a phone solicitation channel. Too bad, wish we could! Thanks!
Comment by Diane — 9 February 2010 @ 4:47 pm
[…] In planning which predictive models I was going to create for this fall’s Annual Giving appeals, I had one main idea in mind: Donors are not equally receptive to both mail and phone solicitation. I knew from previous experience that I could build a good model trained on all Annual Fund giving regardless of source, but that it would not be optimal because it would fail to take “preferred mode of solicitation” into account. Great donors with high propensity-to-give scores will hang up on your Phonathon callers if their preferred mode of giving is by mail! (See Preventing hangups and rudeness in your Phonathon program.) […]
Pingback by Multiple models in Annual Fund: Worth the trouble? « CoolData blog — 18 October 2010 @ 5:31 am