In 2007 and 2008 we used predictive models in Annual Giving to segment the entire alumni population into deciles according to propensity to give. Both years, our annual giving coordinator noticed that alumni in the highest deciles (9 and 10) seemed to hang up on callers with unexpected frequency.
An analysis of the alumni who hung up on callers bore her observation out. Hang-ups, rudeness and other “red flags” are recorded in our database as text comments, rather than validated codes. Therefore, a little text mining was required to identify the IDs of alumni who exhibited these behaviours.
(In a previous post, I described a very manual yet simple method of extracting potential predictor variables from the kind of free-form text found in database comment fields or survey responses. Today, I’m using text-mined variables not for predicting giving, but for comparing two whole models with each other. More on that in a bit.)
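The post doesn’t show its code, but the manual text-mining step might be sketched in Python roughly like this. The IDs, comments and keyword patterns below are invented for illustration; the real terms would come from eyeballing the comment field itself:

```python
import re

# Hypothetical keyword patterns for two "red flag" behaviours.
# The actual terms mined from a real comment field would differ.
PATTERNS = {
    "hung_up": re.compile(r"hung up|hang[- ]?up", re.IGNORECASE),
    "rude": re.compile(r"rude|abusive|swore", re.IGNORECASE),
}

def flag_comments(rows):
    """rows: iterable of (alum_id, comment_text) pairs.
    Returns {alum_id: set_of_flags} for IDs with at least one match."""
    flags = {}
    for alum_id, comment in rows:
        for name, pattern in PATTERNS.items():
            if pattern.search(comment or ""):
                flags.setdefault(alum_id, set()).add(name)
    return flags

# Invented example rows standing in for database comment records
rows = [
    (101, "Alum hung up immediately"),
    (102, "Pledged $50"),
    (103, "Was rude to the student caller"),
]
print(flag_comments(rows))  # IDs 101 and 103 get flagged
```

The output of a pass like this becomes the two 0/1 indicator variables used later in the post.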
When I mined the comments, I discovered that fully half of the people who hung up had a score of 6 or higher. The model was failing to weed out people who were not receptive to phone solicitation. Of course, our higher scorers were giving more than lower scorers overall … but could we do better?
The answer was yes.
The models created in 2007 and 2008 were aimed at predicting giving at any level (from annual giving to major giving), via any channel (phone, mail, etc.), and based on past giving made at any time (i.e., lifetime giving rather than recent giving).
In short, these were very general models, not Annual Giving models. Our high-scoring hanger-uppers were donors: Many of them gave quite generously, in fact. They just didn’t give via the calling program. Most gave on their own, or in response to a mail solicitation. (For whatever reason, they had not been added to our do-not-call list, so they continued to receive unwanted calls.)
They did deserve to be high scorers – but not for the calling program.
In 2009 I took a different approach to defining the predicted value (a.k.a. dependent variable):
- Instead of predicting for any type of giving, I narrowed our focus to gifts made to Annual Giving.
- Instead of gifts via any type of solicitation in Annual Giving, I counted only donations made in response to a phone call.
- Instead of using Lifetime Giving as our predicted value, I limited it to the past six fiscal years of giving.
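Those three restrictions can be expressed as a single filter over gift records. This is a minimal sketch under assumed column names (fund type, channel, fiscal year); the records and the current fiscal year are invented for illustration:

```python
# Hypothetical gift records: (constituent_id, fund_type, channel, fiscal_year, amount)
gifts = [
    (101, "annual", "phone", 2008, 100.0),
    (101, "major", "personal", 2005, 5000.0),   # not Annual Giving
    (102, "annual", "mail", 2009, 50.0),        # not the phone channel
    (103, "annual", "phone", 2001, 25.0),       # outside the six-year window
]

CURRENT_FY = 2009  # assumed current fiscal year

def phone_giving(gifts, years=6):
    """Sum only Annual Giving gifts made by phone in the last `years` fiscal years."""
    totals = {}
    for cid, fund, channel, fy, amount in gifts:
        if fund == "annual" and channel == "phone" and fy > CURRENT_FY - years:
            totals[cid] = totals.get(cid, 0.0) + amount
    return totals

print(phone_giving(gifts))  # only ID 101 qualifies
```

The resulting per-person total is the narrowed dependent variable: recent, phone-solicited Annual Giving only.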
How did our hanger-uppers score now, with the new model? The results were dramatically different. For testing the improvement, I had two text-mined indicator variables to work with, one for all IDs that had ever hung up on a caller, and another for anyone who had ever been rude to a caller. Neither variable had been used as a predictor in my models, so they were perfect for conducting an independent test of the new model’s ability to target the right people.
To compare the two old models with the new one, I simply looked at how the alumni responsible for unpleasant encounters were distributed by score decile.
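The comparison itself is just a frequency count of flagged IDs by decile, one count per model. A minimal sketch, with invented scores and flags standing in for the real data:

```python
from collections import Counter

# Hypothetical data: score decile (1-10) per alum under one model,
# plus the set of IDs flagged (via text mining) as having hung up.
deciles = {101: 9, 102: 3, 103: 10, 104: 1, 105: 6, 106: 1}
hung_up_ids = {101, 103, 104, 106}

def decile_distribution(deciles, flagged_ids):
    """Count flagged alumni in each score decile."""
    return Counter(deciles[i] for i in flagged_ids if i in deciles)

dist = decile_distribution(deciles, hung_up_ids)
for d in range(1, 11):
    print(d, "#" * dist.get(d, 0))  # crude text histogram by decile
```

Running the same count against the old and new model scores yields the two distributions the charts below compare.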
Have a look at how these two charts compare. The one labeled ‘Old decile’ shows how ‘hanger-uppers’ scored in the older model (2008). As I said earlier, a lot of them were high scorers. (I’m not saying how many – notice I’ve removed the Y-axis scale – I want to show you the distribution, not the actual numbers. The vertical scale differs from one chart to the other.)
The chart at right shows the same people, as they were scored in the new, phonathon-specific model (2009). In the new model, only 34% of hanger-uppers score 6 or higher – compared with 50% in the old model. As well, almost a third of them are clustered in the very lowest decile. Not perfect, but a big improvement.
Now, how about “rudeness”? Here are two more charts, same idea: The breakdown for the old model is on the left, and the one for the new model is on the right. Again, the (hidden) vertical scale is different: If they were shown on the same scale, the bar for the first decile in the chart on the left would actually be half the height of the bar for the first decile in the chart on the right.
In the old model, people who were difficult on the phone were as likely to score high as score low. In the new model, however, they tend to be very low scorers. Again, a lot of them are lumped together in the lowest decile.
Remember: Neither of these variables was used as a predictor in the new model!
I don’t see myself ever going back to creating a model that isn’t specific to the task at hand, whether it’s Phonathon, event-attendance likelihood, Planned Giving potential, or what have you. For Phonathon, getting smarter and more targeted means that fewer donors who are averse to being contacted by phone will be called, with the result that student callers will experience fewer unpleasant encounters and have a better experience on the job. It just makes sense.