People who build predictive models will tell you that there are certain variables you should avoid using as predictors. I am one of those people. However, we disagree on WHICH variables one should avoid, and increasingly this conflicting advice is confusing those trying to learn predictive modeling.
The differences involve two points in particular. Assuming charitable giving is the behaviour we’re modelling for, those two things are:
I will offer my opinions on both points. Note that they are opinions, not definitive answers.
1. Past giving as a predictor
I have always stressed that if you are trying to predict “giving” using a multiple linear regression model, you must avoid using “giving” as a predictor among your independent variables. That includes anything that is a proxy for “giving,” such as attendance at a donor-thanking event. This is how I’ve been taught and that is what I’ve adhered to in practice.
Examples that violate this practice keep popping up, however. I have an email from Atsuko Umeki, IT Coordinator in the Development Office of the University of Victoria in Victoria, British Columbia*. She poses this question about a post I wrote in July 2013:
“In this post you said, ‘In predictive models, giving and variables related to the activity of giving are usually excluded as variables (if ‘giving’ is what we are trying to predict). Using any aspect of the target variable as an input is bad practice in predictive modelling and is carefully avoided.’ However, in many articles and classes I read and took I was advised or instructed to include past giving history such as RFA*, Average gift, Past 3 or 5 year total giving, last gift etc. Theoretically I understand what you say because past giving is related to the target variable (giving likelihood); therefore, it will be biased. But in practice most practitioners include past giving as variables and especially RFA seems to be a good variable to include.”
(* RFA is a variation of the more familiar RFM score, based on giving history — Recency, Frequency, and Monetary value.)
So modellers-in-training are being told to go ahead and use ‘giving’ to predict ‘giving’, but that’s not all: Certain analytics vendors also routinely include variables based on past giving as predictors of future giving. Not long ago I sat in on a webinar hosted by a consultant, which referenced the work of one well-known analytics vendor (no need to name the vendor here) in which it seemed that giving behaviour was present on both sides of the regression equation. Not surprisingly, this vendor “achieved” a fantastic R-squared value of 86%. (Fantastic as in “like a fantasy,” perhaps?)
This is not as arcane or technical as it sounds. When you use giving to predict giving, you are essentially saying, “The people who will make big gifts in the future are the ones who have made big gifts in the past.” This is actually true! The thing is, you don’t need a predictive model to produce such a prospect list; all you need is a list of your top donors.
Now, this might be reassuring to whoever is paying a vendor big bucks to create the model. That person sees names they recognize, and they think, ah, good, we are not too far off the mark. And if you’re trying to convince your boss of the value of predictive modelling, he or she might like to see the upper ranks filled with familiar names.
I don’t find any of that “reassuring.” I find it a waste of time and effort — a fancy and expensive way to produce a list of the usual suspects.
If you want to know who has given you a lot of money, you make a list of everyone in your database and sort it in descending order by total amount given. If you want to predict who in your database is most likely to give you a lot of money in the future, build a predictive model using predictors that are associated with having given large amounts of money. Here is the key point … if you include “predictors” that mean the same thing as “has given a lot of money,” then the result of your model is not going to look like a list of future givers — it’s going to look more like your historical list of past givers.
Does that mean you should ignore giving history? No! Ideally you’d like to identify the donors who have made four-figure gifts who really have the capacity and affinity to make six-figure gifts. You won’t find them using past giving as a predictor, because your model will be blinded by the stars. The variables that represent giving history will cause all other affinity-related variables to pale in comparison. Many will be rejected from the model for being not significant or for adding nothing additional to the model’s ability to explain the variance in the outcome variable.
To sum up, here are the two big problems with using past giving to predict future giving:
Let’s try a thought experiment. What if I told you that I had a secret predictor that, once introduced into a regression analysis, could explain 100% of the variance in the dependent variable ‘Lifetime Giving’? That’s right — the highest value for R-squared possible, all with a single predictor. Would you pay me a lot of money for that? What is this magic variable that perfectly models the variance in ‘Lifetime Giving’? Why, it is none other than ‘Lifetime Giving’ itself! Any variable is perfectly correlated with itself, so why look any farther?
This is an extreme example. In a real predictive model, a predictor based on giving history would be restricted to giving from the past, while the outcome variable would be calculated from a more recent period — the last year or whatever. There should be no overlap. R-squared would not be 100%, but it would be very high.
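The thought experiment above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the donor figures and the `business_phone` flag are made up, and `r_squared` is a hypothetical helper that relies on the fact that, with a single predictor, R-squared is just the squared correlation.

```python
def r_squared(x, y):
    """R-squared of a one-predictor linear regression (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) * (a - mean_x) for a in x)
    syy = sum((b - mean_y) * (b - mean_y) for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical lifetime giving totals for eight donors (made-up numbers).
lifetime_giving = [0, 25, 50, 100, 250, 1000, 5000, 20000]

# Hypothetical non-giving variable: 1 if a business phone is on file.
business_phone = [0, 0, 1, 0, 1, 1, 0, 1]

# The "secret predictor": the outcome used to predict itself.
print(r_squared(lifetime_giving, lifetime_giving))  # prints 1.0

# A non-giving variable explains far less, but it is not circular.
print(r_squared(business_phone, lifetime_giving))
```

A perfect score in the first case says nothing about the model's usefulness; it only confirms that the same information sits on both sides of the equation.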
The R-squared statistic is useful for guiding you as you add variables to a regression analysis, or for comparing similar models in terms of fit with the data. It is not terribly useful for deciding whether any one model is good or bad. A model with an R-squared of 15% may be highly valuable, while one with R-squared of 75% may be garbage. If a vendor is trying to sell you on a model they built based on a high R-squared alone, they are misleading you.
The goal of predictive modeling for major gifts is not to maximize R-squared. It’s to identify new prospects.
2. Using “attributes” as predictors
Another thing about that webinar bugged me. The same vendor advised us to “select variables with caution, avoiding ‘descriptors’ and focusing on potential predictors.” Specifically, we were warned that a marital status of ‘married’ will emerge as correlated with giving. Don’t be fooled! That’s not a predictor, they said.
So let me get this straight. We carry out an analysis that reveals that married people are more likely to give large gifts, that donors with more than one degree are more likely to give large gifts, that donors who have email addresses and business phone numbers in the database are more likely to give large gifts … but we are supposed to ignore all that?
The problem might not be the use of “descriptors,” the problem might be with the terminology. Maybe we need to stop using the word “predictor”. One experienced practitioner, Alexander Oftelie, briefly touched on this nuance in a recent blog post. I quote (emphasis added by me):
“Data that on its own may seem unimportant — the channel someone donates, declining to receive the mug or calendar, preferring email to direct mail, or making ‘white mail’ or unsolicited gifts beyond their sustaining-gift donation — can be very powerful when they are brought together to paint a picture of engagement and interaction. Knowing who someone is isn’t by itself predictive (at best it may be correlated). Knowing how constituents choose to engage or not engage with your organization are the most powerful ingredients we have, and its already in our own garden.”
I don’t intend to critique Alexander’s post, which isn’t even on this particular topic. (It’s a good one; please read it.) But since he’s written this, permit me to scratch my head about it a bit.
In fact, I think I agree with him that there is a distinction between a behaviour and a descriptor/attribute. A behaviour, an action taken at a specific point in time (e.g., attending an event), can be classified as a predictor. An attribute (“who someone is,” e.g., whether they are married or single) is better described as a correlate. I would also be willing to bet that if we carefully compared behavioural variables to attribute variables, the behaviours would outperform, as Alexander says.
In practice, however, we don’t need to make that distinction. If we are using regression to build our models, we are concerned solely and completely with correlation. To say “at best it may be correlated” suggests that predictive modellers have something better at their disposal that they should be using instead of correlation. What is it? I don’t know, and Alexander doesn’t say.
If in a given data set, we can demonstrate that being married is associated with likelihood to make a donation, then it only makes sense to use that variable in our model. Choosing to exclude it based on our assumption that it’s an attribute and not a behaviour doesn’t make business sense. We are looking for practical results, after all, not chasing some notion of purity. And let’s not fool ourselves, or clients, that we are getting down to causation. We aren’t.
Consider that at least some “attributes” can be stated in terms of a behaviour. People get married — that’s a behaviour, although not related to our institution. People get married and also tell us about it (or allow it to be public knowledge so that we can record it) — that’s also a behaviour, and potentially an interaction with us. And on the other side of the coin, behaviours or interactions can be stated as attributes — a person can be an event attendee, a donor, a taker of surveys.
If my analysis informs me that widowed female alumni over the age of 60 are extremely good candidates for a conversation about Planned Giving, then are you really going to tell me I’m wrong to act on that information, just because sex, age and being widowed are not “behaviours” that a person voluntarily carries out? Mmmm — sorry!
Call it quibbling over semantics if you like, but don’t assume it’s so easy to draw a circle around true predictors. There is only one way to surface predictors, which is to take a snapshot of all potentially relevant variables at a point in time, then gather data on the outcome you wish to predict (e.g., giving) after that point in time, and then assess each variable in terms of the strength of association with that outcome. The tools we use to make that assessment are nothing other than correlation and significance. Again, if there are other tools in common usage, then I don’t know about them.
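The snapshot approach described above might look something like this minimal Python sketch. The variable names and the eight records are hypothetical, `rank_candidates` is an illustrative helper rather than anyone's established method, and the significance testing is left out to keep it short.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / sqrt(sxx * syy)

def rank_candidates(snapshot, outcome):
    """Rank snapshot variables by strength of association with the later outcome."""
    scores = {name: pearson(values, outcome) for name, values in snapshot.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical variables, all recorded as of the snapshot date (1 = yes, 0 = no).
snapshot = {
    "married":        [1, 0, 1, 0, 0, 1, 1, 0],
    "attended_event": [1, 0, 1, 1, 0, 0, 0, 0],
    "email_listed":   [1, 1, 0, 1, 0, 1, 1, 0],
}

# Hypothetical outcome, observed only AFTER the snapshot date.
gave_after_snapshot = [1, 0, 1, 1, 0, 0, 1, 0]

for name, r in rank_candidates(snapshot, gave_after_snapshot):
    print(f"{name}: r = {r:+.2f}")
```

Note that the code makes no distinction between attributes and behaviours. Each variable simply earns its place, or not, by its association with the outcome.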
Caveats and concessions
I don’t maintain that this or that practice is “wrong” in all cases, nor do I insist on rules that apply universally. There’s a lot of art in this science, after all.
Using giving history as a predictor:
Using descriptors/attributes as predictors:
There are many approaches one can take with predictive modeling, and naturally one may feel that one’s chosen method is “best”. The only sure way to proceed is to take the time to define exactly what you want to predict, try more than one approach, and then evaluate the performance of the scores when you have actual results available — which could be a year after deployment. We can listen to what experts are telling us, but it’s more important to listen to what the data is telling us.
Note: When I originally posted this, I referred to Atsuko Umeki as “he”. I apologize for this careless error and for whatever erroneous assumption that must have prompted it.
A few years ago I met with an experienced Planned Giving professional who had done very well over the years without any help from predictive modeling, and was doing me the courtesy of hearing my ideas. I showed this person a series of charts. Each chart showed a variable and its association with the condition of being a current Planned Giving expectancy. The ultimate goal would have been to consolidate these predictors together as a score, in order to discover new expectancies in that school’s alumni database. The conventional factors of giving history and donor loyalty are important, I conceded, but other engagement-related factors are also very predictive: student activities, alumni involvement, number of degrees, event attendance, and so on.
This person listened politely and was genuinely interested. And then I went too far.
One of my charts showed that there was a strong association between being a Planned Giving expectancy and having a single initial in the First Name field. I noted that, for some unexplained reason, having a preference for a name like “S. John Doe” seemed to be associated with a higher propensity to make a bequest. I thought that was cool.
The response was a laugh. A good-natured laugh, but still — a laugh. “That sounds like astrology!”
I had mistaken polite interest for a slam-dunk, and in my enthusiasm went too far out on a limb. I may have inadvertently caused the minting of a new data-mining skeptic. (Eventually, the professional retired after completing a successful career in Planned Giving, and having managed to avoid hearing much more about predictive modeling.)
At the time, I had hastened to explain that what we were looking at were correlations — loose, non-causal relationships among various characteristics, some of them non-intuitive or, as in this case, seemingly nonsensical. I also explained that the linkage was probably due to other variables (age and sex being prime candidates). Just because it’s without explanation doesn’t mean it’s not useful. But I suppose the damage was done. You win some, you lose some.
Although some of the power (and fun) of predictive modeling rests on the sometimes non-intuitive and unexplained nature of predictor variables, I now think it’s best to frame any presentation to a general audience in terms of what they think of as “common sense”. Limiting, yes. But safer. Unless you think your listener is really picking up what you’re laying down, keep it simple, keep it intuitive, and keep it grounded.
So much for sell jobs. Let’s get back to the data … What ABOUT that “first-initial” variable? Does it really mean anything, or is it just noise? Is it astrology?
I’ve got this data set in front of me: all alumni with at least some giving in the past ten years. I see that 1.2% of all donors have a first initial at the front of their name. When I look at the subset of the records that are current Planned Giving expectancies, I see that 4.6% have a single-initial first name. In other words, Planned Giving expectancies are almost four times as likely as all other donors to have a name that starts with a single initial. The data file is fairly large (more than 17,000 records), and the difference is statistically significant.
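For anyone who wants to run the same kind of check, here is a rough sketch of a two-proportion z-test in Python. The counts are illustrative round numbers chosen only to match the percentages quoted above; they are not the actual file. The sketch also treats the 1.2% as the rate among non-expectancies, which is an assumption. `two_proportion_z` is a hypothetical helper name.

```python
from math import erfc, sqrt

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Illustrative counts only (assumed, not taken from the real data set).
expectancies, expectancies_with_initial = 500, 23   # 4.6%
others, others_with_initial = 16_500, 198           # 1.2%

ratio = (expectancies_with_initial / expectancies) / (others_with_initial / others)
z, p_value = two_proportion_z(
    expectancies_with_initial, expectancies, others_with_initial, others
)

print(f"rate ratio: {ratio:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

With counts in this range the rate ratio comes out a little under four, in line with “almost four times as likely.”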
What can explain this? When I think of a person whose first name is an initial and who tends to go by their middle name, the image that comes to mind is that of an elderly male with a higher than average income — like a retired judge, say. For each of the variables Age and Male, there is in fact a small positive association with having a one-character first name. Yet, when I account for both ‘Age’ and ‘Male’ in a regression analysis, the condition of having a leading initial is still significant and still has explanatory power for being a Planned Giving expectancy.
I can’t think of any other underlying reasons for the connection with Planned Giving. Even when I continue to add more and more independent variables to the regression, this strange predictor hangs in there, as sturdy as ever. So, it’s certainly interesting, and I usually at least look at it while building models.
On the other hand … perhaps there is some justification for the verdict of “astrology” (that is, “nonsense”). The data set I have here may be large, but the number of Planned Giving expectancies is less than 500 — and 4.6% of 500 is not very many records. Regardless of whether p ≤ 0.0001, it could still be just one of those things. I’ve also learned that complex models are not better than simple ones, particularly when trying to predict something hard like Planned Giving propensity. A quirky variable that suggests no potential causal pathway makes me wary of the possibility of overfitting the noise in my data and missing the signal.
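To put a number on “not very many records,” one option is a confidence interval around the 4.6%. The sketch below uses the Wilson score interval, which is my choice for illustration and not something the analysis above relied on. It assumes exactly 500 expectancies and therefore 23 records, although the real count is somewhat lower.

```python
from math import sqrt

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Assumed counts: 4.6% of 500 expectancies is 23 records.
hits, n = 23, 500
low, high = wilson_interval(hits, n)

print(f"{hits} of {n} = {hits / n:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

An interval like this only describes sampling uncertainty. It says nothing about confounding or about overfitting when the variable is thrown into a model alongside many others.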
Maybe it’s useful, maybe it’s not. Either way, whether I call it “cool” or not will depend on who I’m talking to.
(Click here to download post as a print-friendly PDF: Making a Case for Modeling – Wylie Sammis)
Before you wade too far into this piece, let’s be sure we’re talking to the right person. Here are some assumptions we’re making about you:
If we’ve made some accurate assumptions here, great. If we haven’t, we’d still like you to keep reading. But if you want to slip out the back of the seminar room, not to worry. We’ve done it ourselves more times than you can count.
Okay, here’s something you can try:
1. Divide the alums at your school into ten roughly equal size groups (deciles) by class year. Table 1 is an example from a medium-sized four-year college.
Table 1: Class Years and Counts for Ten Roughly Equal Size Groups (Deciles) of Alumni at School A
2. Create a very simple score:
EMAIL LISTED(1/0) + HOME PHONE LISTED(1/0)
This score can assume three values: “0”, “1”, or “2.” A “0” means the alum has neither an email nor a home phone listed in the database. A “1” means the alum has either an email listed in the database or a home phone listed in the database, but not both. A “2” means the alum has both an email and a home phone listed in the database.
3. Create a table that contains the percentage of alums who have contributed at least $1,000 lifetime to your school for each score level for each class year decile. Table 2 is an example of such a table for School A.
Table 2: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School A
4. Create a three dimensional chart that conveys the same information contained in the table. Figure 1 is an example of such a chart for School A.
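Steps 1 through 3 above can be sketched in plain Python. This is only one way to do it. The field names (`class_year`, `email`, `home_phone`, `lifetime_giving`) and the four sample records are hypothetical, and the demo uses two groups instead of ten so the tiny example stays readable. The chart in step 4 is left to whatever charting tool you prefer.

```python
from collections import Counter, defaultdict

def decile_by_class_year(records, groups=10):
    """Map each class year to a group number (1 = oldest), keeping whole
    class years together so the groups are only roughly equal in size."""
    counts = Counter(r["class_year"] for r in records)
    total = len(records)
    group_of_year = {}
    seen = 0
    for year in sorted(counts):
        group_of_year[year] = seen * groups // total + 1
        seen += counts[year]
    return group_of_year

def simple_score(record):
    """EMAIL LISTED (1/0) + HOME PHONE LISTED (1/0)."""
    return int(bool(record["email"])) + int(bool(record["home_phone"]))

def giving_table(records, threshold=1000, groups=10):
    """Percent of alums at or above the lifetime giving threshold,
    keyed by (class year group, simple score)."""
    group_of_year = decile_by_class_year(records, groups)
    totals = defaultdict(int)
    givers = defaultdict(int)
    for r in records:
        cell = (group_of_year[r["class_year"]], simple_score(r))
        totals[cell] += 1
        if r["lifetime_giving"] >= threshold:
            givers[cell] += 1
    return {cell: 100.0 * givers[cell] / totals[cell] for cell in totals}

# Hypothetical records; a real run would read these from the alumni database.
records = [
    {"class_year": 1960, "email": "a@example.edu", "home_phone": "555-0100", "lifetime_giving": 5000},
    {"class_year": 1960, "email": "", "home_phone": "", "lifetime_giving": 0},
    {"class_year": 2010, "email": "b@example.edu", "home_phone": "", "lifetime_giving": 50},
    {"class_year": 2010, "email": "c@example.edu", "home_phone": "", "lifetime_giving": 1200},
]

table = giving_table(records, threshold=1000, groups=2)
for (group, score), pct in sorted(table.items()):
    print(f"group {group}, score {score}: {pct:.0f}% at $1,000+")
```

Because `threshold` is a parameter, the same function can be rerun at $500 or $5,000 lifetime without other changes.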
In the rest of this piece we’ll be showing tables and charts from seven other very diverse schools that look quite similar to the ones you’ve just seen. At the end, we’ll step back and talk about the importance of what emerges from these charts. We’ll also offer advice on how to explain your own tables and charts to colleagues and bosses.
If you think the above table and chart are clear, go ahead and start browsing through what we’ve laid out for the other seven schools. However, if you’re not completely sure you understand the table and the chart, see if the following hypothetical questions and answers help:
Question. “Okay, I’m looking at Table 2 where it shows 53% for alums in Decile 1 who have a score of 2. Could you just clarify what that means?”
Answer. “That means that 53% of the oldest alums at the school who have both a home phone and an email listed in the database have given at least $1,000 lifetime to the school.”
Question. “Then … that means if I look to the far left in that same row where it shows 29% … that means that 29% of the oldest alums at the school who have neither a home phone nor an email listed in the database have given at least $1,000 lifetime to the school?”
Question. “So those older alums who have a score of 2 are way better givers than those older alums who have a score of 0?”
Answer. “That’s how we see it.”
Question. “I notice that in the younger deciles, regardless of the score, there are a lot of 0 percentages or very low percentages. What’s going on there?”
Answer. “Two things. One, most younger alums don’t have the wherewithal to make big gifts. They need years, sometimes many years, to get their financial legs under them. The second thing? Over the last seven years or so, we’ve looked at the lifetime giving rates of hundreds and hundreds of four-year higher education institutions. The news is not good. In many of them, well over half of the solicitable alums have never given their alma maters a penny.”
Question. “So, maybe for my school, it might be good to lower that giving amount to something like ‘has given at least $500 lifetime’ rather than $1,000 lifetime?”
Answer. “Absolutely. There’s nothing sacrosanct about the thousand dollar level that we chose for this piece. You can certainly lower the amount, but you can also raise the amount. In fact, if you told us you were going to try several different amounts, we’d say, ‘Fantastic!’”
Okay, let’s go ahead and have you browse through the rest of the tables and charts for the seven schools we mentioned earlier. Then you can compare your thoughts on what you’ve seen with what we think is going on here.
(Note: After looking at a few of the tables and charts, you may find yourself saying, “Okay, guys. Think I got the idea here.” If so, go ahead and fast forward to our comments.)
Table 3: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School B
Table 4: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School C
Table 5: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School D
Table 6: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School E
Table 7: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School F
Table 8: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School G
Table 9: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School H
Definitely a lot of tables and charts. Here’s what we see in them:
Now we’d like to deal with an often-advanced argument against what you see here. It’s not at all uncommon for us to hear skeptics say: “Well, of course alumni on whom we have more personal information are going to be better givers. In fact we often get that information when they make a gift. You could even say that amount of giving and amount of personal information are pretty much the same thing.”
We disagree for at least two reasons:
Amount of personal information and giving in any alumni database are never the same thing. If you have doubts about our assertion, the best way to dispel those doubts is to look in your own alumni database. Create the same simple score we have for this piece. Then look at the percentage of alums for each of the three levels of the score. You will find plenty of alums who have a score of 0 who have given you something, and you will find plenty of alums with a score of 2 who have given you nothing at all.
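The check described above takes only a few lines. In this sketch the field names and the sample records are hypothetical, and `donors_by_score` is an illustrative helper. It counts donors and non-donors at each level of the simple score.

```python
from collections import Counter

def donors_by_score(records):
    """Return {score: (donor_count, non_donor_count)} for the simple 0-2 score."""
    totals = Counter()
    donors = Counter()
    for r in records:
        score = int(bool(r["email"])) + int(bool(r["home_phone"]))
        totals[score] += 1
        if r["lifetime_giving"] > 0:
            donors[score] += 1
    return {s: (donors[s], totals[s] - donors[s]) for s in sorted(totals)}

# Hypothetical records standing in for an alumni database extract.
records = [
    {"email": "", "home_phone": "", "lifetime_giving": 75},
    {"email": "", "home_phone": "", "lifetime_giving": 0},
    {"email": "a@example.edu", "home_phone": "", "lifetime_giving": 0},
    {"email": "b@example.edu", "home_phone": "555-0101", "lifetime_giving": 0},
    {"email": "c@example.edu", "home_phone": "555-0102", "lifetime_giving": 2500},
]

for score, (gave, never_gave) in donors_by_score(records).items():
    print(f"score {score}: {gave} donors, {never_gave} non-donors")
```

If score and giving really were the same thing, the donor count at score 0 and the non-donor count at score 2 would both be zero.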
We have yet to encounter a school where the IT folks can definitively say how an email address or a home phone number got into the database for every alum. Why is that the case? Because email addresses and home phone numbers find their way into alumni databases in a variety of ways. Yes, sometimes they are provided by the alum when he or she makes a gift. But there are other ways. To name a few:
Now here’s the kicker. Your reactions to everything you’ve seen in this piece are critical. If you’re going to go to a skeptical boss to try to make a case for scouring your alumni database for new candidates for major giving, we think you need to have several reactions to what we’ve laid out here:
1. “WOW!” Not, “Oh, that’s interesting.” It’s gotta be, “WOW!” Trust us on this one.
2. You have to be champing at the bit to create the same kinds of tables and charts that you’ve seen here for your own data.
3. You have to look at Table 2 (that we’ve recreated below) and imagine it represents your own data.
Table 2: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School A
Then you have to start saying things like:
“Okay, I’m looking at the third class year decile. These are alums who graduated between 1977 and 1983. Twenty-five percent of them with a score of 2 have given us at least $1,000 lifetime. But what about the 75% who haven’t yet reached that level? Aren’t they going to be much better bets for bigger giving than the 94% of those with a score of 0 who haven’t yet reached the $1,000 level?”
“A score that goes from 0 to 2? Really? What about a much more sophisticated score that’s based on lots more information than just email listed and home phone listed? Wouldn’t it make sense to build a score like that and look at the giving levels for that more sophisticated score across the class year deciles?”
If your reactions have been similar to the ones we’ve just presented, you’re probably getting very close to trying to make your case to the higher-ups. Of course, how you make that case will depend on who you’ll be talking to, who you are, and situational factors that you’re aware of and we’re not. But here are a few general suggestions:
Your first step should be making up the charts and figures for your own data. Maybe you have the skills to do this on your own. If not, find a technical person to do it for you. In addition to having the right skills, this person should think doing it would be cool and shouldn’t take forever to finish it.
Choose the right person to show our stuff and your stuff to. More and more we’re hearing people in advancement say, “We just got a new VP who really believes in analytics. We think she may be really receptive to this kind of approach.” Obviously, that’s the kind of person you want to approach. If you have a stodgy boss in between you and that VP, find a way around your boss. There’s lots of ways to do that.
Do what mystery writers do; use the weapon of surprise. Whoever the boss you go to is, we’d recommend that you show them this piece first. After you know they’ve read it, ask them what they thought of it. If they say anything remotely similar to: “I wonder what our data looks like,” you say, “Funny you should ask.”
Whatever your reactions to this piece have been, we’d love to hear them.