Guest post by Kate Chamberlin and Michelle Paladino, Office of Development, Memorial Sloan-Kettering Cancer Center, New York NY

It’s an old statistics joke that when building a predictive model, you spend almost all of your time slaving over the data, only to press a button at the end of the slog for the actual fun part. Cleaning data, imputing missing values, restructuring, and ceaselessly contemplating new and improved independent variables are how we expend much of our energy, and rightly so. However, the most important variable doesn’t always get the attention it deserves.
Every predictive model centers on a particular population, captured by the dependent variable, so giving the target a little extra TLC goes a long way. It’s a balancing act between size and purity. The larger the target size, the more statistical reliability you have, but the more precise the target definition, the better you are able to isolate the behavior that you’re trying to predict. For example, corporations, foundations, and estates behave differently from individuals. Therefore, if your goal is to find individuals who have a high likelihood of making a major gift, clearing those estates, foundations and corporations out of the target, even if they have given the target amount, will lead to a more trustworthy dependent variable.
And what is the best target cutoff amount for a major gift, anyway? In our Development Office, the Major Gifts program starts at $50,000. A binary dependent variable with the 1’s defined as individuals who have made a $50,000+ gift is perfectly reasonable and works just fine, but is this cutoff meaningful from a donor’s perspective? And what about timing – do major donors of long ago look the same as those who have given more recently? We have yet to find definitive answers, but two checks help evaluate whether these changes make a difference: seeing whether the independent variable distributions change dramatically with different targets, and running models with a few flavors of target populations.
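The first of those checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the two input dictionaries (each donor's largest gift and one independent variable per donor), and the use of a simple mean as the summary are all hypothetical choices, and the sketch covers only the cutoff question, not the timing question.

```python
from statistics import mean


def flag_target(max_gift_by_donor, threshold):
    """Binary target: 1 if the donor's largest gift reached the threshold."""
    return {donor: int(amount >= threshold)
            for donor, amount in max_gift_by_donor.items()}


def summarize_predictor(predictor_by_donor, flags):
    """Compare one independent variable between the 1s and the 0s."""
    ones = [predictor_by_donor[d] for d, f in flags.items() if f == 1]
    zeros = [predictor_by_donor[d] for d, f in flags.items() if f == 0]
    return {
        "n_target": len(ones),
        "mean_target": mean(ones) if ones else None,
        "mean_rest": mean(zeros) if zeros else None,
    }


def compare_targets(max_gift_by_donor, predictor_by_donor, thresholds):
    """Summarize the same predictor under several candidate cutoffs."""
    return {
        t: summarize_predictor(predictor_by_donor,
                               flag_target(max_gift_by_donor, t))
        for t in thresholds
    }
```

Calling compare_targets with, say, cutoffs of 50,000 and 25,000 returns one summary per cutoff. If the target-group mean of a predictor shifts sharply from one cutoff to the next, that is a hint the two definitions capture different donor behavior.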
Another method that can help you more clearly define the dependent variable is to consider to which donors you will be applying your model scores. For instance, if you work in a strictly donor database, as we do, and you are modeling for major donors, it is a good idea to exclude from your target those who came onto the file at the major giving target amount. In other words, remove the individuals whose first gift was $50,000+. If the scores will be applied to donors who are currently giving below the target, then your dependent variable should include only donors who gave below the target level and then jumped up to the target amount.
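A minimal Python sketch of this target definition follows. It is an illustration under stated assumptions, not a record of how the models described here were actually built. The Gift record, the donor_types lookup and its "individual" label, and the decision to drop first-gift major donors from the modeling file entirely (rather than code them as 0) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

MAJOR_GIFT_THRESHOLD = 50_000  # the Major Gifts cutoff mentioned in the post


@dataclass
class Gift:
    donor_id: str
    gift_date: date
    amount: float


def build_target(gifts, donor_types, threshold=MAJOR_GIFT_THRESHOLD):
    """Return {donor_id: 0 or 1} for the modeling population.

    1 = first gift was below the threshold and a later gift reached it.
    Non-individuals and donors whose first gift was already at the
    threshold are left out of the result (an assumption of this sketch).
    """
    amounts_by_donor = {}
    for gift in sorted(gifts, key=lambda g: g.gift_date):
        amounts_by_donor.setdefault(gift.donor_id, []).append(gift.amount)

    target = {}
    for donor_id, amounts in amounts_by_donor.items():
        if donor_types.get(donor_id) != "individual":
            continue  # estates, foundations, and corporations are excluded
        if amounts[0] >= threshold:
            continue  # came onto the file at the target amount
        target[donor_id] = int(any(a >= threshold for a in amounts[1:]))
    return target
```

Keeping the threshold as a parameter makes it easy to rerun the same definition at other cutoffs.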
But when does the pruning of your target go too far? If it becomes too small, then the performance of a few donors can have a big effect. A minimum sample size of 30 is a magic rule-of-thumb that is mentioned regularly in the classroom. If we were to approach that number in our dependent variable, we would be likely to redefine our target to increase the sample size. In the example above, we might choose to lower the major gift threshold to $25,000. We’d definitely be interested to hear about less “magical” methods you might use to determine a lower bound for your target sample size!
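The fallback described above can be written as a small check. The sketch below is hypothetical throughout: the function name, the candidate cutoffs, and the rule of taking the highest cutoff that still leaves at least min_size donors in the target are assumptions for illustration, and 30 is just the classroom rule of thumb, not a recommended floor.

```python
MIN_TARGET_SIZE = 30  # classroom rule of thumb; treat as an assumption


def pick_threshold(max_gift_by_donor, candidates=(50_000, 25_000),
                   min_size=MIN_TARGET_SIZE):
    """Return (threshold, target_size) for the highest candidate cutoff
    whose target group has at least min_size donors, or (None, 0) if no
    candidate qualifies."""
    for threshold in sorted(candidates, reverse=True):
        size = sum(1 for amount in max_gift_by_donor.values()
                   if amount >= threshold)
        if size >= min_size:
            return threshold, size
    return None, 0
```

In practice you would likely set min_size comfortably above 30, since the point is to avoid even approaching a target so small that a few donors dominate it.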
Recently, our Annual Giving team started an initiative to solicit donors for $5,000+ gifts, which would qualify the donor to name a chair in our auditorium. As conscientious modelers, we set out to find donors who had given $5,000+. There are quite a few of them, so we had some flexibility to isolate the behavior we were trying to predict. We wondered: are all $5,000 gifts created equal? Isn’t this a particular kind of $5,000? To find a better proxy for this unique subset of donors – those who would respond to the chance to have their name on a chair – we narrowed the target population to individuals who had opted to join our Partners for Excellence program, which offers a range of incentives for giving levels starting at $1,000. Donors who responded to this kind of approach seemed a bit more like those who would appreciate a naming opportunity. But what about the people who had been giving $5,000+ every year and then gave the same amount to Partners? Since we were looking for donors who would be inspired by a particular type of solicitation, we decided to limit the target to those who had given several gifts of less than $1,000 and then jumped to $1,000+ to join Partners. By the end, we had shrunk our target size, but had succeeded in better isolating the behavior we were trying to predict.
In the end, murkiness is unavoidable, but the idea is to have a target variable look as much as possible like the future population you will be scoring. So, as you tend to your unruly independents, don’t forget about that seemingly well-behaved fellow, the dependent variable, because he is actually the leader of the bunch!
Kate Chamberlin leads a small analytics group at Memorial Sloan-Kettering Cancer Center. She came to Sloan-Kettering in fall 2006 from Columbia University, where she was a research analyst and writer for the university’s corporate and foundation relations office. Kate has also served as an events manager for Columbia’s principal gifts group, and a grant writer at Arts Horizons, a small arts education agency. She holds a bachelor’s degree in theater directing and design from Dartmouth College, and an MBA focusing on economics and strategy from Columbia Business School.
Michelle Paladino is part of the Memorial Sloan-Kettering Cancer Center’s growing analytics group. She develops predictive models and applies other advanced techniques to analyze donor behavior and measure program performance. Previously, Michelle was one of the Center’s fundraising officers. She holds a bachelor’s degree in political science and a master’s degree in public policy from New York University.