# CoolData blog

## Guest post by John Sammis and Peter B. Wylie

Thanks to all of you who read and commented on our recent paper comparing logistic regression with multiple regression. We were not sure how popular this topic would be, but Kevin told us that interest was high, and there were a number of comments and questions. There were several general themes in the comments; Kevin has done an excellent job responding, but we thought we should throw in our two cents.

Why not just use logistic?

The point of our paper was not to suggest that logistic regression should not be used — our point was that multiple regression can achieve prediction results quite similar to logistic regression. Based on our experience working with and training fundraising professionals getting introduced to analytics, logistic regression can be intimidating. Our goal is always to get these folks to use analytics to help with their fundraising initiatives. We find many of them catch on with multiple regression, and much less so with logistic regression.

Predicted values vs. probabilities

We understand that the predicted values generated by multiple regression are different from the probabilities generated by logistic regression. Regardless of the statistical modeling technique we use, we always bin the raw prediction or probability values into equal-sized score levels. We have found that score level bins are easier to use than raw values. And using equal-sized score levels allows for easier evaluation of the scoring model.
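As a rough illustration of the binning step, here is a minimal Python sketch. The function name `score_levels`, the choice of five levels, and the sample values are all assumptions made for this example, not the procedure or data from the paper. Because the levels depend only on rank order, the same code works whether the raw values are regression outputs or logistic probabilities.

```python
def score_levels(values, n_levels=10):
    """Assign each value a score level from 1 (lowest) to n_levels (highest).

    Levels are as close to equal-sized as the record count allows.
    Tied values are split by their original order in the list.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(values) + 1
    return levels

# Hypothetical raw outputs; a few fall outside 0..1, as regression outputs can.
raw = [-0.12, 0.05, 0.31, 0.47, 0.52, 0.66, 0.74, 0.81, 0.93, 1.003]
levels = score_levels(raw, n_levels=5)
print(levels)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

With real data you would more likely use ten or twenty levels; five is used here only to keep the output short.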

“I cannot agree”

Some commenters, knowledgeable about statistics, said they would not use multiple regression when the inputs called for logistic. According to the rules, if the target variable is binary, then linear modelling doesn’t make sense, and the rules must be obeyed. In our view, this rigid approach to method selection is inappropriate for predictive modelling. The use of multiple linear regression in place of logistic regression may not always make theoretical sense, but predictive modellers are concerned with whether or not a model produces an output that is useful in practical terms. The worth of a model is testable against new, real-world data, so a model has only one criterion for determining “appropriate” use: whether it really predicts what the modeler claims it will predict. The truth is revealed during evaluation.

A modest proposal

No one reading this should simply take our word that these two dissimilar methods yield similar results. Neither should anyone dismiss it out of hand without providing a critique based on real data. We would encourage you to try both techniques on your own data and show us what you find. In particular, graduate students looking for a thesis or dissertation topic might consider producing something under this title: “Comparing Logistic Regression and Multiple Regression as Techniques for Predicting Major Giving.”

Heck! Peter says that if anyone were interested in doing a study like this for a thesis or dissertation, he would be willing to offer advice on how to:

1. Do a thorough literature review
2. Formulate specific research questions
3. Come up with a study design
4. Prepare a proposal that would satisfy a thesis or dissertation committee.

That’s quite an offer. How about it?

## 20 September 2012

### When less data is more, in predictive modelling

When I started doing predictive modelling, I was keenly interested in picking the best and coolest predictor variables. As my understanding deepened, I turned my attention to how to define the dependent variable in order to really get at what I was trying to predict. More recently, however, I’ve been thinking about refining or limiting the population of constituents to be scored, and how that can help the model.

What difference does it make who gets a propensity score? Up until maybe a year ago, I wasn’t too concerned. Sure, probably no 22-year-old graduate had ever entered a planned giving agreement, but I didn’t see any harm in applying a score to all our alumni, even our youngest.

Lately, I’m not so sure. Using the example of a planned gift propensity model, the problem is this: Young alumni don’t just get a score; they also influence how the model is trained. If all your current expectancies were at least 50 before they decided to make a bequest, and half your alumni are under 30 years old, then one of the major distinctions your model will make is based on age. ANY alum over 50 is going to score well, regardless of whether he or she has any affinity to the institution, simply because 100% of your target is in that age group.

The model is doing the right thing by giving higher scores to older alumni. If ages in the sample range from 21 to 100+, then age as a variable will undoubtedly contribute to a large chunk of the model’s ability to “explain” the target. But this hardly tells us anything we didn’t already know. We KNOW that alumni don’t make bequest arrangements at age 22, so why include them in the model?

It’s not just the fact that their having a score is irrelevant. I’m concerned about allowing good predictor variables to interact with ‘Age’ in a way that compromises their effectiveness. Variables are being moderated by ‘Age’ without improving the model in a way that gets us what we want out of it.

Note that we don’t have to explicitly enter ‘Age’ as a variable in the model for young alumni to influence the outcome in undesirable ways. Here’s an example, using event attendance as a predictor:

Let’s say a lot of very young alumni and some very elderly constituents attend their class reunions. The older alumni who attend reunions are probably more likely than their non-attending classmates to enter into planned giving agreements; for my institution, that is definitely the case. On the other hand, the young alumni who attend reunions are probably no more or less likely than their non-attending peers to consider planned giving, since no one that age is a serious prospect. What happens to ‘event attendance’ as a predictor in a model in which the dependent variable is ‘Current planned giving expectancy’? … Because a lot of young alumni who are not in the target group attended events, the attribute of being an event attendee will be associated with NOT being a planned giving expectancy. Or at the very least, it will considerably dilute the positive association between predictor and target found among older alumni.

I confirmed this recently using some partly made-up data. The data file started out as real alumni data and included age, a flag for who is a current expectancy, and a flag for ‘event attendee’. I massaged it a bit by artificially bumping up the number of alumni under the age of 50 who were coded as having attended an event, to create a scenario in which an institution’s events are equally popular with young and old alike. In a simple regression model with the entire alumni file included in the sample, ‘event attendance’ was weakly associated with being a planned giving expectancy. When I limited the sample to alumni 50 years of age and older, however, the R squared statistic doubled. (That is, event attendance was about twice as effective at explaining the target.) Conversely, when I limited the sample to under-50s, R squared was nearly zero.

True, I had to tamper with the data in order to get this result. But even had I not, there would still have been many under-50 event attendees, and their presence in the file would still have reduced the observed correlation between event attendance and planned giving propensity, to no useful end.
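The dilution effect can be reproduced with invented counts. The Python sketch below is not the data file described above: the cell counts are assumptions chosen so that events are equally popular (30%) in both age groups and only alumni aged 50 and over are expectancies. With a single predictor, R squared is simply the squared correlation between predictor and target. In this toy file the under-50 group has no expectancies at all, so its R squared is reported as exactly zero rather than "nearly zero".

```python
def r_squared(x, y):
    """R squared of a one-predictor regression (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a constant column explains nothing
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def group(n_attend_exp, n_attend_only, n_exp_only, n_neither):
    """Build (attended, expectancy) records from four cell counts."""
    return ([(1, 1)] * n_attend_exp + [(1, 0)] * n_attend_only
            + [(0, 1)] * n_exp_only + [(0, 0)] * n_neither)

def r2_for(records):
    attended = [a for a, _ in records]
    expectancy = [e for _, e in records]
    return r_squared(attended, expectancy)

# Made-up counts: 5,000 alumni in each age group, 30% event attendance in both.
under_50 = group(0, 1500, 0, 3500)      # no expectancies under 50
over_50 = group(150, 1350, 70, 3430)    # attendees are likelier to be expectancies

r2_all = r2_for(under_50 + over_50)
r2_over = r2_for(over_50)
r2_under = r2_for(under_50)
print(f"all alumni: {r2_all:.4f}  50 and over: {r2_over:.4f}  under 50: {r2_under:.4f}")
```

The size of the gap depends entirely on the counts assumed here; the point is only the direction of the effect.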

You probably already know that it’s best not to lump deceased constituents in with living ones, or non-alumni along with alumni, or corporations and foundations along with persons. They are completely distinct entities. But depending on what you’re trying to predict, your population can fruitfully be split along other, more subtle distinctions. Here are a few:

• For donor acquisition models, in which the target value is “newly-acquired donor”, exclude all renewed donors. You strictly want to have only newly-acquired donors and never-donors in your model. Your good prospects for conversion are the never-donors who most resemble the newly-acquired donors. Renewed donors don’t serve any purpose in such a model and will muddy the waters considerably.
• Conversely, remove never-donors from models that predict major giving and leadership-level annual giving. Those higher-level donors tend not to emerge out of thin air: They have giving histories.
• Looking at ‘Age’ again … making distinctions based on age applies to major-gift propensity models just as it does to planned giving propensity: Very young people do not make large gifts. Look at your data to find out at what age donors were when they first gave \$1,000, say. This will help inform what your cutoff should be.
• When building models specifically for Phonathon, whether donor-acquisition or contact likelihood, remove constituents who are coded Do Not Call or who do not have a valid phone number in the database, or who are unlikely to be called (international alumni, perhaps).
• Exclude international alumni from event attendance or volunteering likelihood models, if you never offer involvement opportunities outside your own country or continent.
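For the age cutoff suggested in the list above, a first pass might look like the following Python sketch. The gift records are made up, and reading "first gave \$1,000" as "the youngest age at which a single gift reached \$1,000" is an assumption; you may prefer cumulative giving or a different amount.

```python
# (donor ID, donor's age at time of gift, gift amount): invented records
gifts = [
    ("D1", 34, 50.0), ("D1", 41, 1000.0), ("D1", 45, 2500.0),
    ("D2", 52, 1000.0),
    ("D3", 29, 100.0),
    ("D4", 38, 5000.0), ("D4", 36, 200.0),
]

THRESHOLD = 1000.0  # assumed definition of a "large" gift

# Youngest age at which each donor made a gift at or above the threshold.
first_age = {}
for donor, age, amount in gifts:
    if amount >= THRESHOLD and age < first_age.get(donor, float("inf")):
        first_age[donor] = age

ages = sorted(first_age.values())
print(ages)  # [38, 41, 52]
```

Looking at the low end of this list (rather than the single youngest donor, who may be an outlier) gives a starting point for the cutoff.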

Those are just examples. As for general principles, I think both of the following conditions must be met in order for you to gain from excluding a group of constituents from your model. By a “group” I mean any collection of individuals who share a certain trait. Choose to exclude IF:

1. Nearly 100% of constituents with the trait fall outside the target behaviour (that is, the behaviour you are trying to predict); AND,
2. Having a score for people with that trait is irrelevant (that is, their scores will not result in any action being taken with them, even if a score is very low or very high).

You would apply the “rules” like this … You’re building a model to predict who is most likely to answer the phone, for use by Phonathon, and you’re wondering what to do with a bunch of alumni who are coded Do Not Call. Well, it stands to reason that 1) people with this trait will have little or no phone contact history in the database (the target behaviour), and 2) people with this trait won’t be called, even if they have a very high contact-likelihood score. The verdict is “exclude.”
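The two conditions can be written down as a small check. This is only a sketch: the function name, the 99% cutoff standing in for "nearly 100%", and the counts in the examples are all assumptions.

```python
def should_exclude(group_size, group_in_target, score_is_actionable,
                   min_outside_share=0.99):
    """Apply both conditions: almost nobody with the trait shows the target
    behaviour, AND a score for them would not lead to any action."""
    outside_share = 1 - group_in_target / group_size
    return outside_share >= min_outside_share and not score_is_actionable

# Do Not Call alumni in a phone-contact model: hypothetical counts.
print(should_exclude(2000, 5, score_is_actionable=False))    # True
# Same trait, but suppose the scores would actually be used: keep them in.
print(should_exclude(2000, 5, score_is_actionable=True))     # False
# A group where 20% show the target behaviour fails the first condition.
print(should_exclude(2000, 400, score_is_actionable=False))  # False
```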

It’s not often you’ll hear me say that less (data) is more. Fewer cases in your data file will in fact tend to depress your model’s R squared. But your ultimate goal is not to maximize R squared — it’s to produce a model that does what you want. Fitting the data is a good thing, but only when you have the right data.

## 20 August 2012

### Logistic regression vs. multiple regression

Filed under: John Sammis, Model building, Peter Wylie, predictive modeling, regression, Statistics — kevinmacdonell @ 5:13 am

## by Peter Wylie, John Sammis and Kevin MacDonell

The three of us talk about this issue a lot because we encounter a number of situations in our work where we need to choose between these two techniques. Many of our late night/early morning phone/internet discussions have been gobbled up by talking about which technique seems to be better under what circumstances. More than a few times, I’ve suggested we write something up about our experience with both techniques. In the end we’ve always decided to put off doing that because … well, because we’ve thought it might put a lot of people to sleep. Disagree as we might about lots of things, we’re of one mind on the dictum: “Don’t bore people.” They have enough tedious stuff in their lives; we don’t need to add to their burden.

On the other hand, as analytics has started to sink its teeth more and more into the world of advancement, it seems there is a group of folks out there who wrestle with the same issue. And the issue seems to be this:

“If I have a binary dependent variable (e.g., major giver/ non major giver, volunteer/non-volunteer, reunion attender/non-reunion attender, etc.), which technique should I use? Logistic regression or multiple regression?”

We considered a number of ways to try to answer this question:

• We could simply assert an opinion based on our bank of experience with both techniques.
• We could show you the results of a number of data sets using both techniques and then offer our opinion.
• We could show you a way to compare both techniques using some of your own data.

We chose the third option because we think there is no better way to learn about a statistical technique than by using the technique on real data. Whenever we’ve done this sort of exploring ourselves, we’ve been humbled by how much we’ve learned.

Before we show you a way to compare the two techniques, we’ll offer some thoughts on why this question (“Should I use logistic regression or multiple regression?”) is so tough to find an answer to. If you’re anxious to move on to our comparison process, you can skip this section. But we hope you don’t.

Why This Is Not an Easy Question to Find an Answer To

We see at least two reasons why this is so:

• Multiple regression has lived in the neighborhood a long time; logistic regression is a new kid on the block.
• The articles and books we’ve read on comparisons of the two techniques are hard to understand.

Multiple regression is a longtime resident; logistic regression is a new kid on the block.

When World War II came along, there was a pressing need for rapid ways to assess the potential of young men (and some women) for the critical jobs that the military services were trying to fill. It was in this flurry of preparation that multiple regression began to see a great deal of practical application by behavioral scientists who had left their academic jobs and joined up for the duration. The theory behind multiple regression had been worked out much earlier in the century by geniuses like Ronald Fisher, Karl Pearson, and Harold Hotelling. But the method did not get much use until the war effort necessitated that use. The computational effort involved was just too forbidding.

Logistic regression is a different story. From the reading we’ve done, logistic regression got its early practical use in the world of medicine where biostatisticians were trying to predict binary outcomes like survived/did not survive, contracted disease/did not contract disease, had a coronary event/did not have a coronary event, and the like. It’s only been within the last fifteen or twenty years that logistic regression has found its way into the parlance of statisticians in the behavioral sciences.

These two paragraphs are a roundabout way of saying that logistic regression is (in our opinion) nowhere near as well vetted as multiple regression by people like us in advancement who are interested in predicting behavior, especially giving behavior.

The articles and books we’ve read on comparisons of the two techniques are hard to understand.

Since I (Peter) was pushing to do this piece, John and I decided it would be my responsibility to do some searching of the more recent literature on logistic regression as it relates to the substance of this project.

To start off, I reread portions of texts I have accumulated over the years that focus on multiple regression as a general data analytic technique. Each text has a section on logistic regression. As I waded back into these sections, I asked myself: “Is what I’m reading here going to enlighten more than confuse the folks we have in mind for this piece?” Without exception, my answer was, “Nope, just the reverse.” There was altogether too much focus on complicated equations and theory and nowhere near enough emphasis on the practical use of logistic regression. (This, in spite of the fact that each text had an introduction assuring us the book would go light on math and heavy on application.)

Levity aside, it is hard to find clearly written articles or books on the use of logistic versus multiple regression in the behavioral sciences. I think it’s a bad situation that needs fixing, but that fixing won’t occur anytime soon. On the other hand, I think dad was right not to let me off easy for giving up on badly written material. And you shouldn’t let my pessimism dissuade you from trying out some of these same articles and books. (If enough of you are interested, perhaps Kevin and John and I can put together a list of suggested readings.)

A Way to Compare Logistic Regression with Multiple Regression

As promised we’ll take you through a set of steps you can use with some of your own data:

1. Pick a binary dependent variable and a set of predictors.
2. Compute a predicted probability value for every record in your sample using both multiple regression and logistic regression.
3. Draw three random subsamples of 20 records each from the total sample so that each subsample includes the predicted multiple regression probability value and the predicted logistic regression probability value for every record.
4. Display each subsample of these records in a table and a graph.
5. Do an eyeball comparison of the probability values in both the tables and the graphs.

1. Pick a binary dependent variable and a set of predictors.

For this example, we used a private four year institution with about 13,000 solicitable alums. Here are the variables we chose:

Dependent variable. Each alum who had given \$31 or more lifetime was defined as 1, all others who had given less than that amount were defined as 0. There were 6,293 0’s and 6,204 1’s. Just about an even fifty/fifty split.

Predictor variables:

• CLASS YEAR
• SQUARE OF CLASS YEAR
• EMAIL ADDRESS LISTED (YES/NO, 1=YES, 0=NO)
• MARITAL STATUS (SINGLE =1, ALL OTHERS=0)
• HOME PHONE LISTED (YES/NO, 1=YES, 0=NO)
• UNIQUE ID NUMBER

Why did we use ID number as one of the predictors? Over the years we’ve found that many schools use all-numeric ID numbers. When these numbers are entered into a regression analysis, they often work as predictors. More importantly, they help to create very granular predicted scores that can easily be binned into equal size groups.

2. Compute a predicted probability value for every record in your sample using both multiple regression and logistic regression.

This is where things start to get a bit technical and where a little background reading on both multiple regression and logistic regression wouldn’t hurt. Again, most of the material you’ll find will be tough to decipher. Here we’ll keep it as simple as we can.

For both techniques the predicted value you want to generate is a probability, a number that varies between 0 and 1.  In this example, that value will represent the probability that a record has given \$31 or more lifetime to the college.

Now here’s the rub: the logistic regression model will always generate a probability value that varies between 0 and 1. However, the multiple regression model will almost always generate values that range from something less than 0 (a negative number) to a number greater than 1. In fact, in this example the range of probability values for the logistic regression model extends from .037 to .948. The range of probability values for the multiple regression model extends from -.122 to 1.003.

(By the way, this is why so many statisticians advise the use of logistic regression over multiple regression when the dependent variable is binary. In essence they are saying, “A probability value can’t exceed 1 nor can it be less than 0. Since multiple regression often yields values less than 0 and greater than 1, use logistic regression.” To be fair, we’re exaggerating a bit, but not very much.)

3. Draw three random subsamples of 20 records each from the total sample so that each subsample includes the predicted multiple regression probability value and the predicted logistic regression probability value for all 20 records.

The size and number of these subsamples is, of course, arbitrary. We decided that three subsamples were better than two and that four or more would be overkill. Twenty records, as you’ll see a bit further on, is a number that allows you to see patterns in a table or graph without overcrowding the picture.

4. Display each subsample of these records in a table and a graph.

Tables 1-3 and Figures 1-3 below show how we took this step for our example. To make sure we’re being clear, let’s go through some of the details in Table 1 and Figure 1 (which we constructed for the first subsample of twenty randomly drawn records).

In Table 1 the probability values for multiple regression for each record are displayed in the left-hand column. The corresponding probability values for the same records for logistic regression are displayed in the right-hand column. For example, the multiple regression probability for the first record is .078827109. The record’s logistic regression probability is .098107437. In plain English, that means the multiple regression model for this example is saying that this particular alum has about eight chances in a hundred of giving \$31 or more lifetime. The logistic regression model is saying that the same alum has about ten chances in a hundred of giving \$31 or more lifetime.

Table 1: Predicted Probability Values Generated from Using Multiple Regression and Logistic Regression for the First of Three Randomly Drawn Subsamples of 20 Records

Figure 1 shows the pairs of values you see in Table 1 displayed graphically in a scatterplot. You’ll notice that the points in the scatterplot appear to fall along what roughly looks like a straight line. This means that the multiple regression model and the logistic regression model are assigning very similar probabilities to each of the 20 records in the subsample. If you study Table 1, you can see this trend, but the trend is much easier to discern in the scatter plot.

Table 2: Predicted Probability Values Generated from Using Multiple Regression and Logistic Regression for the Second of Three Randomly Drawn Subsamples of 20 Records

Table 3: Predicted Probability Values Generated from Using Multiple Regression and Logistic Regression for the Third of Three Randomly Drawn Subsamples of 20 Records

5. Do an eyeball comparison of the probability values in both the tables and the graphs.

We’ve already done such a comparison in Table 1 and Figure 1. If we do the same comparison for Tables 2 and 3 and for Figures 2 and 3, it’s pretty clear that we’ll come to the same conclusion: Multiple regression and logistic regression (for this example) are giving us very similar answers.
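If you would rather script the five steps than run them in a stats package, the Python sketch below uses only the standard library. Everything in it is an assumption made for illustration: the data are simulated (2,000 records, two predictors), multiple regression is fitted by solving the normal equations, and logistic regression is fitted by Newton's method. It is not the software or the data set used for the tables above.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def normal_equations(X, w, t):
    """Return X'WX and X't for design matrix X, weights w, and vector t."""
    k = len(X[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(X, w)) for j in range(k)]
            for i in range(k)]
    xtt = [sum(r[i] * ti for r, ti in zip(X, t)) for i in range(k)]
    return xtwx, xtt

def fit_linear(X, y):
    """Ordinary least squares (multiple regression) coefficients."""
    xtx, xty = normal_equations(X, [1.0] * len(X), y)
    return solve(xtx, xty)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(X, y, iterations=25):
    """Logistic regression coefficients by Newton's method."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        p = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in X]
        w = [pi * (1 - pi) for pi in p]
        xtwx, grad = normal_equations(X, w, [yi - pi for yi, pi in zip(y, p)])
        beta = [b + s for b, s in zip(beta, solve(xtwx, grad))]
    return beta

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Step 1: simulated records with a binary target and two made-up predictors.
random.seed(0)
X, y = [], []
for _ in range(2000):
    years_out = random.random()                    # rescaled to 0..1
    has_email = 1.0 if random.random() < 0.5 else 0.0
    true_p = sigmoid(-2 + 3 * years_out + has_email)
    X.append([1.0, years_out, has_email])          # leading 1.0 is the intercept
    y.append(1.0 if random.random() < true_p else 0.0)

# Step 2: a predicted value from each technique for every record.
b_lin, b_log = fit_linear(X, y), fit_logistic(X, y)
pred_lin = [sum(b * v for b, v in zip(b_lin, r)) for r in X]
pred_log = [sigmoid(sum(b * v for b, v in zip(b_log, r))) for r in X]

# Steps 3 to 5: three random subsamples of 20, listed side by side.
for s in range(3):
    print(f"Subsample {s + 1}: multiple vs. logistic")
    for i in random.sample(range(len(X)), 20):
        print(f"  {pred_lin[i]:.3f}  {pred_log[i]:.3f}")
print("correlation over all records:", round(pearson(pred_lin, pred_log), 3))
```

The printed pairs play the role of the tables; plotting one column against the other gives the scatterplots. The correlation on the last line is an extra summary not used in the walkthrough above.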

So Where Does This All Take Us?

We’d like to cover several topics in this closing section:

• A frequent objection to using multiple regression versus logistic regression when the dependent variable is binary
• Trying our approach on your own
• The conclusion we think you’ll eventually arrive at
• How we’ve just scratched the surface here

A frequent objection to using multiple regression versus logistic regression when the dependent variable is binary

Earlier we said that many statisticians seem to advise the use of logistic regression over multiple regression by invoking this logic: “A probability value can’t exceed 1 nor can it be less than 0. Since multiple regression often yields values less than 0 and greater than 1, use logistic regression.” We also said we were exaggerating the stance of these statisticians a bit (but not very much).

While we can understand this argument, our feeling is that, in the applied fields we toil in, that argument is not a very practical one. In fact a seasoned statistics professor we know says (in effect): “What’s the big deal? If multiple regression yields any predicted values less than 0, consider them 0. If multiple regression yields any values greater than 1, consider them 1. End of story.” We agree.
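In code, the professor's advice amounts to one line. The sketch below is Python; the first and last values are the ends of the multiple regression range reported earlier, the first record's value from Table 1 is rounded to .079, and .5 is made up.

```python
def clip_to_probability(value):
    """Treat predicted values below 0 as 0 and above 1 as 1."""
    return min(1.0, max(0.0, value))

predicted = [-0.122, 0.079, 0.5, 1.003]
clipped = [clip_to_probability(v) for v in predicted]
print(clipped)  # [0.0, 0.079, 0.5, 1.0]
```

Clipping never reverses the order of two records, although it can create ties at 0 and 1, so it has little effect when scores are binned into levels afterward.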

Trying our approach on your own

In this piece we’ve shown the results of one comparison between multiple and logistic regression on one set of data. It’s clear that the results we got for the two techniques were very similar. But does that mean we’d get such similar results with other examples? Not necessarily.

So here’s what we’d recommend. Try doing your own comparisons of the two techniques with:

• Different data sets. If you’re a higher education institution, you might pick a couple of data sets, one for alums who’ve been out for more than 25 years and one for folks who’ve been out less than 10 years. If you’re a non-profit, you can use a set of members from the west coast and one from the east coast.
• Different variables. Try different binary dependent variables like those we mentioned earlier: major giver/non major giver, volunteer/non-volunteer, reunion attender/non-reunion attender, etc. And try different predictors. Try to mix categorical variables like marital status with quantitative variables like age. If you’re comfortable with more sophisticated stats, try throwing in cross products and exponential terms.
• Different splits in the dependent variable. In our example piece the dependent variable was almost an exact 50/50 split. Since the underlying variable we used was quantitative (lifetime giving), we could have adjusted those splits in a number of ways: 60/40, 75/25, 80/20, 95/5, and on and on the list could go. Had we tried these different kinds of splits, would we have the same kinds of results for the two techniques? Since we actually did look at different splits like these, we can report that the results for both techniques were pretty much the same. But that’s for this example. That could change with a different data set and different variables.
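To try different splits from a quantitative variable such as lifetime giving, you need a cutoff for each one. The Python sketch below is one way to find it; the function name and the giving amounts are invented for the example.

```python
def split_at_share(amounts, share_of_ones):
    """Find the cutoff that codes roughly the top share_of_ones of records as 1."""
    ranked = sorted(amounts, reverse=True)
    n_ones = max(1, round(len(ranked) * share_of_ones))
    cutoff = ranked[n_ones - 1]
    target = [1 if a >= cutoff else 0 for a in amounts]
    return cutoff, target

# Hypothetical lifetime giving for 20 alumni.
lifetime_giving = [0, 0, 0, 5, 10, 25, 31, 50, 100, 250,
                   1000, 5000, 0, 0, 20, 75, 40, 15, 0, 500]

for share in (0.50, 0.25, 0.05):
    cutoff, target = split_at_share(lifetime_giving, share)
    print(f"target share {share:.0%}: cutoff ${cutoff}, {sum(target)} of {len(target)} coded 1")
```

When many records share the cutoff amount (a long run of zero-dollar records, for instance), the actual split will contain more 1's than requested, so check the counts before modeling.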

The conclusion we think you’ll eventually arrive at

We’re very serious about having you compare multiple regression and logistic regression on a variety of data sets with a variety of variables and with different splits in the dependent variable. If you do, you’ll learn a ton. Guaranteed.

On the other hand, if we put ourselves in your shoes, it’s easy to imagine your saying, “Come on guys. I’m not gonna do that. Just tell me what you think about which technique is better when the dependent variable is binary. Pick a winner.”

Given our experience, we can’t pick a winner. That said, if pushed, we’re inclined to opt in favor of multiple regression for a couple of reasons. It not only seems to perform about as well as logistic regression, but more importantly (with the stats software we use) multiple regression is simply faster and easier to use than logistic regression. But we still use logistic regression for models with binary dependent variables. And we continue to compare its efficacy against multiple regression when we can. And we rarely see a meaningful difference between the results.

Why do we still use both modeling techniques? Because we think taking a hard and fast stance when you’re doing applied science is not a good idea. Too easy to end up with egg on your face. Our best advice is to use whichever method is most familiar and readily available to you.

As always, we welcome your comments and reactions. Maybe even more so with this one.

## 12 July 2012

### Evaluate models with fresh data using Tableau heat maps

When I build predictive models, I normally don’t build just one for each purpose. Presumably the model is going to be used, so I want it to be the best one possible. Yes, I test the model scores against a holdout data sample, but if I built only one model, I wouldn’t have anything solid on which to base my evaluation of the results. I might reject a lone model if it truly failed against the validation set, but that has never happened to me — even a lackluster performance can be better than nothing, and therefore the model is flawed, but useful. That statement is true of models in general. So testing results with nothing to compare against is pointless.

I usually produce one multiple linear regression model and one binary logistic regression model using the stats software package Data Desk. Many permutations are possible, though: The sample to be scored can be limited in various ways, and the dependent variable can be formulated any number of ways. The choice of technique (for me, one type of regression or another) is usually determined by the nature of the DV (though not always). Given unlimited time, I would produce multiple models, but doing two at a time is manageable and keeps the task of comparison simple. The model that does the best classifying the members of the holdout sample wins the prize, and the loser is discarded.
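One way to make "does the best classifying the members of the holdout sample" concrete is to ask what share of the holdout's target members each model places among its top scores. That metric, the 30% cut, and the ten-record holdout below are assumptions chosen for illustration in Python; they are not necessarily the comparison used in Data Desk.

```python
def top_capture(scores, outcomes, top_share=0.1):
    """Share of all positive outcomes found in the top-scoring slice of records."""
    n_top = max(1, round(len(scores) * top_share))
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    hits = sum(outcome for _, outcome in ranked[:n_top])
    return hits / sum(outcomes)

# Tiny made-up holdout sample: 1 = member of the target group.
holdout_outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.1, 0.8, 0.3, 0.4, 0.2, 0.3, 0.1, 0.5]
model_b = [0.6, 0.7, 0.1, 0.9, 0.3, 0.8, 0.2, 0.4, 0.1, 0.5]

capture_a = top_capture(model_a, holdout_outcomes, top_share=0.3)
capture_b = top_capture(model_b, holdout_outcomes, top_share=0.3)
print(f"model A: {capture_a:.2f}  model B: {capture_b:.2f}")  # model A: 0.67  model B: 0.33
```

Because the measure uses only rank order, it can compare a regression score with a logistic probability directly.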

But there’s a problem. I’ve never had a model bomb when it comes to scoring the validation set, but I HAVE had models fail after deployment. Data that is held out for validation of the model is one thing — the real world outside the model can be a whole OTHER thing. Logically it should not be so: If the model doesn’t “know” anything about the holdout data, then you’d think its performance on it would indicate how it will perform in the future.

Not so. At least, not always.

I am not so quick, then, to discard the loser. I like to evaluate both models on fresh data as it comes in (new gifts, for example). The loser might be the better choice overall, or it might turn out that a combination of the two models performs better than one on its own. Maybe one model works better for a subset of the population (young alumni, say), which suggests that adding interaction terms or even using a multiple-model approach is something to consider in the future. If the models predict slightly different propensities (as a result of how the DVs were formulated), with both of them contributors to a desirable result, then it might be worthwhile keeping both score sets by multiplying them together.

I don’t have an extended period of time for such testing — the model needs to be put into operation before it gets stale. Unfortunately, evaluation has always been a cumbersome process. I need to query the database for fresh results (conversions, upgrades, new planned giving expectancies — whatever) and then match it up by ID and score for each model (scores for untested models are not going to be in the database, obviously), and then produce some charts in Excel to visualize and compare results. It’s not a ton of work, but it takes just long enough to prevent me from doing it more than once before it’s time to commit. Even if I am evaluating the models after the fact, in order to learn for the next iteration of model-building, it’s not an exercise I will want to carry out repeatedly.

There is a better way. Think reports.

What does a report do? A report pulls real-time (or nightly-refreshed) data and assembles it in an interpretable way in a tabular or visual display. It performs this service on a regular or semi-regular basis, or on demand. (Yeah, okay, maybe I should have said an ideal report.) If part of your job consists of report preparation as well as predictive modeling, then you should be building model scores into your reports.

Here’s a tutorial on how to use Tableau to easily create a report that compares the performance of two sets of model scores in a single visualization called a heat map. This visualization can be refreshed with live data as often as desired. If you want, you can add other fields (age, sex, degree, donor status, etc.) and easily filter the data to see how model performance differs depending on the composition of the population. Note that this is probably not a report you’ll be sharing with your vice president. It does look cool, but it is mainly a diagnostic and exploration tool for your own use. The small initial investment of time is worth it if you build multiple models — it can be reused again and again.

This tutorial assumes you’re already somewhat familiar with the basics of Tableau. If you don’t have the software, and you don’t want to download a free trial, stick around anyway — other software packages offer ways to create heat maps, and the basic idea is the same.

In this example, I am comparing percentile scores from two models I developed to predict which alumni are most likely to give at least \$1,000 in the current fiscal year. One is a multiple linear regression model with a dependent variable defined as the sum of giving for the past five years (log-transformed). The other is a logistic regression model with a binary dependent variable defined as ‘has giving of at least \$1,000 in any one of the past three years’. The exact definitions of the DVs are reasonable but somewhat arbitrary. They are closely related, but different. The techniques and the predictor variables are also different, so we should expect the models to yield different results. Tested against the validation set (which was the same for both models), the logistic model proved superior. But only a test on new gift data will be truly convincing.

I want to take the entire population of alumni whom I have scored (a sample of about 27,000 individuals), and match them up with what they have given since the model was created. In this made-up example, let’s suppose I created my models last August, and I want to see what those 27,000 alumni have given since the day I completed the work. In reality, I would have chosen a winning model months ago and this would be an after-the-fact analysis, but I am doing this in order to enrich the visualization for the purposes of this example. (Cheating, in other words.)

Tableau allows you to combine data from multiple sources. In this case, you will connect to an Excel file to get your model scores (since they’re not in the database), and then connect to your database for giving results since September 1. If you do not connect directly to your database from Tableau, then you can paste your gifts data into a second sheet in your Excel workbook and extract the data via a single connection to that file — no problem. The first worksheet will have three columns: One for unique ID, and one each for the scores from the two models. In this example, the scores were output from Data Desk as percentiles. If you want, you can add columns for key attributes such as age, sex and so on. The second worksheet (or the custom SQL that retrieves data directly from your data warehouse) will provide ID and sum of giving since September 1.

Normally in report creation, Tableau handles all the aggregation of the data — the input is raw transaction data, with each ID potentially appearing on multiple rows. In this example, however, we have aggregated the data already (summing giving by ID), and there is only one row of data for each ID. It doesn’t matter, but it might have implications for some of the specific steps that follow.
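For readers who want to prepare the data outside Tableau, here is a minimal Python sketch of the two steps just described: summing giving by ID, then attaching that total to every scored ID whether or not a gift exists (a left join). The IDs, scores, and amounts are made-up sample values, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (ID, MLR percentile, Logistic percentile)
scores = [
    ("A001", 97, 100),
    ("A002", 42, 38),
    ("A003", 100, 100),
]

# Hypothetical gift transactions since September 1: (ID, amount)
gifts = [
    ("A001", 500.0),
    ("A001", 750.0),
    ("A003", 2500.0),
]

def sum_giving_by_id(transactions):
    """Collapse raw transactions to one total per ID."""
    totals = defaultdict(float)
    for alum_id, amount in transactions:
        totals[alum_id] += amount
    return dict(totals)

def left_join(score_rows, totals):
    """Keep every scored ID; sum_giving is None when no gift matches."""
    return [
        {"id": alum_id, "mlr": mlr, "logistic": logit,
         "sum_giving": totals.get(alum_id)}
        for alum_id, mlr, logit in score_rows
    ]

joined = left_join(scores, sum_giving_by_id(gifts))
```

The result has exactly one row per scored individual, which is the shape the rest of the tutorial assumes.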

You should refer to your Tableau references for connecting to data sources. All I will add is that when you add the table (or worksheet) that contains the giving data, be sure to left-join on ID, because obviously not everyone you have scored has given since Sept. 1. From here on in, I will use Tableau terminology that won’t make any sense if you don’t know the software (specifically, Tableau Desktop version 7.0). Let’s build our first view:

1. If your data has been extracted correctly, ‘ID’ will be listed under Dimensions, and your two model score sets will be listed under Measures. In this example, I will from now on refer to them as MLR (for Multiple Linear Regression) and Logistic. Obviously I’m referring to my own data — just try to replicate what I’m talking about using your own data file.
3. Right-click on Logistic and select “Create bins …” This will bin the percentile score into whatever size we desire. Change the default bin size to 5 and click OK. Note that a new variable is created in the Dimensions pane, because bins are categorical, not numerical.
4. Right-click on MLR and do the same thing.
5. Drag Logistic (bins) to the Columns shelf. Drag MLR (bins) to the Rows shelf.
6. Drag ID to the Text shelf. Click on the down-arrow of the ID pill you’ve just created, and select Measure –> Count. This will create a count of all IDs that fall into each cell. It turns green to indicate it’s now a measure instead of a dimension. (Because each ID appears in our data only once, it doesn’t matter whether we use Count or Count Distinct.)
7. Change the Marks type from Automatic to Square (right above the Text shelf). Notice that the Text shelf suddenly turns into a Label shelf — each square of the heat map will be labeled with the number of IDs.
8. Drag ID from the Dimensions pane again, and this time drop it onto the Color shelf.
9. Click on the down-arrow of the ID pill you’ve just created, and select Measure –> Count. This will base the color or shading of the cell on the number of IDs that fall into that cell.

The top left corner of your screen will look like this:

Now we’re ready to allow the view to automatically update. The result won’t look much like a heat map: Probably just a bunch of little squares with numbers beside them. We need to enlarge the squares. Under the Size shelf is a slider: Move this to the centre of the size range. Then drag one of the rows in the view to make it taller — hover over the axis for MLR (on the far left) until the pointer turns into an up-and-down arrow, then click and drag. When you let go, the squares will resize and the alleys of white space should start to close up. Keep messing with it until the squares touch on all sides. With a little formatting of labels for readability, the final product will look something like this. (Click on thumbnail image for full size.)

A heat map can convey a lot of information at a glance. You can immediately see where a lot of individuals are concentrated: They’re in the darkest squares. The numbers are hard to read, but up in the top left of the map, we see that the number of people who fall into the 0-4 bin in both the MLR and Logistic models is 572. In the lower right area of the map, we see that 563 people fell into the 95 to 99 bin in both models. Notice that Tableau didn’t bin evenly: Every single bin has 5 score levels in it except for the bin labeled 100, which contains only individuals with a score of 100. In the map, we see that 147 people scored exactly 100 in both models. This can be corrected (using a calculated field instead of automatic binning), but I have decided to leave it the way it is. Due to the nature of this modeling exercise, I am mainly interested in the top few percentile scores anyway, and the 100 group is of particular interest. Having them mapped separately from the rest is not a problem.
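The binning and counting behind this view can be sketched in a few lines of Python. This is only an illustration, assuming bins start at multiples of the bin size (which matches the labels described above); the score pairs are invented and the function names are hypothetical. It also shows why a score of 100 lands in a bin of its own.

```python
from collections import Counter

BIN_SIZE = 5

def score_bin(percentile, size=BIN_SIZE):
    """Lower edge of the fixed-width bin a percentile score falls into."""
    return (percentile // size) * size

def heat_map_counts(pairs):
    """Count individuals per (MLR bin, Logistic bin) cell."""
    return Counter((score_bin(mlr), score_bin(logit)) for mlr, logit in pairs)

# Hypothetical (MLR, Logistic) percentile pairs
pairs = [(3, 1), (4, 0), (97, 99), (100, 100), (100, 96)]
cells = heat_map_counts(pairs)
```

Scores 95 through 99 all map to the bin labeled 95, while 100 is the only score that maps to 100, so that bin holds a single score level.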

The names of the bins don’t reflect what they include. For example, “90” really means “90 to 94”. You can rename them using aliases. Right-click on Logistic in the Dimensions pane, select Field Properties –> Aliases…, and change the displayed values in the Values column. Do the same for MLR.

We haven’t looked at the recent-gift data yet, but before we move on, what can we learn from this view? It appears the models agree on the individuals with extremely high or extremely low scores. In the middle range, there is still a lot of agreement but also many more cases of divergence, in which an individual scores high in one model but low in the other. This is clear, at-a-glance evidence that our models are similar but different. Depending on the application, choosing one model over the other could have a big effect on the result, for better or worse. In this particular application, where I am interested mainly in very high-scoring alumni only, it may not make that much difference at all … but let’s not jump to that conclusion just yet.

If your data set included some key grouping information such as age or sex, it might be interesting to create a filter to examine whether the models differ on those factors. Here’s an example with ‘Age’:

1. Drag Age from the Measures pane into the Filters shelf.
2. When Tableau asks you how you want to filter on Age, select “All Values” and click Next.
3. On the next box, select Range of Values, and click OK.
4. Hover over the green Age pill on the filters shelf, click the down-arrow on the right end of the pill, and select Show Quick Filter.

Now you can set the upper and lower age bounds of the individuals you want to be counted in the heat map. As you slide the scale, it will display Age with numbers after the decimal, even though your values are all whole numbers. If this bothers you, right-click on Age in the Measures pane, select Field Properties –> Number Format…, and click on Number (Custom). Adjust the number of decimal places to zero. Here’s what the quick filter looks like:

The next two images show the heat map for different age ranges. The first one is ages 20 to 50, the second is 51 to 80. Again, click on the thumbnails for full-size images — although the beauty of a heat map is that you can see the pattern from a distance.

Right off the bat, it’s evident that it’s harder for younger individuals to get a high score, but they fare better in the MLR model than they do in the Logistic model. Imagine a 45-degree line sloping from the top left corner to the bottom right corner — the presence of more dark-shaded squares under that line indicates individuals with higher MLR scores than Logistic scores. The logistic model, on the other hand, slightly favours older alumni. This alone might explain why the Logistic model outperformed the MLR model in terms of the validation set. The difference might be due to how age-related variables were introduced to each model as predictors; they may have been more influential in one than the other. It’s hard to say without going back to the models themselves for a close look.

One can spend a lot of time playing and learning with these filters. Let’s fast-forward and (finally) introduce recent-gift data — the giving that all scored individuals have engaged in since September 1, the day after the models were supposedly created. This data appears in the Measures pane as a variable I’ll call ‘Sum of Giving’. I’m specifically interested in who has given at least \$1,000 (cumulatively), so I will need to create a calculated field to flag these people.

1. Right-click on Sum of Giving and select Create Calculated Field…
2. Give the field a name. I called it “Leadership donor”.
3. The field Sum of Giving is already in the expression window. Now you just need to add some text around it to complete the expression:
4. Click OK. This creates a field (variable) with the value 1 for any donor who has given at the Leadership level, and nothing if otherwise. Note that you can enter any amount in place of 999. If you want to count donors vs. non-donors, enter “>0”.
5. The field appears in the Measures pane, because Tableau recognizes it as numeric. We’re using it as a categorical variable, so let’s convert it into a Dimension instead. Right-click on the field name and select “Convert to Dimension”, or simply drag the field into the Dimensions pane — both actions accomplish the same thing.
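The logic described in step 4 can be written out in Python as follows. This is a sketch of the behaviour described (1 above the threshold, no value otherwise), not the Tableau expression itself; the function and constant names are hypothetical.

```python
# "Greater than 999" means at least $1,000 when totals are whole dollars.
LEADERSHIP_THRESHOLD = 999

def leadership_flag(sum_giving, threshold=LEADERSHIP_THRESHOLD):
    """Return 1 when giving exceeds the threshold, otherwise None (no value)."""
    if sum_giving is not None and sum_giving > threshold:
        return 1
    return None
```

Passing `threshold=0` gives the donor versus non-donor flag mentioned in step 4.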

Now we have a flag we can use to zero in on our higher-end donors. Let’s create a new view for that. At the lower left of your screen, right-click on the tab for the existing view and select “Duplicate Sheet”. This will allow us to continue exploring the heat map without changing our original version. We could, of course, do all our work in a single view and use filters to dynamically alter the view — that’s one of the strengths of Tableau — but for now let’s keep our views separate.

1. If you still have filters applied for Age or other variables, click on the quick filter menu and select “Clear Filter”. You can reapply it later if you want — we’re just getting it out of the way so we can see the full picture.
2. Drag ‘Leadership donor’ to the Filters shelf.
3. In the box that pops up, click “Select from list” on the General tab (it should already be selected), and then check the little box for ‘1’.
4. Click OK.

The result looks like this. (Click for full size.)

Our big donors are clustered nicely down in the lower right corner, where both the MLR and the Logistic model scores are very high. Some of the lower-score bins contain zero Leadership donors, and Tableau has automatically hidden those rows and columns from view. Take a couple of minutes to study the map. Follow the three darkest squares (labeled 48, 74, and 23) as they form a 45-degree line up the centre of the map. If you compare the values in the squares that are directly opposite each other over this line, you’ll notice that there are slightly more Leadership donors on the upper side of the line. Those are donors who have higher Logistic scores than MLR scores. As well, notice that the scattered cloud of donors above the line is more extensive than that below the line. These observations should lead us to believe that the Logistic model performs slightly better than the MLR model.

That conclusion is a bit hasty, though. There might be more Leadership donors on the high-Logistic/low-MLR side simply because more alumni ended up in those squares in the first place. We need to calculate the PERCENTAGE of the population of each square that went on to become a Leadership donor. That’s right, we’re going to create a third view, and calculate percentages to plug into each square.

1. Right-click on the tab for Sheet 2 and select Duplicate Sheet. (By the way, you can name these sheets whatever you want, just as in Excel.)
2. Remove the filter for Leadership donor.
3. Under Analysis in the top menu bar, select Create Calculated Field…
4. Name the new field ‘Leadership percentage’.
5. Enter this expression, which divides the number of Leadership donors by the total number of individuals.
6. Click OK. The new field appears in the Measures pane, which is fine.
7. Drag ‘Leadership percentage’ from Measures onto the Label shelf, replacing the count of ID.
8. Drag ‘Leadership percentage’ from Measures again, this time onto the Color shelf.
9. Right-click on any square in the map, and select Format…, which opens a formatting pane at the far left.
10. On the Pane tab, in the Default section, click on the down-arrow to the right of “Numbers”, and select Percentage.

The result is below. (Click for full size.) You can select any precision for your percentages — I’ve rounded to whole numbers to avoid clutter.

The darkest square is a single donor with a very high MLR score but a very low Logistic score, who just happened to give at the Leadership level. That square is of course labeled 100%, which causes the rest of the display to be toned down to a degree that makes it hard to see the patterns. This single donor might be a person to look at more carefully, but for now, let’s exclude that person from the map. Hover your pointer over the square, and select Exclude from the tooltip box. (This creates a specific filter for this individual, which you can remove anytime.) All the squares are recoloured accordingly:

Now some of the darkest squares are also based on very sparse data. You can exclude any that you wish, but I’m fine with this display for now. For one thing, we can clearly see that having a Logistic score of 95 or higher is darn significant, regardless of what a donor’s MLR score is. For example, there are four Leadership donors who scored only 65-69 in the MLR model but have Logistic scores of 95-99, which is what we want to see. (Those donors are in the square labeled 14%.)
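The per-cell percentage used in this view can be sketched in Python as a count of Leadership donors divided by a count of all individuals in the same cell. The rows below are invented, and the function name is hypothetical; the last row shows how a one-person cell ends up at 100%.

```python
from collections import Counter

def leadership_percentage(rows):
    """rows: iterable of (mlr_bin, logistic_bin, is_leadership_donor)."""
    totals = Counter()
    leaders = Counter()
    for mlr_bin, logit_bin, is_leader in rows:
        cell = (mlr_bin, logit_bin)
        totals[cell] += 1
        if is_leader:
            leaders[cell] += 1
    # Share of each cell's population that gave at the Leadership level
    return {cell: leaders[cell] / totals[cell] for cell in totals}

rows = [
    (100, 100, True), (100, 100, False), (100, 100, False),
    (95, 5, True),  # a single-person cell reads as 100%
]
pct = leadership_percentage(rows)
```

Returning the cell totals alongside the percentages would make it easy to suppress cells with very few people instead of excluding them by hand.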

Being able to demonstrate that one model is superior is pretty nifty. But I am especially intrigued at how easy it is to see how the models might work together to improve accuracy.

Have a look at the square containing individuals who scored 100 in both models. There were 147 such individuals, and 48 of them gave \$1,000 or greater — a whopping 32.6%. Here are a couple of facts to think about:

• Of all the individuals who scored 100 in the Logistic model, 26.7% went on to give at the Leadership level.
• Of all the individuals who scored 100 in the MLR model, 23.1% went on to give at the Leadership level.

Do you see what I’m getting at? When we combine both scores and zero in on people in the top percentile for both models, our yield of Leadership donors increases by nearly six percentage points over the best-performing model, to 32.6%.
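The arithmetic behind that comparison, using only the figures quoted above, looks like this in Python (variable names are hypothetical):

```python
both_100_total = 147     # individuals scoring 100 in both models
both_100_leaders = 48    # of those, gave at the Leadership level

combined_rate = both_100_leaders / both_100_total * 100  # about 32.65

logistic_only_rate = 26.7  # everyone who scored 100 in the Logistic model
mlr_only_rate = 23.1       # everyone who scored 100 in the MLR model

# The post reports the combined rate as 32.6%; the lift uses that figure.
lift_over_best = 32.6 - max(logistic_only_rate, mlr_only_rate)  # about 5.9
```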

The same boost is evident for other high-scoring cells in the heat map: The logistic model identifies some big donors that the MLR model misses, but the MLR model can enhance the accuracy of the logistic model. This is potentially useful for prospect identification in Major Giving, when we really want to be as focused as possible.

So far I’ve shown you only donor numbers. What about revenue? Our data set includes gift amounts, so let’s create a new view to visualize actual aggregate dollar totals.

1. Duplicate the last sheet you created, and remove any filters that had been applied.
2. Drag ‘Sum of Giving’ to the Label and Color shelves.
3. Format the values as currency.
4. For fun, change the color from green to red by clicking on Edit Colors in the context menu for the Sum of Giving card.

The result is pretty dramatic.

This is for all donors, not just Leadership donors, but if you want to narrow it down to Leadership donors only, re-apply your filter.

Just as with raw donor counts, the view above is a little misleading, simply because more prospects equals more donors, equals more dollars. So let’s create a calculated field to give us AVERAGE dollars per donor for every cell in the heat map.

The individuals with scores of 100 in both models gave nearly \$5,000 on average, and no other cell comes close. Compare that with what each model achieves on its own:

• The individuals who scored 100 in the Logistic model gave an average of \$2,927.
• The individuals who scored 100 in the MLR model gave an average of \$2,971.

The models are strongest where they intersect!
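The averaging step can be sketched in Python as well. Whether the denominator should be everyone in the cell or only those who gave is a choice; this sketch supports both through a hypothetical `donors_only` switch, and it does not claim to reproduce the calculated field used for the figures above. The sample rows are invented.

```python
from collections import defaultdict

def average_giving_by_cell(rows, donors_only=False):
    """rows: iterable of (mlr_bin, logistic_bin, sum_giving or None)."""
    dollars = defaultdict(float)
    heads = defaultdict(int)
    for mlr_bin, logit_bin, giving in rows:
        amount = giving or 0.0  # treat missing giving as zero
        if donors_only and amount <= 0:
            continue  # skip non-donors when averaging per donor
        cell = (mlr_bin, logit_bin)
        dollars[cell] += amount
        heads[cell] += 1
    return {cell: dollars[cell] / heads[cell] for cell in heads}

rows = [(100, 100, 6000.0), (100, 100, None), (95, 100, 300.0)]
per_individual = average_giving_by_cell(rows)
per_donor = average_giving_by_cell(rows, donors_only=True)
```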

I’ve spent a lot of time and more than 4,000 words explaining how to do this in Tableau. This is very unusual for me — why a specific product such as Tableau, when one can create heat maps even in Excel? *

• It’s just so easy to do it in Tableau, and the result looks attractive without requiring the user to fuss with formatting.
• The data can be refreshed whenever necessary. If you’re connecting to an Excel file, simply paste new data into the file and refresh the data extract. It’s that simple. (Remember to refresh the extract rather than replace the data source entirely, if you want to retain your aliases as you’ve defined them.)
• That goes for refreshing the giving data, AND for loading a whole different set of individuals and scores. You don’t need to rebuild these views from scratch (although it’s pretty easy to do so).
• Tableau allows you to dynamically filter the data any which way you want. It’s a great way to explore the data. In my example, it would have been really interesting to filter on donors who UPGRADED to the \$1,000+ level. Which model did a better job predicting upgrading? I don’t know, but I’m going to find out.
• You can drill down to the underlying data. If you want to see a list of the people who scored 100 in both models, just hover the pointer over that square and click on the data icon, then the ‘Underlying’ tab. Imagine having wealth/capacity scores on one axis, and propensity scores on the other …
• I’ve shared my heat maps here as static images, but you can share your analyses as fully-functioning views, even with people who don’t have the software on their computers. Save it as a Packaged Workbook, and they’ll be able to open it in Tableau Reader (which they can download for free). They can use the filters you’ve set up to play with the data themselves.

This may be the longest CoolData post ever, but as usual I feel I am barely scratching the surface.

* P.S.: Heat maps are easily created in a combination of Data Desk and Excel. Without going into too much detail: In Data Desk use contingency tables (a.k.a. cross tabs) to create the basic matrix of numbers, with one score set as x and the other as y, and use derived variable expressions to limit the counts as desired. Copy and paste the table text into Excel, and use conditional formatting to create the desired shading. Unfortunately this requires some fussing and the result is static.

## By Peter Wylie and John Sammis

For a number of years John Sammis and I have been pushing colleges and universities to examine the data they (or their vendors) collect for alums who are members of their online communities. For example, we encourage them to look at very basic things like:

• The number of e-mails an alum has opened since it’s been possible to get such data
• The number of “click throughs” an alum has made to the website in response to an e-mail, an e-newsletter, and the like
• The number of times an alum visits the website
• The date and time of each visit

Why do we think they should be recording and examining these kinds of data? Because (based on some limited but compelling evidence) we think such data are related to how much and how often alums give to their alma maters as well as how engaged they are with these institutions (e.g., reunion attendance, volunteering, etc.). To ignore such data means leaving money on the table and losing a chance to spot alums who are truly interested in the school, even if they’ll never become major givers.

Frankly the response to our entreaties has been less than heartening:

• “We don’t have an online community. If we get one, that’s probably a year or two away.”
• “With the explosion of social media, we’re more interested in what we can learn about our alums through Facebook, LinkedIn, Twitter … I mean those are the sites our alums will be going to, not ours.”
• “You want us to get record-by-record data from the vendor who maintains our site? Surely you jest. We’re lucky if they’ll send us decipherable summary data on email openings and click-throughs.”

But we’re nothing if not persistent. So what we’ve done here is put together some data from a four-year higher education institution that has a pretty active online community. Granted, it’s only one school, but the data show a pronounced relationship between number of website visits and several different measures of alumni engagement and alumni giving.

We have to believe this school is not a glaring exception among the thousands of schools out there that have online communities. Our hope is that you’ll read through what we have to show and tell and conclude, “What the heck. Why don’t we take a similar look at our own data and see what we can see. Can’t hurt.”

Nope. Can’t hurt, and it might help – might help a lot.

A View of the Overall Distribution of Website Visits and the Distribution of Visits by Class Year

Table 1 shows that almost exactly two thirds of the alums have never visited the school’s website as an identifiable member of the school’s online community. The remaining third are roughly evenly divided among four categories: one visit; two to three visits; four to seven visits; and eight or more visits.

Table 1: Frequency and Percentage Distribution of Website Visits for More Than 40,000 Alums

As soon as we saw this distribution, we were quite sure it would vary a great deal depending on how long people had been out of school. To confirm that hunch we divided all alums into ten roughly equal-sized groups (i.e., into deciles).

Table 2: Count, Median Class Year, and Minimum and Maximum Class Years for All Alums Divided into Deciles

As you can see in Table 2, there are some very senior people in this alumni universe, and there are some very junior people. For example, the majority of folks in Decile 10 (CY 2006 – CY 2010) are probably in their 20s. What about Decile 1 (CY 1926 – CY 1958)? It’s a safe bet that these folks are all over 70, and we may have at least one centenarian in the group (which we think is pretty cool).
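For anyone who wants to build a table like Table 2 from their own file, here is a minimal Python sketch that sorts class years, cuts them into ten roughly equal groups, and reports count, median, minimum, and maximum for each. The data are invented and the function name is hypothetical. Note one assumption: this simple cut can split a single class year across two deciles, so you may prefer to nudge the boundaries to whole class years.

```python
from statistics import median

def class_year_deciles(class_years, groups=10):
    """Sort class years and cut them into `groups` roughly equal-sized chunks."""
    ordered = sorted(class_years)
    n = len(ordered)
    summary = []
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if chunk:
            summary.append({
                "decile": g + 1,
                "count": len(chunk),
                "median": median(chunk),
                "min": chunk[0],
                "max": chunk[-1],
            })
    return summary

years = list(range(1926, 2011)) * 2  # hypothetical: two alums per class year
table = class_year_deciles(years)
```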

If you look at Table 3, you can see the percentage distribution of website visits for each Class Year Decile. However, the problem with that table (and most tables that have lots of information in them) is that (unless you’re a data geek like we are) it’s not something you want to spend a lot of time looking at. You’d probably prefer to look at a chart, a graphic display of the data. So what we’ve done here (and throughout this paper) is display the data graphically for the folks in Decile 1, Decile 5, and Decile 10 – very senior people, middle-aged people, and very young people.

Table 3: Percentage of Website visits by Class Year Decile

Clearly our hunch was right. The distribution of website visits is highly related to how long people have been out of school:

• Over 90% of alums who graduated before 1959 (Decile 1) have not visited the website.
• In the youngest group (Decile 10) only a bit over 25% of alums have not visited the site.
• You have to look at Table 3 to see the trend, but notice how the “0 Visits” percentage drops for Deciles 7-10 (a span covering alums graduating in 1992 up to 2010): 68.9% down to 64.3% down to 46.5% down to 27.7%.

The Relationship between Number of Website Visits and Alumni Engagement

If you work in higher education advancement, you probably hear the term “alumni engagement” mentioned several times a week. It’s something lots and lots of folks are concerned about. And plenty of these folks are finding more and more ways to operationally define the term.

Here we’ve taken a very simple approach. We’ve looked at whether or not an alum had ever volunteered for the institution and whether or not an alum had ever attended a reunion.

Volunteering

Table 4 and Figures 4 to 6 show the relationship between number of website visits and volunteering. Just to be clear on what we’re laying out here, let’s go through some of the details of Table 4.

We’ll use Class year Decile 1 (alums who graduated between 1926 and 1958) as an example. Look at the alums in this Decile who have never visited the website; only 17.1% of them have ever volunteered. On the other hand, 42.9% of alums who have visited the website 8 or more times have volunteered. If you look at Figure 4, of course, you’ll see the same information depicted graphically.

Table 4: Percentage of Alums by Number of Website Visits for All Deciles Who Ever Volunteered

There are two facts that stick out for us in Table 4 and Figures 4 to 6:

• Alums who have never visited the website are far less likely to have volunteered than those who have visited even once.
• In general, there is a steady climb in the rate of volunteering as the number of website visits increases.
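A table like Table 4 boils down to a percentage computed within each combination of class year decile and visit category. Here is a minimal Python sketch of that calculation, using the five visit categories from Table 1. The sample records are invented and the function names are hypothetical; the same function works for any yes/no flag, such as reunion attendance.

```python
from collections import defaultdict

def visit_category(visits):
    """Map a visit count to the five categories used in Table 1."""
    if visits == 0:
        return "0"
    if visits == 1:
        return "1"
    if visits <= 3:
        return "2-3"
    if visits <= 7:
        return "4-7"
    return "8+"

def rate_by_category(alums):
    """alums: iterable of (decile, visits, flag).

    Returns the percentage flagged for each (decile, visit category).
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for decile, visits, flag in alums:
        key = (decile, visit_category(visits))
        total[key] += 1
        flagged[key] += 1 if flag else 0
    return {key: 100.0 * flagged[key] / total[key] for key in total}

# Hypothetical records: (class year decile, website visits, ever volunteered)
alums = [(1, 0, False), (1, 0, True), (1, 9, True), (10, 2, False)]
rates = rate_by_category(alums)
```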

Reunion Attendance

If you look through Table 5 and Figures 7 to 9, you’ll see a relationship between number of website visits and reunion attendance that’s very similar to what you saw between number of website visits and volunteering. The one exception would be for the youngest group of alums – those in Decile 10 who graduated between 2006 and 2010. These alums simply are too young to have attended a five year reunion. (Although it would appear that several of them found a way to make it back to school anyway – good for them.)

Table 5: Percentage of Alums by Number of Website Visits for All Deciles Who Ever Attended a Reunion

The Relationship between Number of Website Visits and Giving

There is no question that advancement offices are interested in alumni engagement. But if we’re realistic, we have to admit they tend to view engagement as mainly a step in the direction of one day becoming a donor. So let’s take a look at how number of website visits is related to alumni giving at this school.

We’ve created two sets of tables and figures to allow you to get a clear look at all this:

• Table 6 and Figures 10 to 12 show the relationship between the number of website visits and giving over the past two fiscal years.
• Table 7 and Figures 13-15 show the relationship between the number of website visits and lifetime giving of \$10,000 or more.

Browse through all this material. After you’ve done that, we’ll tell you what we see.

Table 6: Percentage of Alums by Number of Website Visits for All Deciles Who Have Given Anything in the Last Two Fiscal Years

Table 7: Percentage of Alums by Number of Website Visits for All Deciles Who Have Given \$10,000 or More Lifetime

Clearly, there is a lot of information contained in these tables and charts. But if we stand back from all that we see, the picture becomes clear. Regardless of how long alums have been out of school, those who have visited the website are better recent givers, and better major givers, than those who have not.

For example, let’s focus on alums who graduated before 1959 (Decile 1). Those who have visited the website at least 8 times are almost twice as likely to have given in the last two fiscal years as those who have never visited the site (75% versus 41.6%). If we look at giving of \$10,000 or more lifetime for this same Decile, the difference is even more striking: 42.9% versus 12.5%.

Let’s jump down to Decile 10, the “youngsters” who graduated between 2006 and 2010. Understandably, almost none of these alums have given \$10,000 or more lifetime. But look at Figure 12. For this group the relationship between number of website visits and giving over the last two fiscal years is striking:

• 27.8% of those with 0 website visits gave during this period.
• 35.1% of those with 1 visit gave during this period.
• 38.1% of those with 2-3 visits gave during this period.
• 43.1% of those with 4-7 visits gave during this period.
• 50.9% of those with 8 or more visits gave during this period.

Where to Go from Here

Clearly, there is a strong relationship between this simple web metric (number of website visits) and alumni engagement and alumni giving at this particular school. If that’s the case, it’s reasonable to assume that the same sort of relationship holds true for other schools. If you agree with that assumption, then we think it’s more than worth your while to take a look at similar data at your own institution.

At this point you might decide:

“Look guys, this is all very interesting, but we simply don’t have the time, resources, or staff to do that. Maybe sometime in the future, when things are less hectic around here, we’ll take your advice. But not now.”

As much as we love this sort of analysis, we totally get a decision like that. We may be specialists, but we talk to enough people in advancement every week to realize you have a lot more on your minds than data mining and predictive modeling.

On the other hand, you might conclude that what we’ve done here is something you’d like to try to replicate, or improve on. If so, here’s what we’d recommend:

1. Find out what kind of online data is available to you.
2. Ask your technical folks to get those data into analyzable form for you.
3. Do some simple analyses with the data.
4. Share the results with colleagues you think would find it interesting.

1. Find Out What Kind of Data Is Available

Depending on how your shop is set up, this may take some persistence and digging. If it were us, we’d be trying to find out:

• Has an alum ever opened an email that we’ve sent them? (In a lot of schools they don’t have to be a member of the online community for you to ascertain that.)
• Have they ever opened an e-newsletter?
• Have they ever clicked through to your website from an e-mail or e-newsletter?
• Can you get counts for number of openings and number of click-throughs?

In all probability, you’ll be dealing with a vendor (either directly or through your IT folks) to get answers to these questions. Expect some pushback. A dialogue that goes like this would not be unusual:

YOU: Can I get the number of e-mails and e-newsletters that each of our alums has opened since the school has been sending out that kind of stuff?

VENDOR: We can certainly give you the number of e-mails and number of e-newsletters that were opened on each date that one was sent out.

YOU: That’s great, but that’s not what I’m looking for. I need to know, on a record-by-record basis, which alums opened the e-communication, and I need a total count for each alum for their total number of openings.

VENDOR: That’ll take some doing.

YOU: But you can do it?

VENDOR: I suppose.

YOU: Terrific!

2. Ask Your Technical Folks to Get the Data Into Analyzable Form

What does “analyzable form” mean? To us that just means getting the data into spreadsheet format (probably Excel) where the first field will probably be the unique ID number you use to keep track of all the alums (and other constituents) in your fundraising database. For starters, we’d recommend something very simple. For example:

• Field A: Unique ID number
• Field B: Total amount of lifetime hard credit (for many alums, this value will be zero)
• Field C: Total amount of hard credit for the last two fiscal years
• Field D: Total number of e-mails or e-newsletters opened
• Field E: Total number of click-throughs to your website from these e-mails and e-newsletters
• Field F: Preferred class year of the alum
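A minimal sketch of what that layout could look like once the spreadsheet is saved as CSV and read into Python. The column names and the three sample rows are made up for illustration, and treating a blank cell as zero is an assumption you would want to confirm with whoever builds the file.

```python
import csv
import io

# Hypothetical column names for Fields A-F; your export will use its own.
FIELDS = ["id", "lifetime_hard_credit", "hard_credit_last_2fy",
          "opens", "click_throughs", "class_year"]

# Made-up rows standing in for a spreadsheet saved as CSV.
SAMPLE_CSV = """id,lifetime_hard_credit,hard_credit_last_2fy,opens,click_throughs,class_year
1001,0,0,0,0,1998
1002,2500.00,150.00,12,3,1975
1003,75.00,0,4,0,2006
"""

def load_alumni(handle):
    """Read the export into a list of dicts with numeric fields converted."""
    reader = csv.DictReader(handle)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"export is missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        rows.append({
            "id": raw["id"],
            # Assumption: a blank cell means zero giving or zero activity.
            "lifetime_hard_credit": float(raw["lifetime_hard_credit"] or 0),
            "hard_credit_last_2fy": float(raw["hard_credit_last_2fy"] or 0),
            "opens": int(raw["opens"] or 0),
            "click_throughs": int(raw["click_throughs"] or 0),
            "class_year": int(raw["class_year"]),
        })
    return rows

alumni = load_alumni(io.StringIO(SAMPLE_CSV))
```

To read a real file, pass `open("your_export.csv", newline="")` to `load_alumni` instead of the in-memory sample; the file name here is a placeholder.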

In our opinion, this kind of file should be very simple to build. In our experience, however, that is often not the case. (Why? How much time you got?)

Our frustrations with this sort of problem notwithstanding, keep pushing for the file. Be polite. Be diplomatic. And, above all, be persistent.

3. Do Some Simple Analyses with the Data

There are any number of ways to analyze your data. Our bias would be to have you import the Excel file into a stats software package, and then do the analysis. (You can do it in Excel, but it’s a lot harder than if you use something like SPSS or Data Desk, our preference.)

If you can’t do this yourself, we’d recommend that you find someone on your team or on your campus to do it for you. The right person, when you ask them if they can roughly replicate the tables and charts included in this paper, will say something like, “Sure,” “No problem,” “Piece of cake,” etc. If they don’t, keep looking.
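For a sense of scale, here is a rough sketch of one simple analysis in plain Python: group alums by how many e-communications they have opened, then compare the share who gave and the average amount given in each group. The records and the group cut points are invented for illustration, and this is only one kind of comparison, not a reproduction of the tables in this paper.

```python
from collections import defaultdict

# Made-up records: (e-mails or e-newsletters opened, hard credit in last two fiscal years).
records = [(0, 0.0), (0, 0.0), (0, 25.0), (2, 0.0),
           (3, 50.0), (8, 100.0), (15, 0.0), (20, 500.0)]

def opens_group(opens):
    """Assign an alum to a rough engagement group (cut points are arbitrary)."""
    if opens == 0:
        return "0 opens"
    if opens <= 5:
        return "1-5 opens"
    return "6+ opens"

def giving_by_group(rows):
    """Return {group: (count, percent who gave, mean amount given)}."""
    grouped = defaultdict(list)
    for opens, amount in rows:
        grouped[opens_group(opens)].append(amount)
    summary = {}
    for group, amounts in grouped.items():
        donors = sum(1 for a in amounts if a > 0)
        summary[group] = (len(amounts),
                          100.0 * donors / len(amounts),
                          sum(amounts) / len(amounts))
    return summary

for group, (n, pct, mean) in giving_by_group(records).items():
    print(f"{group:10s} n={n} gave={pct:.0f}% mean=${mean:.2f}")
```

With real data you would want far more records per group before reading anything into the differences.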

4. Share the Results with Colleagues You Think Would Find It Interesting

Sharing your results with colleagues should be stimulating and enjoyable. You know the folks you work with and have probably already got some in mind. But here are a few suggestions:

• Look for people who think data-driven decision making is important in higher education advancement.
• Avoid people who are inclined to tell you why something can’t be done. Include people who enjoy finding ways around obstacles.
• It’s okay to have one devil’s advocate in the group. More than one? That can be kind of frustrating.
• If you can, get a vice president to join you. Folks at that level can help move things forward more easily than people at the director level, especially when it comes to “motivating” vendors to do things for you that they’d rather not do.

When you can, let us know how things go.

## 18 April 2011

### Do predictive scores get stale?

Filed under: Best practices, Model building, predictive modeling — Tags: — kevinmacdonell @ 11:46 am

This question was posed recently on a listserv I subscribe to. Wealth screening data does become outdated, the questioner noted, but what about modeling scores? Constituent giving and age change over time, but does that matter? Maybe it depends on what model you use and the variables that go into it?

The general answer is yes, scores from a predictive model do get “stale,” although they may not become outdated in quite the same way that wealth screening data does. They may become less relevant over time. Or, rather, they may not be as effective as they could be, because they lack the benefit of recent data.

Given the volume of interactions and transactions that might occur over a year (gifts, events attended, surveys responded to, etc.), it’s probably a good idea to have fresh scores every year, or every two years at most. Changes in the age of constituents are less of a factor, because all constituents age at the same rate, which is not true of changes in behaviours such as giving.

I think the questioner is right in thinking the type of model is a factor. Certain models could be more sensitive to the passage of time, such as any model trained on relatively sparse data (a major gift propensity model, for example). A year could make a difference, and that’s important if you need new names to feed a major giving or planned giving pipeline. If you require scores for your most recent constituents (say, young alumni, in an Annual Fund donor-acquisition model), that too would be a reason to build a model yearly, unless new constituents are automatically scored using an algorithm developed in a previous model.
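As a rough illustration of that last point, scoring new constituents with a previously built model can be as simple as applying the saved regression equation to their data. The sketch below assumes a linear model; the predictor names, coefficients, and records are all hypothetical.

```python
# Hypothetical intercept and coefficients saved from an earlier regression model.
SAVED_MODEL = {
    "intercept": 0.05,
    "coefficients": {"opens": 0.01, "click_throughs": 0.03, "attended_event": 0.20},
}

def raw_score(constituent, model=SAVED_MODEL):
    """Apply the stored linear equation to one constituent's predictors."""
    score = model["intercept"]
    for name, weight in model["coefficients"].items():
        # Assumption: a predictor missing from the record counts as zero.
        score += weight * constituent.get(name, 0)
    return score

# Made-up records for newly loaded graduates.
new_grads = [
    {"id": "2001", "opens": 4, "click_throughs": 1, "attended_event": 1},
    {"id": "2002"},
]

scores = {grad["id"]: raw_score(grad) for grad in new_grads}
```

Scores produced this way are only as current as the coefficients behind them, which is the staleness concern in a different form.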

Institutions with in-house capability are at an advantage because they can build models continuously — each iteration of the model will lead to improvements (one hopes) — at an attractive cost per model. As the saying goes, every model is wrong, but some are useful. Re-scoring constituents is as much about getting it “less wrong” as it is about being up to date.

For me, summertime has always been model-building season. The latest graduates have been loaded into the alumni database, the scores will be needed for Annual Giving appeals in September, and it’s worked out that summers have been less hectic than other times of year.

In general, though, for institutions with in-house capability, there is only one sure time to create a new model: Whenever there’s a business problem that requires one.