CoolData blog

13 November 2012

Making a case for modeling

Guest post by Peter Wylie and John Sammis

(Click here to download post as a print-friendly PDF: Making a Case for Modeling – Wylie Sammis)

Before you wade too far into this piece, let’s be sure we’re talking to the right person. Here are some assumptions we’re making about you:

  • You work in higher education advancement and are interested in analytics. However, you’re not a sophisticated stats person who throws around terms like regression and cluster analysis and neural networks.
  • You’re convinced that your alumni database (we’ll leave “parents” and “friends” for a future paper) holds a great deal of information that can be used to pick out the best folks to appeal to — whether by mail, email, phone, or face-to-face visits.
  • Your boss and your boss’s bosses are, at best, less convinced than you are about this notion. At worst, they have no real grasp of what analytics (data mining and predictive modeling) are. And they may seem particularly susceptible to sales pitches from vendors offering expensive products and services for using your data – products and services you feel might cause more problems than they will solve.
  • You’d like to find a way to bring these “boss” folks around to your way of thinking, or at least move them in the “right” direction.

If we’ve made some accurate assumptions here, great. If we haven’t, we’d still like you to keep reading. But if you want to slip out the back of the seminar room, not to worry. We’ve done it ourselves more times than you can count.

Okay, here’s something you can try:

1. Divide the alums at your school into ten roughly equal size groups (deciles) by class year. Table 1 is an example from a medium sized four year college.

Table 1: Class Years and Counts for Ten Roughly Equal Size Groups (Deciles) of Alumni at School A

2. Create a very simple score:

EMAIL LISTED(1/0) + HOME PHONE LISTED(1/0)

This score can assume three values: “0,” “1,” or “2.” A “0” means the alum has neither an email nor a home phone listed in the database. A “1” means the alum has either an email listed in the database or a home phone listed in the database, but not both. A “2” means the alum has both an email and a home phone listed in the database.

3. Create a table that contains the percentage of alums who have contributed at least $1,000 lifetime to your school for each score level for each class year decile. Table 2 is an example of such a table for School A.

Table 2: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School A

 

4. Create a three dimensional chart that conveys the same information contained in the table. Figure 1 is an example of such a chart for School A.
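If you (or your technical person) would rather script steps 1 through 3 than build them by hand, here is a minimal sketch in Python using only the standard library. The field names (class_year, has_email, has_phone, lifetime_giving), the function names, and the eight records are all invented for illustration; they are assumptions, not the layout of any real alumni database. The example uses two class year groups instead of ten only to keep the made-up data small.

```python
from collections import defaultdict

def assign_deciles(records, n_groups=10):
    """Sort by class year (oldest first) and cut into roughly equal-size groups.

    Note: a strict equal-count cut can place one class year in two adjacent groups.
    """
    ordered = sorted(records, key=lambda r: r["class_year"])
    n = len(ordered)
    for i, rec in enumerate(ordered):
        rec["decile"] = i * n_groups // n + 1  # 1 = oldest group
    return ordered

def simple_score(rec):
    """EMAIL LISTED (1/0) + HOME PHONE LISTED (1/0): 0, 1, or 2."""
    return int(rec["has_email"]) + int(rec["has_phone"])

def percent_table(records, threshold=1000):
    """Percent of alums at each (decile, score) cell who gave >= threshold lifetime."""
    counts = defaultdict(lambda: [0, 0])  # (decile, score) -> [givers, total]
    for rec in records:
        cell = counts[(rec["decile"], simple_score(rec))]
        cell[1] += 1
        if rec["lifetime_giving"] >= threshold:
            cell[0] += 1
    return {key: round(100 * g / t, 1) for key, (g, t) in counts.items()}

# Invented records, for illustration only.
alumni = [
    {"class_year": 1960, "has_email": 1, "has_phone": 1, "lifetime_giving": 5000},
    {"class_year": 1962, "has_email": 0, "has_phone": 0, "lifetime_giving": 0},
    {"class_year": 1965, "has_email": 1, "has_phone": 1, "lifetime_giving": 250},
    {"class_year": 1970, "has_email": 0, "has_phone": 1, "lifetime_giving": 1200},
    {"class_year": 2001, "has_email": 1, "has_phone": 0, "lifetime_giving": 50},
    {"class_year": 2005, "has_email": 1, "has_phone": 1, "lifetime_giving": 1000},
    {"class_year": 2008, "has_email": 0, "has_phone": 0, "lifetime_giving": 0},
    {"class_year": 2010, "has_email": 1, "has_phone": 1, "lifetime_giving": 0},
]
alumni = assign_deciles(alumni, n_groups=2)
table = percent_table(alumni)
for (decile, score), pct in sorted(table.items()):
    print(f"group {decile}, score {score}: {pct}%")
```

Changing the threshold argument is all it takes to try a different lifetime giving level, such as $500.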

In the rest of this piece we’ll be showing tables and charts from seven other very diverse schools that look quite similar to the ones you’ve just seen. At the end, we’ll step back and talk about the importance of what emerges from these charts. We’ll also offer advice on how to explain your own tables and charts to colleagues and bosses.

If you think the above table and chart are clear, go ahead and start browsing through what we’ve laid out for the other seven schools. However, if you’re not completely sure you understand the table and the chart, see if the following hypothetical questions and answers help:

Question. “Okay, I’m looking at Table 2 where it shows 53% for alums in Decile 1 who have a score of 2. Could you just clarify what that means?”

Answer. “That means that 53% of the oldest alums at the school who have both a home phone and an email listed in the database have given at least $1,000 lifetime to the school.”

Question. “Then … that means if I look to the far left in that same row where it shows 29% … that means that 29% of the oldest alums at the school who have neither a home phone nor an email listed in the database have given at least $1,000 lifetime to the school?”

Answer. “Exactly.”

Question. “So those older alums who have a score of 2 are way better givers than those older alums who have a score of 0?”

Answer. “That’s how we see it.”

Question. “I notice that in the younger deciles, regardless of the score, there are a lot of 0 percentages or very low percentages. What’s going on there?”

Answer. “Two things. One, most younger alums don’t have the wherewithal to make big gifts. They need years, sometimes many years, to get their financial legs under them. The second thing? Over the last seven years or so, we’ve looked at the lifetime giving rates of hundreds and hundreds of four-year higher education institutions. The news is not good. In many of them, well over half of the solicitable alums have never given their alma maters a penny.”

Question. “So, maybe for my school, it might be good to lower that giving amount to something like ‘has given at least $500 lifetime’ rather than $1,000 lifetime?”

Answer. “Absolutely. There’s nothing sacrosanct about the thousand dollar level that we chose for this piece. You can certainly lower the amount, but you can also raise the amount. In fact, if you told us you were going to try several different amounts, we’d say, ‘Fantastic!’”

Okay, let’s go ahead and have you browse through the rest of the tables and charts for the seven schools we mentioned earlier. Then you can compare your thoughts on what you’ve seen with what we think is going on here.

(Note: After looking at a few of the tables and charts, you may find yourself saying, “Okay, guys. Think I got the idea here.” If so, go ahead and fast forward to our comments.)

Table 3: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School B

 

Table 4: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School C

Table 5: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School D

Table 6: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School E

Table 7: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School F

Table 8: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School G

Table 9: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School H

Definitely a lot of tables and charts. Here’s what we see in them:

  • We’ve gone through the material you’ve just seen many times. Our eyes have always been drawn to the charts; we use the tables for back-up. Even though we’re data geeks, we almost always find charts more compelling than tables. That is most certainly the case here.
  • We find the patterns in the charts across the seven schools remarkably similar. (We could have included examples from scores of other schools. The patterns would have looked the same.)
  • The schools differ markedly in terms of giving levels. For example, the alums in School C are clearly quite generous in contrast to the alums in School F. (Compare Figure 3 with Figure 6.)
  • We’ve never seen an exception to one of the obvious patterns we see in these data: The longer alums have been out of school, the more money they have given to their school.
  • The “time out of school” pattern notwithstanding, we continue to be taken by the huge differences in giving levels (especially among older alums) across the levels of a very simple score. School G is a prime example. Look at Figure 7 and look at Table 8. Any way you look at these data, it’s obvious that alums who have even a score of “1” (either a home phone listed or an email listed, but not both) are far better givers than alums who have neither listed.

Now we’d like to deal with an often advanced argument against what you see here. It’s not at all uncommon for us to hear skeptics say: “Well, of course alumni on whom we have more personal information are going to be better givers. In fact we often get that information when they make a gift. You could even say that amount of giving and amount of personal information are pretty much the same thing.”

We disagree for at least two reasons:

Amount of personal information and giving in any alumni database are never the same thing. If you have doubts about our assertion, the best way to dispel those doubts is to look in your own alumni database. Create the same simple score we have for this piece. Then look at the percentage of alums for each of the three levels of the score. You will find plenty of alums who have a score of 0 who have given you something, and you will find plenty of alums with a score of 2 who have given you nothing at all.
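The check described above amounts to a simple cross-tabulation of score level against whether the alum has ever given. Here is a minimal sketch in Python; the field names and the six records are invented for illustration and are assumptions, not a real database layout.

```python
from collections import Counter

def giver_counts_by_score(records):
    """Count alums in each (score, has_given_anything) cell."""
    counts = Counter()
    for rec in records:
        score = int(rec["has_email"]) + int(rec["has_phone"])
        counts[(score, rec["lifetime_giving"] > 0)] += 1
    return counts

# Invented records, for illustration only.
sample = [
    {"has_email": 1, "has_phone": 1, "lifetime_giving": 0},
    {"has_email": 1, "has_phone": 1, "lifetime_giving": 500},
    {"has_email": 0, "has_phone": 0, "lifetime_giving": 25},
    {"has_email": 0, "has_phone": 0, "lifetime_giving": 0},
    {"has_email": 1, "has_phone": 0, "lifetime_giving": 0},
    {"has_email": 0, "has_phone": 1, "lifetime_giving": 100},
]
counts = giver_counts_by_score(sample)
for score in (0, 1, 2):
    print(f"score {score}: givers={counts[(score, True)]}, non-givers={counts[(score, False)]}")
```

If score and giving really were the same thing, the cells for score 0 givers and score 2 non-givers would be empty in your own data.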

We have yet to encounter a school where the IT folks can definitively say how an email address or a home phone number got into the database for every alum. Why is that the case? Because email addresses and home phone numbers find their way into alumni databases in a variety of ways. Yes, sometimes they are provided by the alum when he or she makes a gift. But there are other ways. To name a few:

  • Alums (givers or not) can provide that information when they respond to surveys or requests for information to update directories.
  • There are forms that alums fill out when they attend a school sponsored event that ask for this kind of information.
  • There are vendors who supply this kind of information.

Now here’s the kicker. Your reactions to everything you’ve seen in this piece are critical. If you’re going to go to a skeptical boss to try to make a case for scouring your alumni database for new candidates for major giving, we think you need to have several reactions to what we’ve laid out here:

1. “WOW!” Not, “Oh, that’s interesting.” It’s gotta be, “WOW!” Trust us on this one.

2. You have to be champing at the bit to create the same kinds of tables and charts that you’ve seen here for your own data.

3. You have to look at Table 2 (that we’ve recreated below) and imagine it represents your own data.

Table 2: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School A

Then you have to start saying things like:

“Okay, I’m looking at the third class year decile. These are alums who graduated between 1977 and 1983. Twenty-five percent of them with a score of 2 have given us at least $1,000 lifetime. But what about the 75% who haven’t yet reached that level? Aren’t they going to be much better bets for bigger giving than the 94% of those with a score of 0 who haven’t yet reached the $1,000 level?”

“A score that goes from 0 to 2? Really? What about a much more sophisticated score that’s based on lots more information than just email listed and home phone listed? Wouldn’t it make sense to build a score like that and look at the giving levels for that more sophisticated score across the class year deciles?”

If your reactions have been similar to the ones we’ve just presented, you’re probably getting very close to trying to make your case to the higher-ups. Of course, how you make that case will depend on who you’ll be talking to, who you are, and situational factors that you’re aware of and we’re not. But here are a few general suggestions:

Your first step should be making up the tables and charts for your own data. Maybe you have the skills to do this on your own. If not, find a technical person to do it for you. In addition to having the right skills, this person should think doing it would be cool and shouldn’t take forever to finish it.

Choose the right person to show our stuff and your stuff to. More and more we’re hearing people in advancement say, “We just got a new VP who really believes in analytics. We think she may be really receptive to this kind of approach.” Obviously, that’s the kind of person you want to approach. If you have a stodgy boss in between you and that VP, find a way around your boss. There are lots of ways to do that.

Do what mystery writers do; use the weapon of surprise. Whoever the boss you go to is, we’d recommend that you show them this piece first. After you know they’ve read it, ask them what they thought of it. If they say anything remotely similar to: “I wonder what our data looks like,” you say, “Funny you should ask.”

Whatever your reactions to this piece have been, we’d love to hear them.


10 October 2012

Logistic vs. multiple regression: Our response to comments

Guest post by John Sammis and Peter B. Wylie

Thanks to all of you who read and commented on our recent paper comparing logistic regression with multiple regression. We were not sure how popular this topic would be, but Kevin told us that interest was high, and there were a number of comments and questions. There were several general themes in the comments; Kevin has done an excellent job responding, but we thought we should throw in our two cents.

Why not just use logistic?

The point of our paper was not to suggest that logistic regression should not be used — our point was that multiple regression can achieve prediction results quite similar to logistic regression. Based on our experience working with and training fundraising professionals getting introduced to analytics, logistic regression can be intimidating. Our goal is always to get these folks to use analytics to help with their fundraising initiatives. We find many of them catch on with multiple regression, and much less so with logistic regression.

Predicted values vs. probabilities

We understand that the predicted values generated by multiple regression are different from the probabilities generated by logistic regression. Regardless of the statistical modeling technique we use, we always bin the raw prediction or probability values into equal-sized score levels. We have found that score level bins are easier to use than raw values. And using equal-sized score levels allows for easier evaluation of the scoring model.
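To make the binning idea concrete, here is a minimal sketch in Python. The function name and the ten raw values are invented for illustration. Because only the rank order of the values is used, the same function works whether the inputs are multiple regression predictions (which may fall below 0 or above 1) or logistic regression probabilities.

```python
def bin_into_score_levels(values, n_levels=10):
    """Rank raw predictions and cut the ranking into equal-sized score levels.

    Returns one level per input value, in input order
    (1 = lowest predictions, n_levels = highest). Ties keep input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, idx in enumerate(order):
        levels[idx] = rank * n_levels // len(values) + 1
    return levels

# Invented raw predictions, for illustration only.
raw = [0.91, -0.05, 0.40, 0.12, 1.02, 0.66, 0.33, 0.75, 0.08, 0.51]
levels = bin_into_score_levels(raw, n_levels=5)
print(levels)  # [5, 1, 3, 2, 5, 4, 2, 4, 1, 3]
```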

“I cannot agree”

Some commenters, knowledgeable about statistics, said they would not use multiple regression when the inputs called for logistic. According to the rules, if the target variable is binary, then linear modeling doesn’t make sense, and the rules must be obeyed. In our view, this rigid approach to method selection is inappropriate for predictive modeling. The use of multiple linear regression in place of logistic regression may not always make theoretical sense, but predictive modelers are concerned with whether or not a model produces an output that is useful in practical terms. The worth of a model is testable against new, real-world data. A model therefore has only one criterion for determining “appropriate” use: whether it really predicts what the modeler claims it will predict. The truth is revealed during evaluation.

A modest proposal

No one reading this should simply take our word that these two dissimilar methods yield similar results. Neither should anyone dismiss it out of hand without providing a critique based on real data. We would encourage anyone to try doing something on their own with data using both techniques and show us what they find. In particular, graduate students looking for a thesis or dissertation topic might consider producing something under this title: “Comparing Logistic Regression and Multiple Regression as Techniques for Predicting Major Giving.”

Heck! Peter says that if anyone were interested in doing a study like this for a thesis or dissertation, he would be willing to offer advice on how to:

  1. Do a thorough literature review
  2. Formulate specific research questions
  3. Come up with a study design
  4. Prepare a proposal that would satisfy a thesis or dissertation committee.

That’s quite an offer. How about it?

20 August 2012

Logistic regression vs. multiple regression

Filed under: John Sammis, Model building, Peter Wylie, predictive modeling, regression, Statistics — kevinmacdonell @ 5:13 am

by Peter Wylie, John Sammis and Kevin MacDonell

(Click to download printer-friendly PDF: Logistic vs MR-Wylie Sammis MacDonell)

The three of us talk about this issue a lot because we encounter a number of situations in our work where we need to choose between these two techniques. Many of our late night/early morning phone/internet discussions have been gobbled up by talking about which technique seems to be better under what circumstances. More than a few times, I’ve suggested we write something up about our experience with both techniques. In the end we’ve always decided to put off doing that because … well, because we’ve thought it might put a lot of people to sleep. Disagree as we might about lots of things, we’re of one mind on the dictum: “Don’t bore people.” They have enough tedious stuff in their lives; we don’t need to add to their burden.

On the other hand, as analytics has started to sink its teeth more and more into the world of advancement, it seems there is a group of folks out there who wrestle with the same issue. And the issue seems to be this:

“If I have a binary dependent variable (e.g., major giver/ non major giver, volunteer/non-volunteer, reunion attender/non-reunion attender, etc.), which technique should I use? Logistic regression or multiple regression?”

We considered a number of ways to try to answer this question:

  • We could simply assert an opinion based on our bank of experience with both techniques.
  • We could show you the results of a number of data sets using both techniques and then offer our opinion.
  • We could show you a way to compare both techniques using some of your own data.

We chose the third option because we think there is no better way to learn about a statistical technique than by using the technique on real data. Whenever we’ve done this sort of exploring ourselves, we’ve been humbled by how much we’ve learned.

Before we show you a way to compare the two techniques, we’ll offer some thoughts on why this question (“Should I use logistic regression or multiple regression?”) is so tough to find an answer to. If you’re anxious to move on to our comparison process, you can skip this section. But we hope you don’t.

Why This Is Not an Easy Question to Find an Answer To

We see at least two reasons why this is so:

  • Multiple regression has lived in the neighborhood a long time; logistic regression is a new kid on the block.
  • The articles and books we’ve read on comparisons of the two techniques are hard to understand.

Multiple regression is a longtime resident; logistic regression is a new kid on the block.

When World War II came along, there was a pressing need for rapid ways to assess the potential of young men (and some women) for the critical jobs that the military services were trying to fill. It was in this flurry of preparation that multiple regression began to see a great deal of practical application by behavioral scientists who had left their academic jobs and joined up for the duration. The theory behind multiple regression had been worked out much earlier in the century by geniuses like Ronald Fisher, Karl Pearson, and Harold Hotelling. But the method did not get much use until the war effort necessitated that use. The computational effort involved was just too forbidding.

Logistic regression is a different story. From the reading we’ve done, logistic regression got its early practical use in the world of medicine where biostatisticians were trying to predict binary outcomes like survived/did not survive, contracted disease/did not contract disease, had a coronary event/did not have a coronary event, and the like. It’s only been within the last fifteen or twenty years that logistic regression has found its way into the parlance of statisticians in the behavioral sciences.

These two paragraphs are a roundabout way of saying that logistic regression is (in our opinion) nowhere near as well vetted as multiple regression by people like us in advancement who are interested in predicting behavior, especially giving behavior.

The articles and books we’ve read on comparisons of the two techniques are hard to understand.

Since I (Peter) was pushing to do this piece, John and I decided it would be my responsibility to do some searching of the more recent literature on logistic regression as it relates to the substance of this project.

To start off, I reread portions of texts I have accumulated over the years that focus on multiple regression as a general data analytic technique. Each text has a section on logistic regression. As I waded back into these sections, I asked myself: “Is what I’m reading here going to enlighten more than confuse the folks we have in mind for this piece?” Without exception, my answer was, “Nope, just the reverse.” There was altogether too much focus on complicated equations and theory and nowhere near enough emphasis on the practical use of logistic regression. (This, in spite of the fact that each text had an introduction assuring us the book would go light on math and heavy on application.)

Then, using my trusty iPad, I set about seeing what I could find on the web. Not surprisingly, I found a ton of articles (and even some full length books) that had found their way into the public domain. I downloaded a bunch of them to read whenever I could find enough time to dig into them. I’m sorry to report that each time I’d give one of these things a try, I would hear my father’s voice (dad graduated third in his class in engineering school) as he paged through my own science and math texts when I was in college: “They oughta teach the clowns who wrote these things to write in plain English.” (I always tried to use such comments as excuses for bad grades. Never worked.)

Levity aside, it is hard to find clearly written articles or books on the use of logistic versus multiple regression in the behavioral sciences. I think it’s a bad situation that needs fixing, but that fixing won’t occur anytime soon. On the other hand, I think dad was right not to let me off easy for giving up on badly written material. And you shouldn’t let my pessimism dissuade you from trying out some of these same articles and books. (If enough of you are interested, perhaps Kevin and John and I can put together a list of suggested readings.)

A Way to Compare Logistic Regression with Multiple Regression

As promised we’ll take you through a set of steps you can use with some of your own data:

  1. Pick a binary dependent variable and a set of predictors.
  2. Compute a predicted probability value for every record in your sample using both multiple regression and logistic regression.
  3. Draw three random subsamples of 20 records each from the total sample so that each subsample includes the predicted multiple regression probability value and the predicted logistic regression probability value for every record.
  4. Display each subsample of these records in a table and a graph.
  5. Do an eyeball comparison of the probability values in both the tables and the graphs.

1. Pick a binary dependent variable and a set of predictors.

For this example, we used a private four year institution with about 13,000 solicitable alums. Here are the variables we chose:

Dependent variable. Each alum who had given $31 or more lifetime was defined as 1, all others who had given less than that amount were defined as 0. There were 6,293 0’s and 6,204 1’s. Just about an even fifty/fifty split.

Predictor variables:

  • CLASS YEAR
  • SQUARE OF CLASS YEAR
  • EMAIL ADDRESS LISTED (YES/NO, 1=YES, 0=NO)
  • MARITAL STATUS (SINGLE =1, ALL OTHERS=0)
  • HOME PHONE LISTED (YES/NO, 1=YES, 0=NO)
  • UNIQUE ID NUMBER

Why did we use ID number as one of the predictors? Over the years we’ve found that many schools use all-numeric ID numbers. When these numbers are entered into a regression analysis, they often work as predictors. More importantly, they help to create very granular predicted scores that can easily be binned into equal size groups.

2. Compute a predicted probability value for every record in your sample using both multiple regression and logistic regression.

This is where things start to get a bit technical and where a little background reading on both multiple regression and logistic regression wouldn’t hurt. Again, most of the material you’ll find will be tough to decipher. Here we’ll keep it as simple as we can.

For both techniques the predicted value you want to generate is a probability, a number that varies between 0 and 1.  In this example, that value will represent the probability that a record has given $31 or more lifetime to the college.

Now here’s the rub: the logistic regression model will always generate a probability value that varies between 0 and 1. However, the multiple regression model will almost always generate a value that varies between something less than 0 (a negative number) and a number greater than 1. In fact, in this example the range of probability values for the logistic regression model extends from .037 to .948. The range of probability values for the multiple regression model extends from -.122 to 1.003.

(By the way, this is why so many statisticians advise the use of logistic regression over multiple regression when the dependent variable is binary. In essence they are saying, “A probability value can’t exceed 1 nor can it be less than 0. Since multiple regression often yields values less than 0 and greater than 1, use logistic regression.” To be fair, we’re exaggerating a bit, but not very much.)
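For readers who want to see both models fitted side by side without a stats package, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the data are simulated (500 invented records with two predictors), multiple regression is fitted with the normal equations, and logistic regression is fitted with Newton's method. It is not the software or the data used for the example in this piece.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def xtwx(X, w):
    """Weighted cross-product matrix X'WX."""
    k = len(X[0])
    return [[sum(wi * r[i] * r[j] for wi, r in zip(w, X)) for j in range(k)]
            for i in range(k)]

def xtv(X, v):
    """Vector X'v."""
    return [sum(vi * r[j] for vi, r in zip(v, X)) for j in range(len(X[0]))]

def predict_linear(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def predict_logistic(X, beta):
    return [1.0 / (1.0 + math.exp(-z)) for z in predict_linear(X, beta)]

def fit_linear(X, y):
    """Multiple regression via the normal equations."""
    return solve(xtwx(X, [1.0] * len(X)), xtv(X, y))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression via Newton's method, starting from all-zero coefficients."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        p = predict_logistic(X, beta)
        w = [pi * (1.0 - pi) for pi in p]
        step = solve(xtwx(X, w), xtv(X, [yi - pi for yi, pi in zip(y, p)]))
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulated data: intercept, decades since graduation, email listed (1/0).
random.seed(0)
X, y = [], []
for _ in range(500):
    decades_out = random.uniform(0, 5)
    email = random.randint(0, 1)
    true_p = 1.0 / (1.0 + math.exp(-(-2.0 + 0.8 * decades_out + 0.7 * email)))
    X.append([1.0, decades_out, float(email)])
    y.append(1 if random.random() < true_p else 0)

p_mr = predict_linear(X, fit_linear(X, y))
p_lr = predict_logistic(X, fit_logistic(X, y))
print("multiple regression range:", round(min(p_mr), 3), "to", round(max(p_mr), 3))
print("logistic regression range:", round(min(p_lr), 3), "to", round(max(p_lr), 3))
```

The logistic values always stay strictly between 0 and 1. Nothing constrains the multiple regression values in the same way, which is the point made above.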

3. Draw three random subsamples of 20 records each from the total sample so that each subsample includes the predicted multiple regression probability value and the predicted logistic regression probability value for all 20 records.

The size and number of these subsamples is, of course, arbitrary. We decided that three subsamples were better than two and that four or more would be overkill. Twenty records, as you’ll see a bit further on, is a number that allows you to see patterns in a table or graph without overcrowding the picture.
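Drawing the subsamples is easy to script. Here is a minimal sketch in Python; the record layout (id, mr, lr) and the placeholder values are invented for illustration, and the seed is an arbitrary choice that simply makes the draw repeatable.

```python
import random

def draw_subsamples(records, n_subsamples=3, size=20, seed=2012):
    """Draw several random subsamples, each without replacement.

    The same record can still turn up in more than one subsample.
    """
    rng = random.Random(seed)
    return [rng.sample(records, size) for _ in range(n_subsamples)]

# Placeholder records: an ID plus the two predicted values (invented numbers).
records = [{"id": i, "mr": i / 1000, "lr": (i / 1000) ** 1.1} for i in range(1000)]
subsamples = draw_subsamples(records)
for k, sub in enumerate(subsamples, start=1):
    print(f"subsample {k}: {len(sub)} records, first id {sub[0]['id']}")
```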

4. Display each subsample of these records in a table and a graph.

Tables 1-3 and Figures 1-3 below show how we took this step for our example. To make sure we’re being clear, let’s go through some of the details in Table 1 and Figure 1 (which we constructed for the first subsample of twenty randomly drawn records).

In Table 1 the probability values for multiple regression for each record are displayed in the left-hand column. The corresponding probability values for the same records for logistic regression are displayed in the right-hand column. For example, the multiple regression probability for the first record is .078827109. The record’s logistic regression probability is .098107437. In plain English, that means the multiple regression model for this example is saying that this particular alum has about eight chances in a hundred of giving $31 or more lifetime. The logistic regression model is saying that the same alum has about ten chances in a hundred of giving $31 or more lifetime.

Table 1: Predicted Probability Values Generated from Using Multiple Regression and Logistic Regression for the First of Three Randomly Drawn Subsamples of 20 Records

Figure 1 shows the pairs of values you see in Table 1 displayed graphically in a scatterplot. You’ll notice that the points in the scatterplot appear to fall along what roughly looks like a straight line. This means that the multiple regression model and the logistic regression model are assigning very similar probabilities to each of the 20 records in the subsample. If you study Table 1, you can see this trend, but the trend is much easier to discern in the scatter plot.

Table 2: Predicted Probability Values Generated from Using Multiple Regression and Logistic Regression for the Second of Three Randomly Drawn Subsamples of 20 Records

Table 3: Predicted Probability Values Generated from Using Multiple Regression and Logistic Regression for the Third of Three Randomly Drawn Subsamples of 20 Records

 

5. Do an eyeball comparison of the probability values in both the tables and the graphs.

We’ve already done such a comparison in Table 1 and Figure 1. If we do the same comparison for Tables 2 and 3 and for Figures 2 and 3, it’s pretty clear that we’ll come to the same conclusion: Multiple regression and logistic regression (for this example) are giving us very similar answers.
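If you want a single number to go with the eyeball comparison, the Pearson correlation between the two columns of predicted values is one option. Here is a minimal sketch in Python; the seven pairs of values are invented for illustration and are not taken from the tables in this piece.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented predicted values for the same seven records.
mr = [0.079, 0.21, 0.35, 0.48, 0.62, 0.81, 0.97]  # multiple regression
lr = [0.098, 0.19, 0.33, 0.47, 0.64, 0.83, 0.93]  # logistic regression
print(round(pearson_r(mr, lr), 3))
```

A correlation near 1 says the two sets of values rise and fall together along a line. It does not say the values are equal, so the tables and scatterplots are still worth a look.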

So Where Does This All Take Us?

We’d like to cover several topics in this closing section:

  • A frequent objection to using multiple regression versus logistic regression when the dependent variable is binary
  • Trying our approach on your own
  • The conclusion we think you’ll eventually arrive at
  • How we’ve just scratched the surface here

A frequent objection to using multiple regression versus logistic regression when the dependent variable is binary

Earlier we said that many statisticians seem to advise the use of logistic regression over multiple regression by invoking this logic: “A probability value can’t exceed 1 nor can it be less than 0. Since multiple regression often yields values less than 0 and greater than 1, use logistic regression.” We also said we were exaggerating the stance of these statisticians a bit (but not very much).

While we can understand this argument, our feeling is that, in the applied fields we toil in, that argument is not a very practical one. In fact a seasoned statistics professor we know says (in effect): “What’s the big deal? If multiple regression yields any predicted values less than 0, consider them 0. If multiple regression yields any values greater than 1, consider them 1. End of story.” We agree.
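The professor's fix is a one-liner in most languages. Here is a minimal sketch in Python; the function name is our own, and the four values are illustrative (the first and last are the extremes of the multiple regression range reported earlier).

```python
def clip_to_unit_interval(values):
    """Treat predictions below 0 as 0 and predictions above 1 as 1."""
    return [min(1.0, max(0.0, v)) for v in values]

preds = [-0.122, 0.079, 0.5, 1.003]
clipped = clip_to_unit_interval(preds)
print(clipped)  # [0.0, 0.079, 0.5, 1.0]
```

Clipping leaves the rank order of the values intact apart from creating ties at 0 and at 1.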

Trying our approach on your own

In this piece we’ve shown the results of one comparison between multiple and logistic regression on one set of data. It’s clear that the results we got for the two techniques were very similar. But does that mean we’d get such similar results with other examples? Not necessarily.

So here’s what we’d recommend. Try doing your own comparisons of the two techniques with:

  • Different data sets. If you’re a higher education institution, you might pick a couple of data sets, one for alums who’ve been out for more than 25 years and one for folks who’ve been out less than 10 years. If you’re a non-profit, you can use a set of members from the west coast and one from the east coast.
  • Different variables. Try different binary dependent variables like those we mentioned earlier: major giver/non major giver, volunteer/non-volunteer, reunion attender/non-reunion attender, etc. And try different predictors. Try to mix categorical variables like marital status with quantitative variables like age. If you’re comfortable with more sophisticated stats, try throwing in cross products and exponential terms.
  • Different splits in the dependent variable. In our example piece the dependent variable was almost an exact 50/50 split. Since the underlying variable we used was quantitative (lifetime giving), we could have adjusted those splits in a number of ways: 60/40, 75/25, 80/20, 95/5, and on and on the list could go. Had we tried these different kinds of splits, would we have the same kinds of results for the two techniques? Since we actually did look at different splits like these, we can report that the results for both techniques were pretty much the same. But that’s for this example. That could change with a different data set and different variables.
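Trying different splits comes down to moving the dollar cutoff that defines a “1.” Here is a minimal sketch in Python; the function names and the ten giving amounts are invented for illustration. Because many alums share the same amount (especially $0), the share of 1's you actually get can differ from the share you asked for.

```python
def threshold_for_share(amounts, top_share):
    """Find the dollar cutoff that puts roughly the top `top_share` of records at 1."""
    ranked = sorted(amounts, reverse=True)
    n_ones = max(1, round(len(ranked) * top_share))
    return ranked[n_ones - 1]

def binary_target(amounts, threshold):
    """Code each record 1 if lifetime giving is at or above the threshold, else 0."""
    return [1 if a >= threshold else 0 for a in amounts]

# Invented lifetime giving amounts, for illustration only.
giving = [0, 0, 0, 10, 25, 31, 50, 100, 500, 2500]
for share in (0.5, 0.2, 0.1):
    cut = threshold_for_share(giving, share)
    target = binary_target(giving, cut)
    print(f"target share {share}: cutoff ${cut}, actual share {sum(target) / len(target)}")
```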

The conclusion we think you’ll eventually arrive at

We’re very serious about having you compare multiple regression and logistic regression on a variety of data sets with a variety of variables and with different splits in the dependent variable. If you do, you’ll learn a ton. Guaranteed.

On the other hand, if we put ourselves in your shoes, it’s easy to imagine your saying, “Come on guys. I’m not gonna do that. Just tell me what you think about which technique is better when the dependent variable is binary. Pick a winner.”

Given our experience, we can’t pick a clear winner. Still, if pushed, we’re inclined to opt in favor of multiple regression for a couple of reasons. It not only seems to perform about as well as logistic regression, but more importantly (with the stats software we use) multiple regression is simply faster and easier to use than logistic regression. But we still use logistic regression for models with binary dependent variables. And we continue to compare its efficacy against multiple regression when we can. And we rarely see a meaningful difference between the results.

Why do we still use both modeling techniques? Because we think taking a hard and fast stance when you’re doing applied science is not a good idea. Too easy to end up with egg on your face. Our best advice is to use whichever method is most familiar and readily available to you.

As always, we welcome your comments and reactions. Maybe even more so with this one.

13 June 2012

Finding predictors of future major givers

Guest post by Peter B. Wylie and John Sammis

(Download a print-friendly .pdf version here: Finding Predictors of Future Major Givers)

For years a bunch of committed data miners (we’re just a couple of them) have been pushing, cajoling, exhorting, and nudging folks in higher education advancement to do one thing: Look as hard at their internal predictors of major giving as they look at outside predictors (like social media and wealth screenings). It seems all that drum beating has been having an effect. If you want some evidence of that, take a gander at the preconference presentations that will be given this August in Minneapolis at the APRA 25th Annual International Conference. It’s an impressive list.

So…what if you count yourself among the converted? That is, you’re convinced that looking at internal predictors of major giving is a good idea. How do you do that? How do you do that, especially if you’re not a member of that small group of folks who:

  • have a solid knowledge of applied statistics as used in both the behavioral sciences and “business intelligence?”
  • know a good bit about topics like multiple regression, logistic regression, factor analysis, and cluster analysis?
  • are practiced in the use of at least one stats application whether it’s SPSS, SAS, Data Desk, or R or some other open source option?
  • are actively doing data mining and predictive modeling on a weekly, if not daily basis?

The answer, of course, is that there is no single, right and easy way to look for predictors of major giving. What you’ll see in the rest of this piece is just one way we’ve come up with – one we hope you’ll find helpful.

Specifically, we’ll be covering two topics:

  • The fact that the big giving in most schools does not begin until people are well into their fifties, if not their sixties
  • A method for looking at variables in an alumni database that may point to younger alums who will eventually become very generous senior alums

Where The Big Money Starts

Here we’ll take you through the steps we followed to show that the big giving in most schools does not begin until alums are well into their middle years.

Step 1: The Schools We Used

We chose six very different schools (public and private, large and small) spread out across North America. For five of the schools, we had the entire alumni database to work with. With one school we had a random sample of more than 20,000 records.

Step 2: Assigning An Age to Every Alumni Record

Using preferred class year, we computed an estimate of each alum’s age with this formula:

Age = 2012 – preferred class year + 22

Given the fact that many students graduate after the age of 22, it’s safe to assume that the ages we assigned to these alums are slight to moderate underestimates of their true ages.
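For anyone who would rather see the formula as code, here is a small sketch in Python. The function and parameter names are ours, chosen for illustration. The defaults (2012 and a graduation age of 22) come straight from the formula above.

```python
def estimated_age(preferred_class_year, as_of_year=2012, graduation_age=22):
    """Estimate an alum's age, assuming graduation at `graduation_age`."""
    return as_of_year - preferred_class_year + graduation_age


# An alum with a preferred class year of 1990.
print(estimated_age(1990))
```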

Step 3: Computing The Percentage of The Sum of Lifetime Dollars Contributed by Each Alum

For all the records in each database, we computed each alum’s percentage of the sum of lifetime dollars contributed by all solicitable alums (those who are living and reachable). To do this computation, we divided each alum’s lifetime giving by the sum of lifetime giving for the entire database and converted that value to a percentage.

For example, let’s assume that the sum of lifetime giving for the solicitable alums in a hypothetical database is $50 million. Table 1 shows both the lifetime giving and the percent of the sum of lifetime giving for three different records:

Table 1: Lifetime Giving and Percentage of The Sum of All Lifetime Giving for Three Hypothetical Alums

Just to be clear:

  • Record A has given no money at all to the school. That alum’s percentage is obviously 0.
  • Record B has given $39,500 to the school. That alum’s percentage is 0.079% of $50 million.
  • Record C has given $140,500 to the school. That alum’s percentage is 0.281% of $50 million.
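The arithmetic for the three hypothetical records can be checked with a few lines of Python. This is only a sketch; the function name is ours.

```python
def percent_of_total(lifetime_giving, total_giving):
    """Return one alum's lifetime giving as a percentage of the database total."""
    if total_giving <= 0:
        raise ValueError("total_giving must be positive")
    return 100.0 * lifetime_giving / total_giving


total = 50_000_000  # sum of lifetime giving in the hypothetical database
for label, giving in [("A", 0), ("B", 39_500), ("C", 140_500)]:
    print(label, f"{percent_of_total(giving, total):.3f}%")
```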

Step 4: Computing The Percentage and The Cumulative Percentage of The Sum of Lifetime Dollars Contributed by Each of 15 Equal-Sized Age Groups of Alums

For each of the six schools, we divided all alums into 15 roughly equal-sized age groups. These groups ranged from alums in their early twenties to those who had achieved or passed the century mark.

To make this all clear we have used School A (whose alums have given a sum of $164,215,000) as an example. Table 2 shows the:

  • total amount of lifetime dollars contributed by each of these age groups in School A
  • percentage of the $164,215,000 contributed by each of these groups
  • cumulative percentage of the $164,215,000 contributed by alums up to and including a certain age group

Table 2: Lifetime Giving, Percent of Sum of Lifetime Giving, and Cumulative Percent of Sum of Lifetime Giving for Fifteen Equal-Size Age Groups In School A
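If you want to build a table like this from your own data, the following Python sketch shows one way to do it. It assumes each record is simply an (age, lifetime giving) pair. The six sample alums are invented for illustration. Note that splitting purely by record count, as this sketch does, can put alums of the same age on either side of a group boundary, so your stats package may draw the lines slightly differently.

```python
def cumulative_giving_by_age_group(records, n_groups=15):
    """records: list of (age, lifetime_giving) pairs.

    Returns one row per age group, youngest first:
    (min_age, max_age, group_total, percent_of_sum, cumulative_percent).
    """
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r[0])
    grand_total = sum(giving for _, giving in ordered)
    n = len(ordered)
    rows, running = [], 0.0
    for g in range(n_groups):
        # Integer boundaries keep group sizes within one record of each other.
        start, stop = g * n // n_groups, (g + 1) * n // n_groups
        group = ordered[start:stop]
        if not group:
            continue
        group_total = sum(giving for _, giving in group)
        running += group_total
        pct = 100.0 * group_total / grand_total if grand_total else 0.0
        cum = 100.0 * running / grand_total if grand_total else 0.0
        rows.append((group[0][0], group[-1][0], group_total, pct, cum))
    return rows


# Six hypothetical alums split into three age groups.
sample = [(25, 0), (31, 50), (44, 150), (52, 300), (66, 1500), (78, 3000)]
for row in cumulative_giving_by_age_group(sample, n_groups=3):
    print(row)
```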

Here are some things that stand out for us in this table:

  • All alums 36 and younger have contributed less than 1% of the sum of lifetime giving.
  • For all alums under age 50 the cumulative amount given is just over 7% of the sum of lifetime giving.
  • For all alums under age 62 the cumulative amount given is less than 30% of the sum of lifetime giving.
  • For all alums under age 69 the cumulative amount given is slightly more than 40% of the sum of lifetime giving.
  • Well over 55% of the sum of lifetime giving has come in from alums who are 69 and older.

The big news in this table, of course, is that the lion’s share of money in School A has come in from alums who have long since passed the age of eligibility for collecting Social Security. Not a scintilla of doubt about that.

But what about all the schools we’ve looked at? Do they show a similar pattern of giving by age? To help you decide, we’ve constructed Figures 1-6 that provide the same information as you see in the rightmost column of Table 2: The cumulative percentage of all lifetime giving contributed by alums up to and including a certain age group.

Since Figure 1 below captures the same information you see in the rightmost column of Table 2, you don’t need to spend a lot of time looking at it.

But we’d recommend taking your time looking at Figures 2-6. Once you’ve done that, we’ll tell you what we see.

These are the details of what we see for Schools B-F:

  • School B: Alums 48 and younger have contributed less than 5% of the sum of lifetime giving. Alums 70 and older have contributed almost 40% of the sum.
  • School C: Alums 52 and younger have contributed less than 5% of the sum. Alums 70 and older have contributed more than 40% of the sum.
  • School D: Alums 55 and younger have contributed less than 30% of the sum. Alums 70 and older have contributed almost 45% of the sum.
  • School E: Alums 50 and younger have contributed less than 30% of the sum. Alums 61 and older have contributed more than 40% of the sum.
  • School F: Alums 50 and younger have contributed less than 20% of the sum. Alums 68 and older have contributed well over 50% of the sum.

The big picture? It’s the same phenomenon we saw with School A: The big money has come in from alums who are in the “third third” of their lives.

One Simple Way To Find Possible Predictors of The Big Givers on The Horizon

Up to this point we’ve either made our case or not that the big bucks don’t start coming in from alumni until they reach their late fifties or sixties. Great, but how do we go about identifying those alums in their forties and early fifties who are likely to turn into those very generous older alums?

It’s a tough question. In our opinion, the most rigorous scientific way to answer the question is to set up a longitudinal study that would involve:

  1. Identifying all the alums in a number of different schools who are in the forties and early fifties category.
  2. Collecting all kinds of data on these folks including giving history, wealth screening and other gift capacity information, biographic information, as well as a host of fields that are included in the databases of these schools like contact information, undergraduate activities, and on and on the list would go.
  3. Waiting about ten or fifteen years until these “youngsters” become “oldsters” and seeing which of all that data collected on them ends up distinguishing the big givers from everybody else.

Well, you’re probably saying something like, “Gentlemen, surely you jest. Who the heck is gonna wait ten or fifteen years to get the answers? Answers that may be woefully outdated given how fast society has been changing in the last twenty-five years?”

Yes, of course. So what’s a reasonable alternative? The idea we’ve come up with goes something like this: If we can find variables that differentiate current, very generous older alums from less generous alums, then we can use those same variables to find younger alums who “look like” the older generous alums in terms of those variables.

To bring this idea alive, we chose one school of the six that has particularly good data on their alums. Then we took these steps:

We divided alums 57 and older into ten roughly equal size groups (deciles) by their amount of lifetime giving. Figure 7 shows the median lifetime giving for these deciles.

Table 3 gives a bit more detailed information about the giving levels of these deciles, especially the total amount of lifetime giving.

Table 3: Sum of Lifetime Dollars and Median Lifetime Dollars for 10 Equal Sized Groups of Alums 57 and Older

We picked these eight variables to compare across the deciles:

  • number of alums who have a business phone listed in the database
  • number of alums who participated in varsity athletics
  • number of alums who were a member of a Greek organization as an undergraduate
  • number of alums who have an email address listed in the database
  • number of logins
  • number of reunions attended
  • number of years of volunteering
  • number of events attended

Before we take you through Figures 8-15, we should say that the method we’ve chosen to compare the deciles on these variables is not the way a stats professor or an experienced data miner/modeler would recommend you do the comparisons. That’s okay. We were aiming for clarity here.
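For readers who want to try this kind of decile comparison themselves, here is a rough Python sketch. It assumes each alum is a dictionary with a lifetime giving field and the variable you want to compare. The field names and the ten sample records are hypothetical. For a 1/0 variable (such as e-mail listed), the average in each decile is simply the share of alums with a 1.

```python
def decile_means(records, giving_key, variable_key, n_groups=10):
    """Average of `variable_key` within each giving group, lowest givers first."""
    ordered = sorted(records, key=lambda r: r[giving_key])
    n = len(ordered)
    means = []
    for g in range(n_groups):
        # Slice out the g-th group of roughly n / n_groups records.
        group = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        if group:
            means.append(sum(r[variable_key] for r in group) / len(group))
    return means


# Ten hypothetical alums: (lifetime giving, number of logins).
pairs = [(0, 0), (10, 0), (25, 1), (50, 0), (100, 2),
         (250, 1), (500, 3), (1000, 4), (5000, 6), (20000, 9)]
alums = [{"lifetime_giving": g, "logins": l} for g, l in pairs]

# Five groups of two, since ten records are too few for real deciles.
print(decile_means(alums, "lifetime_giving", "logins", n_groups=5))
```

A variable worth a closer look is one whose averages climb steadily from the lowest giving group to the highest.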

Let’s go through the figures. We’ve laid them out in order from “not so hot” variables to “pretty darn good” variables.

It’s pretty obvious when you look at Fig. 8 that bigger givers, for the most part, are no more likely to have a business phone listed in the database than are poorer givers.

Varsity athletics? Yes, there’s a little bit of a trend here, but it’s not a very consistent trend. We’re not impressed.

This trend is somewhat encouraging. Good givers are more likely to have been a member of a Greek organization as an undergraduate than not so good givers. But we would not rate this one as a real good predictor.

Now we’re getting somewhere. Better givers are clearly more likely to have an e-mail address listed in the database than are poorer givers.

This one gets our attention. We’re particularly impressed with the difference in the number of logins for Decile 10 (really big givers) versus the number of logins for the lowest two deciles. At this school they should be paying attention to this variable (and they are).

This figure is pretty consistent with what we’ve found across many, many schools. It’s a good example of why we are always encouraging higher ed institutions to store reunion data and pay attention to it.

This one’s a no-brainer.

And this one’s a super no-brainer.

Where to Go from Here

After you read something like this piece, it’s natural to raise the question: “What should I do with this information?” Some thoughts:

  • Remember, we’re not assuming that you’re a sophisticated data miner/modeler. But we are assuming that you’re interested in looking at your data to help make better decisions about raising money.
  • Without using any fancy stats software and with a little help from your advancement services folks, you can do the same kind of analysis with your own alumni data as we’ve done here. You’ll run into a few roadblocks, but you can do it. We’re convinced of that.
  • Once you’ve done this kind of an analysis you can start looking at some of your alums who are in their forties and early fifties who haven’t yet jumped up to a high level of giving. The ones who look like their older counterparts with respect to logins, or reunion attendance, or volunteering (or whatever good variables you’ve found)? They’re the ones worth taking a closer look at.
  • You can take your analysis and show it to someone at a higher decision-making level than your own. You can say, “Right now, I don’t know how to turn all this stuff into a predictive model. But I’d like to learn how to do that.” Or you can say, “We need to get someone in here who has the skills to turn this kind of information into a tool for finding these people who are getting ready to pop up to a much higher level of giving.”
  • And after you have become comfortable with these initial explorations of your data we encourage you to consider the next step – predictive modeling based on those statistics terms we mentioned earlier. It is not that hard. Find someone to help you – your school has lots of smart people – and give it a try. The resulting scores will go a long way toward identifying your future big givers.

As always: We’d love to get your thoughts and reactions to all this.

28 March 2012

Are we missing too many alumni with web surveys?

Filed under: Alumni, John Sammis, Peter Wylie, Surveying, Vendors — Tags: , — kevinmacdonell @ 8:04 am

Guest post by Peter B. Wylie and John Sammis

(Download a printer-friendly PDF version here: Web Surveys Wylie-Sammis)

With the advent of the internet and its exponential growth over the last decade and a half, web surveys have gained a strong foothold in society in general, and in higher education advancement in particular. We’re not experts on surveys, and certainly not on web surveys. However, let’s assume you (or the vendor you use to do the survey) e-mail a random sample of your alumni (or your entire universe of alumni) and invite them to go to a website and fill out a survey. If you do this, you will encounter the problem of poor response rate. If you’re lucky, maybe 30% of the people you e-mailed will respond, even if you vigorously follow up with non-responders, encouraging them to please fill the thing out.

This is a problem. There will always be the lingering question of whether or not the non-responders are fundamentally different from the responders with respect to what you’re surveying them about. For example, will responders:

  • Give you a far more positive view of their alma mater than the non-responders would have?
  • Tell you they really like new programs the school is offering, programs the non-responders may really dislike, or like a lot less than the responders?
  • Offer suggestions for changes in how alumni should be approached — changes that non-responders would not offer or actively discourage?

To test whether these kinds of questions are worth answering, you (or your vendor) could do some checking to see if your responders:

  • Are older or younger than your non-responders. (Looking at year of graduation for both groups would be a good way to do this.)
  • Have a higher or lower median lifetime giving than your non-responders.
  • Attend more or fewer events after they graduate than your non-responders.
  • Are more or less likely than your non-responders to be members of a dues paying alumni association.
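A check like this does not take much. Here is a minimal Python sketch that compares medians for responders and non-responders. The field names and the six sample alums are hypothetical; you would substitute whatever your own database calls these fields.

```python
from statistics import median


def compare_groups(alums, fields=("class_year", "lifetime_giving", "events_attended")):
    """Median of each field for responders versus non-responders."""
    responders = [a for a in alums if a["responded"]]
    non_responders = [a for a in alums if not a["responded"]]
    summary = {}
    for field in fields:
        summary[field] = {
            "responders": median(a[field] for a in responders),
            "non_responders": median(a[field] for a in non_responders),
        }
    return summary


# Six hypothetical alums who were e-mailed the survey invitation.
alums = [
    {"responded": True, "class_year": 1998, "lifetime_giving": 500, "events_attended": 3},
    {"responded": True, "class_year": 2005, "lifetime_giving": 100, "events_attended": 1},
    {"responded": True, "class_year": 2010, "lifetime_giving": 50, "events_attended": 2},
    {"responded": False, "class_year": 1965, "lifetime_giving": 0, "events_attended": 0},
    {"responded": False, "class_year": 1980, "lifetime_giving": 25, "events_attended": 0},
    {"responded": False, "class_year": 1992, "lifetime_giving": 1000, "events_attended": 1},
]

for field, values in compare_groups(alums).items():
    print(field, values)
```

For a yes/no field such as membership in a dues paying alumni association, a simple percentage for each group would be a better summary than a median.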

It is our impression that most schools that conduct alumni web surveys don’t do this sort of checking. In their reports they may discuss what their response rates are, but few offer an analysis of how the responders are different from the non-responders.

Again, we’re talking about impressions here, not carefully researched facts. But that’s not our concern in this paper. Our concern here is that web surveys (done in schools where potential responders are contacted only by e-mail) are highly unlikely to be representative of the entire universe of alums, even if the response rate for these surveys is always one hundred percent. Why? Because our evidence shows that alumni who have an e-mail address listed with their schools are markedly different from alumni who do not have an e-mail address listed in terms of two important variables: age and giving.

To make our case, we’ll offer some data from four higher education institutions spread out across North America; two are private, and two are public. Let’s start with the distribution of e-mail addresses listed in each school by class year decile. You can see these data in Tables 1-4 and Figures 1-4. We’ll go through Table 1 and Figure 1 (School A) in some detail to make sure we’re being clear.

Take a look at Table 1. You’ll see that the alumni in School A have been divided up into ten roughly equal size groups where Decile 1 represents the oldest group and Decile 10 represents the youngest. The table shows a very large age range. The youngest alums in Decile 1 graduated in 1958. (Most of you reading this paper were not yet born by that year.) The alums in Decile 10 (unless some of them went back to school late in life) are all twenty-somethings.

Table 1: Count, Median Class Year, and Minimum and Maximum Class Years for All Alums Divided into Deciles for School A

Now look at Figure 1. It shows the percentage of alums by class year decile who have an e-mail address listed in the school’s database. Later on in the paper we’ll discuss what we think are some of the implications of a chart like this. Here we just want to be sure you understand what the chart is conveying. For example, 43.0% of alums who graduated between 1926 and 1958 (Decile 1) have an e-mail listed in the school’s database. How about Decile 9, alums who graduated between 2001 and 2005? If you came up with 86.5%, we’ve been clear.
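The numbers behind a chart like this are easy to produce. The Python sketch below assumes each alum is a (class year, has e-mail) pair and splits the records into roughly equal groups by class year. The eight sample alums and the function name are made up for illustration.

```python
def email_coverage_by_decile(alums, n_groups=10):
    """alums: list of (class_year, has_email) pairs, has_email being True/False.

    Returns (min_year, max_year, percent_with_email) per group, oldest classes first.
    """
    ordered = sorted(alums, key=lambda a: a[0])
    n = len(ordered)
    rows = []
    for g in range(n_groups):
        group = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        if group:
            with_email = sum(1 for _, has_email in group if has_email)
            rows.append((group[0][0], group[-1][0], 100.0 * with_email / len(group)))
    return rows


# Eight hypothetical alums split into two class-year groups.
sample = [(1950, False), (1955, True), (1960, False), (1970, False),
          (1995, True), (2000, True), (2005, False), (2010, True)]
for row in email_coverage_by_decile(sample, n_groups=2):
    print(row)
```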

Go ahead and browse through Tables 2-4 and Figures 2-4. After you’ve done that, we’ll tell you what we think is one of the implications of what you’ve seen so far.

Table 2: Count, Median Class Year, and Minimum and Maximum Class Years for All Alums Divided into Deciles for School B

Table 3: Count, Median Class Year, and Minimum and Maximum Class Years for All Alums Divided into Deciles for School C

Table 4: Count, Median Class Year, and Minimum and Maximum Class Years for All Alums Divided into Deciles for School D

The most significant implication we can draw from what we’ve shown you so far is this: If any of these four schools were to conduct a web survey by only contacting alums with an e-mail address, they would simply not reach large numbers of alums whose opinions they are probably interested in gathering. Some specifics:

  • School A: They would miss huge numbers of older alums who graduated in 1974 and earlier. By rough count over 40% of these folks would not be reached. That’s a lot of senior folks who are still alive and kicking and probably have pronounced views about a number of issues contained in the survey.
  • School B: A look at Figure 2 tells us that even considering doing a web survey for School B is probably not a great idea. Fewer than 20% of their alums who graduated in 1998 or earlier have an e-mail address listed in their database.

Another way of expressing this implication is that each school (regardless of what their response rates were) would largely be tapping the opinions of younger alums, not older or even middle-aged alums. If that’s what a school really wants to do, okay. But we strongly suspect that’s not what it wants to do.

Now let’s look at something else that concerns us about doing web surveys if potential respondents are only contacted by e-mail: Giving. Figures 5-8 show the percentage of alums who have given $100 or more lifetime by e-mail address/no-email address across class year deciles.

As we did with Figure 1, let’s go over Figure 5 to make sure it’s clear. For example, in decile 1 (oldest alums) 87% of alumni with an e-mail address have given $100 or more lifetime to the school. Alums in the same decile who do not have an e-mail address? 71% of these alums have given $100 lifetime or more to the school. How about decile 10, the youngest group? What are the corresponding percentages of giving for those alums with and without an e-mail address? If you came up with 14% versus 6%, we’ve been clear.
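Each pair of bars in a figure like this boils down to two percentages for one decile. Here is a short Python sketch of that calculation. It assumes you have already pulled the alums for a single class year decile. The field names and the eight sample alums are hypothetical.

```python
def giving_rate_by_email(alums, threshold=100):
    """Percent of alums with lifetime giving >= threshold, split by e-mail status.

    alums: list of dicts with 'has_email' (True/False) and 'lifetime_giving'.
    A group with no alums gets None rather than a percentage.
    """
    rates = {}
    for label, status in (("email", True), ("no_email", False)):
        group = [a for a in alums if bool(a["has_email"]) == status]
        if group:
            givers = sum(1 for a in group if a["lifetime_giving"] >= threshold)
            rates[label] = 100.0 * givers / len(group)
        else:
            rates[label] = None
    return rates


# Eight hypothetical alums from one class year decile.
decile = (
    [{"has_email": True, "lifetime_giving": g} for g in (500, 150, 20, 100)]
    + [{"has_email": False, "lifetime_giving": g} for g in (0, 250, 10, 40)]
)
print(giving_rate_by_email(decile))
```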

Take a look at Figures 6-8, for schools B, C and D. Then we’ll tell you the second implication we see in all these data.

The overall impression we get from these four figures is clear: Alumni who do not have an e-mail address listed give considerably less money to their schools than do alumni with an e-mail address listed. This difference can be particularly pronounced among older alums.

Some Conclusions

The title of this piece is: “Are We Missing Too Many Alumni with Web Surveys?” Based on the data we’ve looked at, we think the answer to this question has to be a “yes.” It can’t be a good thing that many web surveys don’t go out to so many older alums who don’t have an e-mail address, and to alums without an e-mail address who haven’t given as much (on average) as those with an e-mail address.

On the other hand, we want to stress that web surveys can provide a huge amount of valuable information from the alums who are reached and do respond. Even if the coverage of the whole alumni universe is incomplete, the thousands of alums who take the time to fill out these surveys can’t be ignored.

Here’s an example. We got to reading through the hundreds and hundreds of written comments from a recent alumni survey. We haven’t included any of the comments here, but my (Peter’s) reaction to the comments was visceral. Wading through all the typos, and misspellings, and fractured syntax, I found myself cheering these folks on:

  • “Good for you.”
  • “Damn right.”
  • “Couldn’t have said it better myself.”
  • “I wish the advancement and alumni people at my college could read these.”

In total, these comments added up to almost 50,000 words of text, the length of a short novel. And they were a lot more interesting than the words in too many of the novels I read.

As always, we welcome your comments.

15 February 2012

Are we underestimating the generosity of our older alums?

Filed under: Alumni, Annual Giving, John Sammis, Peter Wylie — Tags: , , — kevinmacdonell @ 8:50 am

Guest post by Peter Wylie and John Sammis

(Download a PDF version here: Are we underestimating the generosity of our older alums?)

I’m an older alum. I don’t want my generosity to be underestimated by my alma mater. Trouble is, if any of my old college buddies happen to read this, they’ll say, “Really? And what generosity would that be, Pete?”

Yeah, well, I’m not the kind of person we’re talking about in this piece. We’re talking about alums who graduated at least 30 years ago and have made substantial contributions to their colleges or universities. There are a heck of a lot of them out there, and more and more such alums are joining the ranks as our population ages.

So … let’s say you work in advancement in higher education or a secondary school. Does the data you have stored on these “senior” folks cause you to underestimate what they’ve given to your institution? John Sammis and I would offer a strong “yes” to this question. Why? Because of two phenomena that most of us rarely consider when we look at the lifetime hard credit dollars given by these alums: (1) inflation and (2) the fact that electronic giving records rarely go back much further than 1985.

In this piece we’ll take you through a series of examples from one college that has many older alums. The folks who work in advancement at that college agree with us. They think that inflation and electronic record keeping have caused them to underestimate the generosity of many of their older alums. We hope you find the examples intriguing, and we hope they cause you to think about how you may be doing the same kind of underestimating.

Let’s start off by looking at the lifetime hard credit giving at our example college. Table 1 shows that the oldest class year quartile (alums who graduated between 1931 and 1976) has given far more than the remaining three quarters of younger alums.

We see this kind of phenomenon nearly every week when we look at a new alumni database. The cumulative giving of the oldest 25% of alums almost always dwarfs that of all other alums. And yet, in spite of that fact, we think the giving of these older alums is underrepresented. Bear with us.

Now let’s look at Table 2, which gives a picture of the inflation that has occurred in the United States over the last 60 years or so.

To make sure we’re being clear in the table, we’d ask you to indulge us and answer these three questions:

  1. If an alum made a gift of $161 in 1950, what would that gift amount to in 2011 dollars?
  2. If an alum made a $50,000 gift in 2011, what would be the 1965 equivalent of that gift?
  3. If an alum made a gift of $1,198 in 1975, what would that gift amount to in 2011 dollars?

If you came up with these answers, we’ve been clear:

  1. $1,500
  2. $6,914
  3. $5,000
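To show the mechanics of these conversions, here is a small Python sketch. Be aware of what it assumes: the multipliers are back-calculated from the three rounded answers above, so they are approximate and cover only those three years. For real work you would use a published price index or an inflation calculator instead. The names in the code are ours.

```python
# 2011 dollars per one dollar of the gift year, back-calculated from the
# three rounded answers above. Approximate by construction.
IMPLIED_MULTIPLIER_TO_2011 = {
    1950: 1_500 / 161,
    1965: 50_000 / 6_914,
    1975: 5_000 / 1_198,
}


def to_2011_dollars(amount, gift_year):
    """Restate a gift made in `gift_year` in 2011 dollars."""
    return amount * IMPLIED_MULTIPLIER_TO_2011[gift_year]


def from_2011_dollars(amount, target_year):
    """Restate a 2011 amount in the dollars of `target_year`."""
    return amount / IMPLIED_MULTIPLIER_TO_2011[target_year]


print(round(to_2011_dollars(161, 1950)))       # question 1
print(round(from_2011_dollars(50_000, 1965)))  # question 2
print(round(to_2011_dollars(1_198, 1975)))     # question 3
```

The same two functions work for any other amount in one of those three years, for example a $1,000 gift made in 1975.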

If you found the table a bit confusing, maybe a look at Figure 1 will help. It shows the same information conveyed in the leftmost column of Table 2. Whether you look at the table or the figure, the big picture is that there has been a good amount of inflation in this country over the last six decades. More to the point, what look like small gifts made decades ago look like very substantial gifts in today’s dollars.

What we’ll be doing now is speculative. We’ll be looking at the dollars that specific alums at our example school have contributed over many years. And then we’ll be estimating what those dollars are worth in terms of 2011 dollars. We should caution you: Our estimates could be pretty accurate, or they could be off the mark by quite a bit.

We’ll start by looking at the top five lifetime givers in each of the class year quartiles as laid out in Table 3. As you’d expect, the giving of the top five alums in the first quartile (those graduating between 1931 and 1976) greatly outdistances the giving of the second quartile top five alums (those graduating between 1977 and 1989) and so on down the line.

Now let’s check out something interesting for just the top five givers in class year quartiles 1 and 2. Notice in Figure 2 below that:

  • The number one giver graduated in 1943 but didn’t make a first gift until 1983.
  • The number two giver graduated in 1959 but didn’t make a first gift until 1984.
  • The number three giver graduated in 1967 but didn’t make a first gift until 1984.
  • The number four giver graduated in 1949 but didn’t make a first gift until 1983.
  • The number five giver graduated in 1965 but didn’t make a first gift until 1984.

Now take a look at Figure 3. Notice that:

  • The number one giver graduated in 1983 and made a first gift in 1983.
  • The number two giver graduated in 1985 and made a first gift in 1994. (This is the first time we’ve seen a first gift made after 1984.)
  • The number three giver graduated in 1981 and made a first gift in 1983.
  • The number four giver graduated in 1980 and made a first gift in 1983.
  • The number five giver graduated in 1978 and made a first gift in 1984.

That was a lot of detail to offer you – maybe more than necessary. But by offering the detail we wanted to make at least two important points. The first is that our example college obviously has not recorded gift giving (electronically) before 1983. How do we conclude that (other than the fact that our contacts at the college confirmed it)? Because none of the ten alums we’ve looked at are listed as having made a first gift before 1983. No big surprise there.

But we think our second point is more attention-getting. In both quartiles there are alums listed as having graduated before 1983 (some of them long before then). What do we know about their giving prior to 1983? That’s our second point. We simply don’t know what their giving was prior to 1983. And here’s where our speculation comes in.

This is what we did. For each of the top three givers in the first and second class year quartiles, we made two inflation adjustments to their actual lifetime giving amounts: A conservative estimate, and a liberal estimate. As you read through how we made these estimates, you may disagree to some extent with our approach. We’d be surprised if you didn’t. But we’d like to defer discussion of such disagreements until the end of the piece.

The conservative estimate.

We took the year of each alum’s first gift and the year of each alum’s last gift, added them together, and divided that number by two. For example, let’s take the top giver in Quartile 1 whose lifetime hard credit giving is recorded as $11,286,872. That alum’s recorded year of first gift is 1983. His or her last gift was made in 2005. The average we computed was 1994. Using an inflation calculator, we arrived at an estimated lifetime giving amount of $17,097,560. In other words we converted $11,286,872 from 1994 dollars to 2011 dollars.

The liberal estimate.

We took each alum’s year of graduation and the year of each alum’s last gift, added them together, and divided that number by two. Let’s go back to our example of the top giver in Quartile 1 whose lifetime hard credit giving is recorded as $11,286,872. That alum’s year of graduation is 1943. His or her last gift was made in 2005. The average we computed was 1974. Using the same inflation calculator, we arrived at an estimated lifetime giving amount of $51,384,972. In other words we converted $11,286,872 from 1974 dollars to 2011 dollars.
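The only difference between the two estimates is which starting year goes into the average. A short Python sketch makes that plain. The function names are ours, and rounding a half-year down is an assumption we've made for the sketch. The inflation adjustment itself still has to come from an inflation calculator.

```python
def midpoint_year(start_year, end_year):
    """Middle year between two years; floor division rounds a half-year down."""
    return (start_year + end_year) // 2


def conservative_year(first_gift_year, last_gift_year):
    """Middle year between the first recorded gift and the last gift."""
    return midpoint_year(first_gift_year, last_gift_year)


def liberal_year(graduation_year, last_gift_year):
    """Middle year between graduation and the last gift."""
    return midpoint_year(graduation_year, last_gift_year)


# Top giver in Quartile 1: graduated 1943, first gift 1983, last gift 2005.
print(conservative_year(1983, 2005))
print(liberal_year(1943, 2005))
```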

In Figures 4-6 we compare the top three givers in class year quartile 1 and class year quartile 2 in terms of recorded lifetime giving, a conservative estimate of inflation adjusted giving, and a liberal estimate of inflation adjusted giving. In each figure you’ll see some dramatic giving differences between the older alum in class year quartile 1 (1931-1976) and the younger alum in class year quartile 2 (1977-1989). Since we’ve already covered a lot of information included in Figure 4, we’ll skip to Figure 5 and offer some reasons for why these differences are so large. To avoid overloading you with detail, we won’t do that for Figure 6, but if we did, the same kind of thinking would apply.

Here the older alum graduated in 1959, and the younger alum graduated in 1985. The older alum is electronically listed as having made his or her first gift in 1984 and his or her last gift in 2010. The younger alum made his or her first gift in 1994 and his or her last gift in 2008. Here’s what we think is going on:

  • We’re certain that the actual giving amount for the younger alum ($3,127,000) is accurate. We’re far less certain about the actual giving amount ($10,150,030) for the older alum. We suspect that this amount is only the money in 2011 dollars that the alum contributed since 1984.
  • How about the conservative estimate for each alum? The amounts for both are greater because we used the middle year between the first and last gift to adjust for inflation.
  • How about the liberal estimate for each alum? Notice that this estimate for the younger alum is the same as the conservative estimate because the middle year for both estimates is the same. But not for the older alum. Here we picked the middle year between 1959 (the alum’s grad year) and 2010 (the last gift year) as the year to adjust for inflation. That year is 1984, thirteen years earlier than 1997, the year we used for the conservative estimate. That’s why we see the large jump from $14,196,156 to $20,777,384.
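To make the difference between the two base years concrete, here is a minimal Python sketch for the older alum in Figure 5. The record layout and function names are hypothetical, and rounding the middle year down is an assumption that happens to match the 1984 figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class AlumRecord:
    grad_year: int
    first_gift_year: int
    last_gift_year: int


def conservative_base_year(alum):
    """Middle year between the first and last recorded gift."""
    return (alum.first_gift_year + alum.last_gift_year) // 2


def liberal_base_year(alum):
    """Middle year between graduation and the last recorded gift (rounded down)."""
    return (alum.grad_year + alum.last_gift_year) // 2


# Older alum from Figure 5, using the years given in the text.
older_alum = AlumRecord(grad_year=1959, first_gift_year=1984, last_gift_year=2010)

conservative = conservative_base_year(older_alum)
liberal = liberal_base_year(older_alum)
print(conservative, liberal, conservative - liberal)
```

The thirteen-year gap between the two base years is the whole reason the liberal estimate is so much larger than the conservative one.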

Closing Thoughts

The first thought we’d like to offer is that no one (including us) should draw any hard and fast conclusions from what we’ve presented here. The data are from only one school, and our inflation estimates are certainly open to at least some healthy skepticism.

That said, we’d like you to consider these points:

  • Inflation is something we’ve never seen figured into how lifetime giving is computed at fundraising institutions. In the almost six years the two of us have been working together, we’ve looked at giving data from at least 200 non-profits and schools. In all that time we’ve never had more than a fleeting discussion with anyone at those institutions about how both limited record keeping and inflation have distorted the giving picture they have of older donors. We could blame those folks for that shortcoming, but if we did that, we’d have to blame ourselves more. So we won’t do either. What we will do is start paying more attention to this issue and talking it up. Writing this paper has given us strong motivation to do just that.
  • Depending on how accurately your gift data has been stored, it shouldn’t be hard to make more accurate inflation estimates than we have here. We know you’re busy, and we know your advancement services folks are stretched thin. But a little project that involved 20 or so of your major donors who have made multiple gifts might be enlightening. All that would be required is to make an inflation adjustment for each gift for each donor. Then simply add up those adjusted gifts for each donor and compare the recorded lifetime amounts with the inflation-adjusted amounts. Maybe you’d have a ho-hum reaction to what you see, but we doubt it.
  • Anything that might get folks in advancement to focus more on the value of their internal data can’t be a bad thing. In prospect research/major giving there is a huge emphasis on looking to the outside of an institution’s database to find evidence of giving capacity for the people stored in that database. There’s nothing wrong with that. Acquiring information about people’s wherewithal to make major gifts is important. However, and this is a big however, there is a huge under-emphasis in the field on looking at internal data – data that is often far more accurate than external data, and certainly far less expensive to access and look at. For example, you simply will not find external data that point to someone in your database who made a seemingly mid-size gift many years ago that is worth a huge amount in today’s dollars. But uncovering that kind of information in your own database could provide a strong hint (that won’t come from any outside source) about that person’s assets as well as his or her likelihood of sharing those assets with the place you work for.
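The small project suggested in the second point above, adjusting each gift individually and then summing, could be sketched in Python along these lines. This is only an illustration under stated assumptions: the function name and data layout are hypothetical, and the index values are placeholders invented so the example runs. You would substitute a real price index from whatever source your institution trusts.

```python
def adjusted_lifetime_giving(gifts, price_index, target_year):
    """Sum a donor's gifts after converting each one to target_year dollars.

    gifts: list of (year, amount) pairs for one donor.
    price_index: dict mapping year -> price index value (supplied by you).
    """
    target_level = price_index[target_year]
    return sum(amount * target_level / price_index[year] for year, amount in gifts)


# PLACEHOLDER index values, invented for illustration only. Not real CPI data.
example_index = {1980: 50.0, 2000: 100.0, 2011: 200.0}

# Hypothetical donor with two recorded gifts of $1,000 each.
example_gifts = [(1980, 1_000.0), (2000, 1_000.0)]

recorded_total = sum(amount for _, amount in example_gifts)
adjusted_total = adjusted_lifetime_giving(example_gifts, example_index, 2011)
print(recorded_total, adjusted_total)
```

Adjusting gift by gift avoids having to pick a single middle year, which is the main source of uncertainty in both of our estimates.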

As always, we’d love to get your reactions to what we’ve had to say here.
