CoolData blog

15 January 2013

The cautionary tale of Mr. S. John Doe

A few years ago I met with an experienced Planned Giving professional who had done very well over the years without any help from predictive modeling, and was doing me the courtesy of hearing my ideas. I showed this person a series of charts. Each chart showed a variable and its association with the condition of being a current Planned Giving expectancy. The ultimate goal would have been to consolidate these predictors together as a score, in order to discover new expectancies in that school’s alumni database. The conventional factors of giving history and donor loyalty are important, I conceded, but other engagement-related factors are also very predictive: student activities, alumni involvement, number of degrees, event attendance, and so on.

This person listened politely and was genuinely interested. And then I went too far.

One of my charts showed that there was a strong association between being a Planned Giving expectancy and having a single initial in the First Name field. I noted that, for some unexplained reason, having a preference for a name like “S. John Doe” seemed to be associated with a higher propensity to make a bequest. I thought that was cool.

The response was a laugh. A good-natured laugh, but still — a laugh. “That sounds like astrology!”

I had mistaken polite interest for a slam-dunk, and in my enthusiasm went too far out on a limb. I may have inadvertently caused the minting of a new data-mining skeptic. (Eventually, the professional retired after completing a successful career in Planned Giving, and having managed to avoid hearing much more about predictive modeling.)

At the time, I had hastened to explain that what we were looking at were correlations — loose, non-causal relationships among various characteristics, some of them non-intuitive or, as in this case, seemingly nonsensical. I also explained that the linkage was probably due to other variables (age and sex being prime candidates). Just because it’s without explanation doesn’t mean it’s not useful. But I suppose the damage was done. You win some, you lose some.

Although some of the power (and fun) of predictive modeling rests on the sometimes non-intuitive and unexplained nature of predictor variables, I now think it’s best to frame any presentation to a general audience in terms of what they think of as “common sense”. Limiting, yes. But safer. Unless you think your listener is really picking up what you’re laying down, keep it simple, keep it intuitive, and keep it grounded.

So much for sell jobs. Let’s get back to the data … What ABOUT that “first-initial” variable? Does it really mean anything, or is it just noise? Is it astrology?

I’ve got this data set in front of me — all alumni with at least some giving in the past ten years. I see that 1.2% of all donors have a first initial at the front of their name. When I look at the subset of the records that are current Planned Giving expectancies, I see that 4.6% have a single-initial first name. In other words, Planned Giving expectancies are almost four times as likely as all other donors to have a name that starts with a single initial. The data file is fairly large — more than 17,000 records — and the difference is statistically significant.
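If you'd like to check a gap like that in your own data, a 2×2 chi-square test needs nothing beyond the standard library. The counts below are back-calculated from the percentages above (4.6% of 500 expectancies is about 23 records; 1.2% of the remaining 16,500 donors is about 198), so they are only approximations of my actual file:

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, r, col in ((a, row1, col1), (b, row1, col2),
                        (c, row2, col1), (d, row2, col2)):
        exp = r * col / n
        stat += (obs - exp) ** 2 / exp
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x/2)),
    # so no stats package is needed for the p-value.
    return stat, erfc(sqrt(stat / 2))

# Rows: expectancies, everyone else. Columns: single-initial name, full name.
stat, p = chi2_2x2(23, 477, 198, 16302)
```

With these counts the statistic comes out above 40, far past the 1-df critical value of 3.84, which is why the difference holds up despite the modest number of expectancies.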

What can explain this? When I think of a person whose first name is an initial and who tends to go by their middle name, the image that comes to mind is that of an elderly male with a higher than average income — like a retired judge, say. For each of the variables Age and Male, there is in fact a small positive association with having a one-character first name. Yet, when I account for both ‘Age’ and ‘Male’ in a regression analysis, the condition of having a leading initial is still significant and still has explanatory power for being a Planned Giving expectancy.
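Regression isn't the only way to account for confounders like age and sex. A simpler (if cruder) approach is stratification: compare single-initial and other records within each age-decade-and-sex cell, so that a difference which survives can't be blamed on age or sex alone. Here's a sketch on entirely simulated data, with effect sizes invented for illustration:

```python
import random
from collections import defaultdict

random.seed(0)

# Simulated alumni: the single-initial flag is more common among older men,
# but also carries its own independent lift in expectancy probability.
people = []
for _ in range(20000):
    age = random.randint(25, 90)
    male = random.random() < 0.5
    p_init = 0.005 + 0.0004 * age + (0.01 if male else 0)
    init = random.random() < p_init
    p_exp = 0.001 + 0.0005 * age + (0.005 if male else 0) + (0.03 if init else 0)
    expectancy = random.random() < p_exp
    people.append((age, male, init, expectancy))

# Stratify by (age decade, sex): [init_n, init_exp, other_n, other_exp]
strata = defaultdict(lambda: [0, 0, 0, 0])
for age, male, init, expectancy in people:
    s = strata[(age // 10, male)]
    if init:
        s[0] += 1
        s[1] += expectancy
    else:
        s[2] += 1
        s[3] += expectancy

# Weighted average of the within-stratum rate difference (initial minus other)
num = den = 0.0
for init_n, init_exp, other_n, other_exp in strata.values():
    if init_n and other_n:
        num += init_n * (init_exp / init_n - other_exp / other_n)
        den += init_n
diff = num / den
```

Because the simulation builds in a real independent effect, the weighted difference stays positive even after age and sex are held constant within strata, which is the same conclusion the regression delivers.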

I can’t think of any other underlying reasons for the connection with Planned Giving. Even when I continue to add more and more independent variables to the regression, this strange predictor hangs in there, as sturdy as ever. So, it’s certainly interesting, and I usually at least look at it while building models.

On the other hand … perhaps there is some justification for the verdict of “astrology” (that is, “nonsense”). The data set I have here may be large, but the number of Planned Giving expectancies is less than 500 — and 4.6% of 500 is not very many records. Regardless of whether p ≤ 0.0001, it could still be just one of those things. I’ve also learned that complex models are not better than simple ones, particularly when trying to predict something hard like Planned Giving propensity. A quirky variable that suggests no potential causal pathway makes me wary of the possibility of overfitting the noise in my data and missing the signal.

Maybe it’s useful, maybe it’s not. Either way, whether I call it “cool” or not will depend on who I’m talking to.

13 June 2012

Finding predictors of future major givers

Guest post by Peter B. Wylie and John Sammis

(Download a print-friendly .pdf version here: Finding Predictors of Future Major Givers)

For years a bunch of committed data miners (we’re just a couple of them) have been pushing, cajoling, exhorting, and nudging folks in higher education advancement to do one thing: Look as hard at their internal predictors of major giving as they look at outside predictors (like social media and wealth screenings). It seems all that drum beating has been having an effect. If you want some evidence of that, take a gander at the preconference presentations that will be given this August in Minneapolis at the APRA 25th Annual International Conference. It’s an impressive list.

So…what if you count yourself among the converted? That is, you’re convinced that looking at internal predictors of major giving is a good idea. How do you do that? How do you do that, especially if you’re not a member of that small group of folks who:

  • have a solid knowledge of applied statistics as used in both the behavioral sciences and “business intelligence?”
  • know a good bit about topics like multiple regression, logistic regression, factor analysis, and cluster analysis?
  • are practiced in the use of at least one stats application, whether it’s SPSS, SAS, Data Desk, R, or some other open-source option?
  • are actively doing data mining and predictive modeling on a weekly, if not daily basis?

The answer, of course, is that there is no single, right and easy way to look for predictors of major giving. What you’ll see in the rest of this piece is just one way we’ve come up with – one we hope you’ll find helpful.

Specifically, we’ll be covering two topics:

  • The fact that the big giving in most schools does not begin until people are well into their fifties, if not their sixties
  • A method for looking at variables in an alumni database that may point to younger alums who will eventually become very generous senior alums

 

Where The Big Money Starts

Here we’ll take you through the steps we followed to show that the big giving in most schools does not begin until alums are well into their middle years.

Step 1: The Schools We Used

We chose six very different schools (public and private, large and small) spread out across North America. For five of the schools, we had the entire alumni database to work with; for the sixth, we had a random sample of more than 20,000 records.

Step 2: Assigning An Age to Every Alumni Record

Using Preferred class year, we computed an estimate of each alum’s age with this formula:

Age = 2012 – preferred class year + 22

Given that many students graduate after the age of 22, it’s safe to assume that the ages we assigned to these alums are slight to moderate underestimates of their true ages.
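In code, the estimate is a one-liner:

```python
def estimated_age(preferred_class_year, current_year=2012):
    """Estimate an alum's age from preferred class year, assuming graduation at 22."""
    return current_year - preferred_class_year + 22
```

For example, `estimated_age(1985)` gives 49.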

Step 3: Computing The Percentage of The Sum of Lifetime Dollars Contributed by Each Alum

For all the records in each database, we computed each alum’s percentage of the sum of lifetime dollars contributed by all solicitable alums (those who are living and reachable). To do this computation, we divided each alum’s lifetime giving by the sum of lifetime giving for the entire database and converted that value to a percentage.

For example, let’s assume that the sum of lifetime giving for the solicitable alums in a hypothetical database is $50 million. Table 1 shows both the lifetime giving and the percent of the sum of lifetime giving for three different records:

Table 1: Lifetime Giving and Percentage of The Sum of All Lifetime Giving for Three Hypothetical Alums

Just to be clear:

  • Record A has given no money at all to the school. That alum’s percentage is obviously 0.
  • Record B has given $39,500 to the school. That alum’s percentage is 0.079% of $50 million.
  • Record C has given $140,500 to the school. That alum’s percentage is 0.281% of $50 million.
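The computation behind Table 1 is equally simple. Here it is applied to the three hypothetical records:

```python
def percent_of_total(lifetime_giving, total_lifetime_giving):
    """One alum's share of the sum of lifetime giving, as a percentage."""
    return 100 * lifetime_giving / total_lifetime_giving

TOTAL = 50_000_000  # hypothetical database total from Table 1

shares = {rec: percent_of_total(amt, TOTAL)
          for rec, amt in {"A": 0, "B": 39_500, "C": 140_500}.items()}
```

Record A comes out at 0%, Record B at 0.079%, and Record C at 0.281%.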

Step 4: Computing The Percentage and The Cumulative Percentage of The Sum of Lifetime Dollars Contributed by Each of 15 Equal-Sized Age Groups of Alums

For each of the six schools, we divided all alums into 15 roughly equal-sized age groups. These groups ranged from alums in their early twenties to those who had achieved or passed the century mark.

To make this all clear, we have used School A (whose alums have given a sum of $164,215,000) as an example. Table 2 shows:

  • the total amount of lifetime dollars contributed by each of these age groups in School A
  • the percentage of the $164,215,000 contributed by these groups
  • the cumulative percentage of the $164,215,000 contributed by alums up to and including a certain age group

Table 2: Lifetime Giving, Percent of Sum of Lifetime Giving, and Cumulative Percent of Sum of Lifetime Giving for Fifteen Equal-Size Age Groups In School A
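The percent and cumulative-percent columns of a table like Table 2 take only a few lines to produce. The fifteen per-group totals below are invented; they're chosen only so that they sum to School A's actual total of $164,215,000:

```python
# Hypothetical lifetime-giving totals for 15 age groups, youngest to oldest
group_totals = [50_000, 120_000, 400_000, 900_000, 2_000_000,
                4_500_000, 9_000_000, 15_000_000, 22_000_000, 28_000_000,
                30_000_000, 25_000_000, 15_000_000, 8_000_000, 4_245_000]

grand_total = sum(group_totals)
percents = [100 * t / grand_total for t in group_totals]

# Running total of the percent column gives the rightmost column of Table 2
cumulative = []
running = 0.0
for p in percents:
    running += p
    cumulative.append(running)
```

The last cumulative value is 100% by construction, and the column can only rise as the age groups get older.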

Here are some things that stand out for us in this table:

  • All alums 36 and younger have contributed less than 1% of the sum of lifetime giving.
  • For all alums under age 50, the cumulative amount given is just over 7% of the sum of lifetime giving.
  • For all alums under age 62, the cumulative amount given is less than 30% of the sum of lifetime giving.
  • For all alums under age 69, the cumulative amount given is slightly more than 40% of the sum of lifetime giving.
  • Well over 55% of the sum of lifetime giving has come in from alums who are 69 and older.

The big news in this table, of course, is that the lion’s share of money in School A has come in from alums who have long since passed the age of eligibility for collecting Social Security. Not a scintilla of doubt about that.

But what about all the schools we’ve looked at? Do they show a similar pattern of giving by age? To help you decide, we’ve constructed Figures 1–6 that provide the same information as you see in the rightmost column of Table 2: The cumulative percentage of all lifetime giving contributed by alums up to and including a certain age group.

Since Figure 1 below captures the same information you see in the rightmost column of Table 2, you don’t need to spend a lot of time looking at it.

But we’d recommend taking your time looking at Figures 2-6. Once you’ve done that, we’ll tell you what we see.

These are the details of what we see for Schools B-F:

  • School B: Alums 48 and younger have contributed less than 5% of the sum of lifetime giving. Alums 70 and older have contributed almost 40% of the sum.
  • School C: Alums 52 and younger have contributed less than 5% of the sum. Alums 70 and older have contributed more than 40% of the sum.
  • School D: Alums 55 and younger have contributed less than 30% of the sum. Alums 70 and older have contributed almost 45% of the sum.
  • School E: Alums 50 and younger have contributed less than 30% of the sum. Alums 61 and older have contributed more than 40% of the sum.
  • School F: Alums 50 and younger have contributed less than 20% of the sum. Alums 68 and older have contributed well over 50% of the sum.

The big picture? It’s the same phenomenon we saw with School A: The big money has come in from alums who are in the “third third” of their lives.

One Simple Way To Find Possible Predictors of The Big Givers on The Horizon

Up to this point we’ve either made our case or not that the big bucks don’t start coming in from alumni until they reach their late fifties or sixties. Great, but how do we go about identifying those alums in their forties and early fifties who are likely to turn into those very generous older alums?

It’s a tough question. In our opinion, the most rigorous scientific way to answer the question is to set up a longitudinal study that would involve:

  1. Identifying all the alums in a number of different schools who are in the forties and early fifties category.
  2. Collecting all kinds of data on these folks including giving history, wealth screening and other gift capacity information, biographic information, as well as a host of fields that are included in the databases of these schools like contact information, undergraduate activities, and on and on the list would go.
  3. Waiting about ten or fifteen years until these “youngsters” become “oldsters,” and seeing which of all that data collected on them ends up separating the big givers from everybody else.

Well, you’re probably saying something like, “Gentlemen, surely you jest. Who the heck is gonna wait ten or fifteen years to get the answers? Answers that may be woefully outdated given how fast society has been changing in the last twenty-five years?”

Yes, of course. So what’s a reasonable alternative? The idea we’ve come up with goes something like this: If we can find variables that differentiate current, very generous older alums from less generous alums, then we can use those same variables to find younger alums who “look like” the older generous alums in terms of those variables.

To bring this idea alive, we chose one school of the six that has particularly good data on their alums. Then we took these steps:

We divided alums 57 and older into ten roughly equal-sized groups (deciles) by their amount of lifetime giving. Figure 7 shows the median lifetime giving for these deciles.
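A decile split like this needs nothing more than a sort. Here's a sketch using simulated lifetime-giving amounts (lognormal, purely illustrative):

```python
import random
import statistics

random.seed(1)

# Simulated lifetime-giving amounts for 2,500 alums aged 57 and older
giving = [round(random.lognormvariate(4, 2), 2) for _ in range(2500)]

# Sort ascending, then slice into 10 equal-sized groups (deciles)
ranked = sorted(giving)
n = len(ranked)
deciles = [ranked[i * n // 10:(i + 1) * n // 10] for i in range(10)]

# Median lifetime giving within each decile, lowest givers to highest
medians = [statistics.median(d) for d in deciles]
```

Because the list is sorted before slicing, the decile medians rise from Decile 1 to Decile 10, which is the shape you see in Figure 7.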

Table 3 gives a bit more detailed information about the giving levels of these deciles, especially the total amount of lifetime giving.

Table 3: Sum of Lifetime Dollars and Median Lifetime Dollars for 10 Equal Sized Groups of Alums 57 and Older

We picked these eight variables to compare across the deciles:

  • number of alums who have a business phone listed in the database
  • number of alums who participated in varsity athletics
  • number of alums who were a member of a Greek organization as an undergraduate
  • number of alums who have an email address listed in the database
  • number of logins
  • number of reunions attended
  • number of years of volunteering
  • number of events attended

Before we take you through Figures 8-14, we should say that the method we’ve chosen to compare the deciles on these variables is not the way a stats professor or an experienced data miner/modeler would recommend you do the comparisons. That’s okay. We were aiming for clarity here.

Let’s go through the figures. We’ve laid them out in order from “not so hot” variables to “pretty darn good” variables.

It’s pretty obvious when you look at Fig. 8 that bigger givers, for the most part, are no more likely to have a business phone listed in the database than are poorer givers.

Varsity athletics? Yes, there’s a little bit of a trend here, but it’s not a very consistent trend. We’re not impressed.

This trend is somewhat encouraging. Good givers are more likely to have been a member of a Greek organization as an undergraduate than not so good givers. But we would not rate this one as a real good predictor.

Now we’re getting somewhere. Better givers are clearly more likely to have an e-mail address listed in the database than are poorer givers.

This one gets our attention. We’re particularly impressed with the difference in the number of logins for Decile 10 (really big givers) versus the number of logins for the lowest two deciles. At this school they should be paying attention to this variable (and they are).

This figure is pretty consistent with what we’ve found across many, many schools. It’s a good example of why we are always encouraging higher ed institutions to store reunion data and pay attention to it.

This one’s a no-brainer.

And this one’s a super no-brainer.

Where to Go from Here

After you read something like this piece, it’s natural to raise the question: “What should I do with this information?”  Some thoughts:

  • Remember, we’re not assuming that you’re a sophisticated data miner/modeler. But we are assuming that you’re interested in looking at your data to help make better decisions about raising money.
  • Without using any fancy stats software and with a little help from your advancement services folks, you can do the same kind of analysis with your own alumni data as we’ve done here. You’ll run into a few roadblocks, but you can do it. We’re convinced of that.
  • Once you’ve done this kind of an analysis you can start looking at some of your alums who are in their forties and early fifties who haven’t yet jumped up to a high level of giving. The ones who look like their older counterparts with respect to logins, or reunion attendance, or volunteering (or whatever good variables you’ve found)? They’re the ones worth taking a closer look at.
  • You can take your analysis and show it to someone at a higher decision-making level than your own. You can say, “Right now, I don’t know how to turn all this stuff into a predictive model. But I’d like to learn how to do that.” Or you can say, “We need to get someone in here who has the skills to turn this kind of information into a tool for finding these people who are getting ready to pop up to a much higher level of giving.”
  • And after you have become comfortable with these initial explorations of your data we encourage you to consider the next step – predictive modeling based on those statistics terms we mentioned earlier. It is not that hard. Find someone to help you – your school has lots of smart people – and give it a try. The resulting scores will go a long way toward identifying your future big givers.

As always: We’d love to get your thoughts and reactions to all this.

24 March 2011

Does your astrological sign predict whether you’ll give?

Filed under: Alumni, Correlation, Peter Wylie, Predictor variables, Statistics — kevinmacdonell @ 7:20 am

Last weekend, with so many other pressing things I could have been doing, I got it in my head to analyze people’s astrological signs for potential association with propensity to give. I don’t know what came over me; perhaps it was the Supermoon. But when you’ve got a data set in front of you that contains giving history and good birth dates for nearly 85,000 alumni, why not?

Let me say first that I put no stock in astrology, but I know a few people who think being a Libra or a Gemini makes some sort of difference. I imagine there are many more who are into Chinese astrology, who think the same about being a Rat or a Monkey. And even I have to admit that an irrational aspect of me embraces my Taurus/Rooster nature.

If one’s sign implies anything about personality or fortune, I should think it would be reflected in one’s generosity. Ever in pursuit of the truth, I spent a rather tedious hour parsing 85,000 birth dates into the signs of the zodiac and the animal signs of Chinese astrology. As you will see, there are in fact some interesting patterns associated with birth date, on the surface at least.

Because human beings mate at any time of year, the alumni in the sample are roughly equally distributed among the 12 signs of the zodiac. There seem to be slightly more births in the warmer months than in the period of December to February: Cancer (June 21 to July 22) represents 8.9% of the sample while at the lower end, Capricorn (Dec 22 to Jan 20) represents 7.6% of the sample — a spread of less than two percentage points.

What we want to know is if any one sign is particularly likely to give to alma mater. I coded anyone who had any giving in their lifetime as ‘1’ and all never-donors as ‘0’. At the high end, Taurus natives have a donor rate of 30.7% and at the low end, Aries natives have a donor rate of 29.0%. All the other signs fall between those two rates, a range of a little more than one and a half percentage points.

That’s a very narrow range of variance. If I were seriously evaluating the variable ‘Astrological sign’ as a predictor, I would probably stop right there, seeing nothing exciting enough to make me continue.

But have a look at this bar chart. I’ve arranged the signs in their calendar order, which immediately suggests that there’s a pattern in the data: A peak at Taurus, gradually falling to Scorpio, peaking again at Sagittarius, then falling again until Taurus comes around once more.

The problem with the bar chart is that the differences in giving rates are exaggerated visually, because the range of variance is so limited. What appears to be a pattern may be nothing of the sort.

In fact, the next chart tells a conflicting tale. The Tauruses may have the highest participation rate, but among donors they and three other signs have the lowest median level of lifetime giving ($150), and Aries have the highest median ($172.50). The calendar-order effect we saw above has vanished.

These two charts fail to tell the same tale, which indicates to me that although we may observe some variance in giving between astrological signs, the variance might well be due to mere chance. Is there a way to demonstrate this statistically? I was discussing this recently with Peter Wylie, who helped me sort this out. Peter told me that the supposed pattern in the first chart reminded him of the opening of Malcolm Gladwell’s book, Outliers, in which the author examines why a hugely disproportionate number of professional hockey and soccer players are born in January, February and March. (I won’t go farther than that — read the book for that discussion.)

In the case of professional hockey players, birth date and a player’s development (and career progress) are definitely associated. It’s not due to a random effect. In the case of birth date and giving, however, there is room for doubt. Peter took me through the use of chi-square, a statistic I hadn’t encountered since high school. I’m not going into detail about chi-square — there is plenty out there online to read — but briefly, chi-square is used to determine if a distribution of observed frequencies of a value for a categorical or ordinal variable differs from the theoretical expected frequencies for that variable, and from there, if the discrepancy is statistically significant.

Figuring out the statistical significance part used to involve looking up the calculated value for chi-square in a table based on something called degrees of freedom, but nowadays your stats software will automatically provide you with a statistic telling you whether the result is significant or not: the p statistic, which will be familiar to you if you’ve used linear regression. The rule of thumb for significance is a p-value of 0.05 or less.
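If your software doesn't report a p-value, you can still compute the chi-square statistic yourself and compare it against the critical value for your degrees of freedom. A sketch with invented donor counts for the twelve signs, using a uniform expected count for simplicity (in the actual analysis, the expected counts would be proportional to each sign's share of the sample):

```python
# Chi-square critical value for 11 degrees of freedom at alpha = 0.05,
# from a standard table
CRITICAL_05_DF11 = 19.68

# Invented donor counts for the 12 zodiac signs, in calendar order
observed = [520, 505, 498, 512, 490, 480, 515, 495, 508, 500, 485, 492]
expected = sum(observed) / len(observed)  # uniform expectation across signs

stat = sum((o - expected) ** 2 / expected for o in observed)
significant = stat > CRITICAL_05_DF11
```

With these counts the statistic is about 3.35, well under 19.68, so the sign-to-sign variation is consistent with chance, which is the same verdict the p = 0.3715 result delivers for the real data.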

As it turns out, the observed differences in the frequency of donors for each astrological sign have a significance value of p = 0.3715. This is way above the 0.05 significance level, and therefore we cannot rule out the possibility that these variations are due to mere chance. So astrology is a bust for fundraisers.

Now for something completely different. We haven’t looked at the Chinese animal signs yet. Here is a table showing a breakdown by Chinese astrological sign by the percentage of alumni with at least some giving, and median lifetime giving. The table is sorted by donor participation rate, lowest to highest.

Hmm, it would seem that being a Horse is associated with a higher level of generosity than the norm. And here’s the biggest surprise: A Chi-square test reveals the differences in donor frequencies between animals to be significant! (p-value < 0.0001).

What’s going on here? Shall we conclude that the Chinese astrologers have it all figured out?

Let’s go back to the data. First of all, how were alumni assigned an animal sign in the first place? You may be familiar with the paper placemats in Chinese restaurants that list birth years and their corresponding animal signs. Anyone born in the years 1900, 1912, 1924, 1936, 1948, 1960, 1972, 1984, 1996 or 2008 is a Rat. Anyone born in 1901, 1913, 1925, etc. etc. is an Ox, and so on, until all the years are accounted for. Because the alumni in each animal category are drawn from birth years with an equal span of years between them, we might assume that each sign has roughly the same average age. This is key, because if the signs differ on average age, then age might be an underlying cause of variations in giving.
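That placemat mapping is just arithmetic modulo 12, anchored at 1900 = Rat. (Like the placemats, this ignores the fact that the Chinese year actually begins in late January or February.)

```python
ANIMALS = ["Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
           "Horse", "Goat", "Monkey", "Rooster", "Dog", "Pig"]

def animal_sign(birth_year):
    """Chinese zodiac animal for a birth year, placemat-style (1900 = Rat)."""
    return ANIMALS[(birth_year - 1900) % 12]
```

A quick check against the years above: 1900 and 2008 both map to Rat, 1974 to Tiger, and 1978 to Horse.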

My data set does not include anyone born before 1930, and goes up to 1993 (a single precocious alum who graduated at a very young age). Tigers, with the lowest participation rate, are drawn from the birth years 1938, 1950, 1962, 1974 and 1986. Horses, with the highest participation rate, are drawn from the birth years 1930, 1942, 1954, 1966 and 1978, plus only a handful of young alumni from 1990. For Tigers, 77% were born in 1974 or earlier, but for Horses, 99% of alumni were born in 1978 or earlier.

The bottom line is that the Horses in my data set are older than the Tigers, as a group. The Horses have a median age of 45, while the Tigers have a median age of 37. And we all know by now that older alumni are more likely to be donors.

Again, my conversation with Peter Wylie helped me figure this out statistically. The short answer is: After you’ve accounted for the age of alumni, variations in giving by animal sign are no longer significant.

(The longer answer is: If you perform a linear regression of Lifetime Giving (log-transformed) on Age and compute the residuals, then run an Analysis of Variance (ANOVA) for the residuals and Animal Sign, the variance is NOT significant, p = 0.1118. The residuals can be thought of as Lifetime Giving with the explanatory effect of Age “washed out,” leaving only the unexplained variance. Animal Sign fails to account for any significant amount of the remaining variance in LT Giving, which is an indication that Animal Sign is just a proxy for Age.)
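For the curious, here is that two-step procedure sketched in pure Python on simulated data, where log lifetime giving depends only on age and the animal sign is computed from birth year. Because sign is then a mere proxy for age, the ANOVA F statistic for sign, after residualizing on age, should land near 1, nowhere near significance:

```python
import random
from collections import defaultdict

random.seed(7)
ANIMALS = ["Rat", "Ox", "Tiger", "Rabbit", "Dragon", "Snake",
           "Horse", "Goat", "Monkey", "Rooster", "Dog", "Pig"]

# Simulated donors: log lifetime giving depends on age only; the animal
# sign is a pure function of birth year (hence a proxy for age)
rows = []
for _ in range(3000):
    birth_year = random.randint(1930, 1990)
    age = 2011 - birth_year
    log_giving = 0.05 * age + random.gauss(0, 1)
    rows.append((ANIMALS[(birth_year - 1900) % 12], age, log_giving))

# Step 1: simple linear regression of log giving on age; keep the residuals
n = len(rows)
mean_age = sum(a for _, a, _ in rows) / n
mean_g = sum(g for _, _, g in rows) / n
slope = (sum((a - mean_age) * (g - mean_g) for _, a, g in rows)
         / sum((a - mean_age) ** 2 for _, a, _ in rows))
residuals = [(sign, g - mean_g - slope * (a - mean_age)) for sign, a, g in rows]

# Step 2: one-way ANOVA of the residuals by animal sign
groups = defaultdict(list)
for sign, r in residuals:
    groups[sign].append(r)
grand_mean = sum(r for _, r in residuals) / n
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum(sum((r - sum(g) / len(g)) ** 2 for r in g)
                for g in groups.values())
k = len(groups)
F = (ss_between / (k - 1)) / (ss_within / (n - k))
```

All numbers here are simulated, and the effect size (0.05 per year of age) is invented for illustration; only the logic of "wash out Age, then test Sign" mirrors the analysis described above.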

Does any of this matter? Mostly no. First of all, a little common sense can keep you out of trouble. Sure, some significant predictors will be non-intuitive, but it doesn’t hurt to be skeptical. Second, if you do happen to prepare some predictors based on astrological sign, their non-significance will be evident as soon as you add them to your regression analysis, particularly if you’ve already added Age or Class Year as a predictor in the case of the Chinese signs. Altogether, then, the risk that your models will be harmed by such meaningless variables is very low.

8 February 2010

How to do basic text-mining

Filed under: Annual Giving, Derived variables, Text, Text mining — kevinmacdonell @ 8:49 am

Turn prose into data for insights into your constituents' behaviour. (Photo used via Creative Commons licence.)

Database users at universities make frequent use of free-text comment fields to store information. Too frequent use, perhaps. Normally, free text is resorted to only when there’s a need to store information of a type that cannot be conveniently coded (preferably from a pre-established “validation table” of allowed values). Unstructured information such as comments requires some work to turn it into data that can reveal patterns and correlations. This work is called text mining.

Here are steps I took to do some rather crude text-mining on a general-comments field in our database. My method was first to determine which words were used most frequently, then select a few common ‘suggestive’ words that might show interesting correlations, and finally to test the variables I made from them for correlations with giving to our institution.

The comments I was trying to get at were generated from our Annual Giving phonathon. Often these comments flag negative alumni behaviours such as hanging up on the caller or being verbally abusive. As certain behaviours often prompt the same comments over and over (e.g. “hung up on the caller”), I thought that certain frequently occurring keywords might be negatively correlated with giving.

The method outlined below is rather manual. As well, it focuses on single words, rather than word combinations or phrases. There are some fantastic software packages out there for going much deeper, more quickly. But giving this a try is not difficult and will at least give you a taste for the idea behind text mining.

My method was first to discover the most common words that sounded like they might convey some sense of “attitude”:

  • Using a query in Access, I extracted the text of all comments, plus comment type, from the database – including the ID of the individual. (We use Banner so this data came from the APACOMT screen.)
  • I dumped the data into Excel, and eliminated certain unwanted comments by type code (such as event attendance, bios, media stories, etc.), leaving about 6,600 comments. (I saved this Excel file, to return to later on.)
  • I copied only the column of remaining comments, and pasted this text into a basic text editor. (I like to use EditPad Lite, but anything you have that works with big .txt files is fine.)
  • I used Find-and-replace to change all spaces into carriage returns, so that each word was on one line.
  • I used Find-and-replace again to remove common punctuation (quote marks, periods, commas, etc.)
  • I changed all uppercase characters to lowercase characters, so “The” wouldn’t be counted separately from “the”.
  • The result was a very long column of single words. I copied the whole thing, and pasted it into Data Desk, as a single variable.
  • This allowed me to create a frequency table, sorted by count so the most common words would appear at the top. More than 100,000 cases fell into a little less than 5,000 categories (i.e. words).

The most common words were, in order: to, the, a, made, and, be, mail, by, only, from, not, list, removed, nn, of, in, solicitation, he, no, phonathon, she, pledge, is, wishes, said, unhonoured, on, does, was, giving, phone, will, caller, her, donate.

I recognized some of our most common comments, including “made by-mail-only”, “made phonathon no”, “unhonoured pledge”, etc. These states are already covered by specific coding elsewhere in the database, so I skipped over these and looked farther down to some of the more subjective “mood” words, such as “hang” and “hung” (which almost always meant “hung up the phone”), “rude”, “upset”, “never”, “told”, etc.

I went back to my original Excel file of comments and created a few new columns to hold a 0/1 variable for some of these categories. This took some work in Excel, using the “Contains” text filter. So, for example, every comment that contained some variation on the theme of ‘hanging up the phone’ received a 1 in the column called “Hung up”, and all the others got a zero.
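That Excel “Contains” filter can also be scripted. A sketch with invented IDs and comments; note that, like the filter, this is crude substring matching:

```python
# Invented donor IDs and comment text
donor_comments = {
    1001: "Hung up on the student caller.",
    1002: "Very supportive, wishes to give by mail only.",
    1003: "Was rude and hung up immediately.",
}

# Variations on the theme of hanging up the phone
KEYWORDS = ("hung up", "hang")

# 0/1 flag per donor: 1 if any keyword appears anywhere in the comment
hung_up_flag = {
    donor_id: int(any(kw in comment.lower() for kw in KEYWORDS))
    for donor_id, comment in donor_comments.items()
}
```

The resulting dictionary of 0/1 values is exactly the kind of derived variable that can be matched back to IDs and giving history for testing.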

From there, it was easy to copy the IDs, with the new variable(s), into Data Desk, where I matched the data up with Lifetime Giving. The idea of course was to discover a new predictor variable or two. For example, it seemed likely that alumni with a 1 for the variable ‘Hung Up’ might have given less than other alumni. As it turned out, though, the individual variables I created on this occasion were not particularly predictive of giving (or of failing to give).

I certainly haven’t given up on the idea, though, because there is much room for improvement in the analysis. For one thing, I was looking for correlations with Lifetime Giving, when I should have specified Phonathon Giving. People who hang up on student callers aren’t non-donors, necessarily; they just don’t care for being contacted by phone. (Why they don’t just ask to be taken off the calling list, I’m not sure.)

In the meantime, this very basic text-mining technique DID prove very useful when I needed to compare two models I had created for our Annual Giving program. I had designed an improved model which specifically targeted phone-receptive alumni, in the hopes of reducing the number of hang-ups and other unpleasant phone encounters. I showed the effectiveness of this approach through the use of text mining, conducted exactly as outlined above. (I’ll detail the results in a future post.)

Do you have a lot of text-based comments in your database? Do you have a lot of text-based response data from (non-anonymous) surveys? Play around with mining that text and see what insights you come up with.

11 January 2010

The 15 top predictors for Planned Giving – Part 3

Okay, time to deliver on my promise to divulge the top 15 predictor variables for propensity to enter a Planned Giving commitment.

Recall the caveat about predictors that I gave for Annual Giving: These variables are specific to the model I created for our institution. Your most powerful predictors will differ. Try to extract these variables from your database for testing, by all means, but don’t limit yourself to what you see here.

In Part 2, I talked about a couple of variables based on patterns of giving. The field of potential variables available in giving history is rich. Keep in mind, however, that these variables will be strongly correlated with each other. If you’re using a simple-score method (adding 1 to an individual’s score for each positively-correlated predictor variable), be careful about using too many of them and exaggerating the importance of past giving. On the other hand, if you use multiple regression, these correlated variables will compete with each other – this is fine, but be aware that some of your hard-won variables may be reduced to complete insignificance.
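The simple-score method mentioned above can be sketched like this; the predictor names are hypothetical 0/1 flags invented for illustration:

```python
# Sketch of the simple-score method: add 1 to an individual's score for
# each positively-correlated binary predictor they exhibit.
def simple_score(record, predictors):
    """Count how many of the given 0/1 predictor flags are set."""
    return sum(record.get(p, 0) for p in predictors)

# Hypothetical predictor flags and one alum's record.
predictors = ["gave_last_3_years", "attended_homecoming", "married"]
alum = {"gave_last_3_years": 1, "attended_homecoming": 1, "married": 0}

print(simple_score(alum, predictors))  # 2
```

Note how three giving-history flags in that list would let past giving dominate the score, which is exactly the over-weighting risk described above.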

Just another reason to look beyond giving history!

For this year’s Planned Giving propensity model, the predicted value (‘Y’) was a 0/1 binary value: “1” for our existing commitments, “0” for everyone else. (Actually, it was more complicated than that, but I will explain why some other time.)

The population was composed of all living alumni Class of 1990 and older.

The list

The most predictive variables (roughly in order of influence) are listed below. Variables that have a negative correlation are noted (N). Note that very few of these variables can be considered continuous (e.g. Class Year) or ordinal (e.g. survey scale responses). Most are binary (0/1). But ALL are numeric, as required for regression.

  1. Total lifetime giving
  2. Number of Homecomings attended
  3. Response to alumni survey scale question, regarding event attendance
  4. Number of President’s Receptions attended
  5. Class Year (N)
  6. Recency: Gave in the past 3 years
  7. Holds another degree from another university (from survey)
  8. Marital status ‘married’
  9. Prefix is Religious (Rev., etc.) or Justice
  10. Alumni Survey Engagement score
  11. Business phone present
  12. Number of children under 18 (from survey) (N)

Like my list of Annual Giving predictors, this isn’t a full list (and it isn’t 15 either!). These are the most significant predictors which don’t require a lot of explanation.

Note how few of these variables are based on giving – ‘Years of giving’ and ‘Frequency of giving’ don’t even rate. (‘Lifetime giving’ seems to take care of most of the correlation between giving and Planned Giving commitment.) And note how many variables don’t even come from our database: They come from our participation in a national survey for benchmarking of alumni engagement (conducted in March 2009).

8 January 2010

The 15 top predictors for Planned Giving – Part 2

It’s time to explore the two variables we created in Part 1. The first was ‘Years of Giving’, and the second was ‘Frequency of Giving’. Both of these things are generally assumed to be predictive of Planned Giving potential.

The key word is ‘assumed’. Based on assumption alone, you can go into your database right now and skim off the top alumni by years and frequency of giving, and call them your top Planned Giving prospects. That would fall into the category of data mining, and you might have some success doing this.

But why not kick it up a notch? If you can do data mining, you can do predictive modeling.

In modeling, characteristics such as years or frequency of giving are regarded as variables, just like ‘Class Year’, ‘Homecomings Attended’, ‘Business Phone Present’, and all the rest of them. And like any other variable, their relative power to predict Planned Giving potential is demonstrable.

Let’s explore ‘Years of Giving’ first.

For ease of visualizing this variable, I chopped it into ranges, as in the chart below. This chart shows how our Planned Giving expectancies (on the right) differ from all other alumni (on the left) with regard to the number of years in which they’ve made any gift.

Look at the blue parts (no giving) for both stacks: Our Planned Giving expectancies are far less likely than other alumni to be non-donors. That should not be a surprise.

Look at the purple parts (15 to 21 years of giving): Our expectancies are much more likely than all other alumni to give every year. Again, that’s perfectly in line with conventional wisdom. So far so good.
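The range-chopping step might look like this in Python. The bin edges loosely mirror the bands mentioned above (e.g. the 15-to-21-years band); the data values are invented:

```python
# Sketch: bin a continuous 'Years of Giving' count into labelled
# ranges for side-by-side charting.
def giving_band(years):
    if years == 0:
        return "no giving"
    if years <= 7:
        return "1 to 7"
    if years <= 14:
        return "8 to 14"
    return "15 to 21"

years_of_giving = [0, 3, 12, 20]
print([giving_band(y) for y in years_of_giving])
```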

Our next chart shows the same side-by-side comparison for ‘Frequency’ of giving. This variable is quite closely related to ‘Years of Giving’, and we see the same dramatic differences. Our existing expectancies tend to be more frequent donors than the general alumni population.

There you have it, two solid characteristics associated with alumni who choose to enter Planned Giving commitments. These characteristics, and similar ones that might result from a standard RFM analysis (Recency, Frequency, Monetary value), might be enough to satisfy some.

But let me show you something else.

Here’s another side-by-side comparison. Now we’re looking at Homecoming Attendance. Have a look at this.

This is based on attendance data going back more than ten years. I am so glad we have that data, because as it turns out, Homecoming attendance is the second most powerful predictor of Planned Giving potential for our institution – after ‘Lifetime Giving’, but before any other variable related to giving history. Even more so: MULTIPLE Homecoming attendance!

I concede there is a significant age difference between these two groups – I did not take the extra step of limiting the population to older alumni when I made these charts. But the observed difference between the groups still holds. (Only 16.4% of our living alumni from the Class of 1979 and earlier have ever attended Homecoming, but almost 56% of expectancies have.)

Prospecting for new Planned Giving commitments is hard enough. We make our jobs that much harder when we fail to add up the combined power of predictors such as event attendance and a dozen other things sitting in our databases.

If you remain unconvinced, if you still think that past giving behaviour is the only true predictor of future potential, then let me leave you with a final observation from my analysis of our own data: If all of our current, known Planned Giving expectancies were hidden in the database like needles in a haystack, and we were only allowed to use past giving patterns to find them again, we would miss two-thirds of them!

Ouch!

In Part 3, I will finally reveal my top 15 predictors of Planned Giving potential. I promise.
