CoolData blog

13 June 2012

Finding predictors of future major givers

Guest post by Peter B. Wylie and John Sammis

(Download a print-friendly .pdf version here: Finding Predictors of Future Major Givers)

For years a bunch of  committed data miners (we’re just a couple of them) have been pushing, cajoling, exhorting, and nudging  folks in higher education advancement to do one thing: Look as hard at their internal predictors of major giving as they look at outside predictors (like social media and wealth screenings). It seems all that drum beating has been having an effect. If you want some evidence of that, take a gander at the preconference presentations that will be given this August in Minneapolis at the APRA 25th Annual International Conference. It’s an impressive list.

So…what if you count yourself among the converted? That is, you’re convinced that looking at internal predictors of major giving is a good idea. How do you do that? How do you do that, especially if you’re not a member of that small group of folks who:

  • have a solid knowledge of applied statistics as used in both the behavioral sciences and “business intelligence?”
  • know a good bit about topics like multiple regression, logistic regression, factor analysis, and cluster analysis?
  • are practiced in the use of at least one stats application whether it’s SPSS, SAS, Data Desk, or R or some other open source option?
  • are actively doing data mining and predictive modeling on a weekly, if not daily basis?

The answer, of course, is that there is no single, right and easy way to look for predictors of major giving. What you’ll see in the rest of this piece is just one way we’ve come up with – one we hope you’ll find helpful.

Specifically, we’ll be covering two topics:

  • The fact that the big giving in most schools does not begin until people are well into their fifties, if not their sixties
  • A method for looking at variables in an alumni database that may point to younger alums who will eventually become very generous senior alums

 

Where The Big Money Starts

Here we’ll take you through the steps we followed to show that the big giving in most schools does not begin until alums are well into their middle years.

Step 1: The Schools We Used

We chose six very different schools (public and private, large and small) spread out across North America. For five of the schools, we had the entire alumni database to work with. With one school we had a random sample of more than 20,000 records.

Step 2: Assigning An Age to Every Alumni Record

Using Preferred class year, we computed an estimate of each alum’s age with this formula:

Age = 2012 – preferred class year + 22

Given the fact that many students graduate after the age of 22, it’s safe to assume that the ages we assigned to these alums are  slight to moderate underestimates of their true ages.

Step 3: Computing The Percentage of  The Sum of Lifetime Dollars Contributed by Each Alum

For all the records in each database, we computed each alum’s percentage of the sum of lifetime dollars contributed by all solicitable alums (those who are living and reachable). To do this computation, we divided each alum’s lifetime giving by the sum of lifetime giving for the entire database and converted that value to a percentage.

For example, let’s assume that the sum of lifetime giving for the solicitable alums in a hypothetical database is $50 million. Table 1 shows both the lifetime giving and the percent of the sum of lifetime giving for three different records:

Table 1: Lifetime Giving and Pecentage of The Sum of All Lifetime Giving for Three Hypothetical Alums

Just to be clear:

  • Record A has given no money at all to the school. That alum’s percentage is obviously 0.
  • Record B has given $39,500 to the school. That alum’s percentage is 0.079% of $50 million.
  • Record C has given $140,500 to the school. That alum’s percentage is 0.280% of $50 million.

Step 4: Computing The Percentage and The Cumulative Percentage of The Sum of Lifetime Dollars Contributed by Each of 15 Equal-Sized Age Groups of  Alums

For each of the six schools, we divided all alums into 15 roughly equal-sized age goups. These groups ranged from alums in their early twenties to those who had achieved or passed the century mark.

To make this all clear we have used School A (whose alums have given a sum of $164,215,000) as an example. Table 2 shows the:

  • total amount of lifetime dollars contributed by each of these age groups in School A
  • the percentage of the $164,215,000 contributed by these groups
  • the cumulative percentage of the $164,215,000 contributed by alums up to and including a certain age group

Table 2: Lifetime Giving, Percent of Sum of Lifetime Giving, and Cumulative Percent of Sum of Lifetime Giving for Fifteen Equal-Size Age Groups In School A

Here are some things that stand out for us in this table:

  • All alums 36 and younger have contributed less than 1% of the sum of lifetime givng.
  • For all alums under age 50 the cumulative amount given is just over 7% of the sum of lifetime givng.
  • For all alums under age 62 the cumulative amount given is less than 30% of the sum of lifetime givng.
  • For all alums under age 69 the cumulative amount given is slightly more than 40% of the sum of lifetime givng.
  • Well over 55% of the sum of lifetime givng has come in from alums who are 69 and older.

The big news in this table, of course, is that the lion’s share of  money in School A has come in from alums who have long since passed the age of eligibility for collecting Social Security. Not a scintilla of doubt about that.

But what about all the schools we’ve looked at? Do they show a similar pattern of giving by age? To help you decide, we’ve constructed Figues 1 – 6 that provide the same information as you see in the rightmost column of Table 2: The cumulative percentage of all lifetime giving contributed by alums up to and including a certain age group.

Since Figure 1 below captures the same information you see in the rightmost column of Table 2, you don’t need to spend a lot of time looking at it.

But we’d recommend taking your time looking at Figures 2-6. Once you’ve done that, we’ll tell you what we see.

These are the details of what we see for Schools B-F:

  • School B: Alums 48 and younger have contributed less than 5% of the sum of lifetime giving. Alums 70 and older have contributed almost 40% of the sum.
  • School C: Alums 52 and younger have contributed less than 5% of the sum. Alums 70 and older have contributed more than 40% of the sum.
  • School D: Alums 55 and younger have contributed less than 30% of the sum. Alums 70 and older have contributed almost 45% of the sum.
  • School E: Alums 50 and younger have contributed less than 30% of the sum. Alums 61 and older have contributed more than 40% of the sum.
  • School F: Alums 50 and younger have contributed less than 20% of the sum. Alums 68 and older have contributed well over 50% of the sum.

The big picture? It’s the same phenomenon we saw with School A: The big money has come in from alums who are in the “third third” of their lives.

One Simple Way To Find Possible Predictors of The Big Givers on The Horizon

Up to this point we’ve either made our case or not that the big bucks don’t start coming in from alumni until they reach their late fifties or sixties. Great, but how do we go about identifying those alums in their forties and early fifties who are likely to turn into those very generous older alums?

It’s a tough question. In our opinion, the most rigorous scientific way to answer the question is to set up a longitudinal study that would involve:

  1. Identifying all the alums in a number of different schools who are in the forties and early fifties category.
  2. Collecting all kinds of data on these folks including giving history, wealth screening and other gift capacity information, biographic information, as well as a host of fields that are included in the databases of these schools like contact information, undergraduate activities, and on and on the list would go.
  3. Waiting about ten or fifteen years until these “youngsters” become “oldsters” and see which of all that data collected on them ends up predicting the big givers from everybody else.

Well, you’re probably saying something like, “Gentlemen, surely you jest. Who the heck is gonna wait ten or fifteen years to get the answers? Answers that may be woefully outdated given how fast society has been changing in the last twenty-five years?”

Yes, of course. So what’s a reasonable alternative? The idea we’ve come up with goes something like this: If we can find variables that differentiate current, very generous older alums from less generous alums, then we can use those same variables to find younger alums who “look like” the older generous alums in terms of those variables.

To bring this idea alive, we chose one school of the six that has particularly good data on their alums. Then we took these steps:

We divided alums 57 and older into ten roughly equal size groups (deciles) by their amount of lifetime giving. Figure 7 shows the median lifetime giving for these deciles.

Table 3 gives a bit more detailed information about the giving levels of these deciles, especially the total amount of lifetime giving.

Table 3: Sum of Lifetime Dollars and Median Lifetime Dollars for 10 Equal Sized Groups of Alums 57 and Older

We picked these eight variables to compare across the deciles:

  • number of alums who have a business phone listed in the database
  • number of alums who participated in varsity athletics
  • number of alums who were a member of a greek organization as an undergraduate
  • number of alums who have an email address listed in the database
  • number of logins
  • number of reunions attended
  • number of  years of volunteering
  • number of events attended

Before we take you through Figures 8-14, we should say that the method we’ve chosen to compare the deciles on these variables is not the way a stats professor nor an experinced data miner/modeler would recommend you do the comparisons. That’s okay. We were aiming for clarity here.

Let’s go through the figures. We’ve laid them out in order from “not so hot” variables to “pretty darn good” variables.

It’s pretty obvious when you look at Fig. 8 that bigger givers, for the most part, are no more likely to have a business phone listed in the database than are poorer givers.

Varsity athletics? Yes, there’s a little bit of a trend here, but it’s not a very consistent trend. We’re not impressed.

This trend is somewhat encouraging. Good givers are more likely to have been a member of a Greek organization as an undergraduate than not so good givers. But we would not rate this one as a real good predictor.

Now we’re getting somewhere. Better givers are clearly more likely to have an e-mail address listed in the database than are poorer givers.

This one gets our attention. We’re particularly impressed with the difference in the number of logins for Decile 10 (really big givers) versus the number of logins for the lowest two deciles. At this school they should be paying attention to this variable (and they are).

This figure is pretty consistent with what we’ve found across many, many schools. It’s a good example of why we are always encouraging higher ed institutions to store reunion data and pay attention to it.

This one’s a no-brainer.

And this one’s a super no-brainer.

Where to Go from Here

After you read something like this piece, it’s natural to raise the question: “What should I do with this information?”  Some thoughts:

  • Remember, we’re not assuming that you’re a sophisticated data miner/modeler. But we are assuming that you’re interested in looking at your data to help make better decisions about raising money.
  • Without using any fancy stats software and with a little help from your advancement services folks, you can do the same kind of analysis with your own alumni data as we’ve done here. You’ll run into a few roadblocks, but you can do it. We’re convinced of that.
  • Once you’ve done this kind of an analysis you can start looking at some of your alums who are in their forties and early fifiteies who haven’t yet jumped up to a high level of giving. The ones who look like their older counterparts with respect to logins, or reunion attendance, or volunteering (or whatever good variables you’ve found)? They’re the ones worth taking a closer look at.
  • You can take your analysis and show it to someone at a higher decision-making level than your own. You can say, “Right now, I don’t know how to turn all this stuff into a predictive model. But I’d like to learn how to do that.” Or you can say, “We need to get someone in here who has the skills to turn this kind of information into a tool for finding these people who are getting ready to pop up to a much higher level of giving.”
  • And after you have become comfortable with these initial explorations of your data we encourage you to consider the next step – predictive modeling based on those statistics terms we mentioned earlier. It is not that hard. Find someone to help you – your school has lots of smart people – and give it a try. The resulting scores will go a long way toward identifying your future big givers.

As always: We’d love to get your thoughts and reactions to all this.

Advertisements

13 December 2009

Adding new variables to existing datafile in Data Desk, Part 2

Filed under: Data Desk, Derived variables — Tags: , , , , , — kevinmacdonell @ 8:04 pm

As explained in Part 1, you can simply paste a new variable into an existing relation IF there are the same number of cases in both the existing relation and the new variable, AND they are in exactly the same order. Otherwise you will have to perform a join, as follows.

As an example, let’s say my existing data set has variables for a unique identifier (ID), First Name, Last Name, and Class Year. And let’s say that for some reason, I need to add Degree as a variable. (For something simple like this, I would of course scrap this data file and re-query the database to get all the data I need; this is just an example.)

I’ll need to query the database for two things, ID and Degree. (Whenever you query your database, you must never fail to include a unique record identifier. In Banner this may be ID or PIDM.)

Copy the columns you want from Excel, Access or whatever your source is.

In Data Desk, go to the Data menu:  New > Relation. You’ll be pasting your new variables into a relation that is separate from the existing relation that contains the names and class years.

Name the folder (let’s call it “Deg“), and paste your variables. For neatness, drag the variables out of the Clipboard folder and into Deg, and drag the empty Clipboard to the trash.

To avoid confusing the new ID variable and the ID variable as already exists, rename the new variable. (eg. ‘ID-degree’)

Create a new derived variable in the Deg relation folder. Name it ‘ID-degree-lookup’, and give it this expression:

lookup('ID-degree','ID')

Remember to drag each of the two variables into the expression window rather than type them. The Lookup command will literally look up the value of the first argument in the list of values in the second argument.

Now create another derived variable, again in the Deg relation folder. Call it ‘Degree-joined’. Give it this expression:

getcase(textof('Degree'),'ID-degree-lookup')

The GetCase command will retrieve the value of ‘Degree’ that matches each looked-up ID value. The nested TextOf command is needed because the value of ‘Degree’ is non-numerical. If the variable we were adding was a numerical value, such as Lifetime Giving, the TextOf command would not be needed.

You can check if the variable works by choosing Show numbers from the variable’s Hyperview menu, or by dragging the variable into a frequency table.

‘Degree-joined’ is now a full member of both relations – you can drag it into the folder that contains your original data without getting an error message. It’s now available to be used like any other variable in the original data set.

If this seems a little confusing at first, don’t worry. Print these directions out and refer to them as needed. If you are building a predictive model and discovering new variables as you go along, you’ll be calling on this technique so often that eventually you’ll know it by heart.

11 December 2009

Adding new variables to existing datafile in Data Desk, Part 1

Filed under: Data Desk — Tags: , , , — kevinmacdonell @ 11:47 am

When I form a query on our database using MS Access, I try to make it as complete as possible before starting work in Data Desk. I ask myself, do I have all the columns (variables) I need for this task? If not, I modify and run the query again.

Invariably, though, I realize I’ve forgotten something only after I am well underway in Data Desk and starting from scratch is not an option. Or sometimes the data set has to contain so many variables (such as in any typical predictive modeling project) that there’s no way I can pull them out of the database with just one query (MS Access can handle only so much at once).

There is a lazy way to do this, which I do not recommend. You can simply paste a new variable into an existing relation, under these conditions:

  • The number of cases in your new variable is exactly the same as in your existing data file.
  • All the cases (‘rows’ in Excel) in the new variable are in exactly the same order as your existing data file.

These conditions can be easily met if your existing data and your new data is coming from the same static spreadsheet stored on your own hard drive. But a university database is a living thing. The number of ‘living, addressable alumni’ changes by the hour, so I can never be sure that any new variable I pull out of it is going to match up with the data I pulled yesterday.

Even if you ARE grabbing data from your own spreadsheet, what happens if you’ve reordered the rows in the meantime? Data Desk will allow you to paste the variable (because the number of cases is the same), but the data will be mismatched.

Fortunately it takes just a couple of steps in Data Desk to perform what I call a ‘merge’ or ‘join’ using the Lookup and GetCase functions. Sometimes I call it stitching data together – that’s what it feels like I’m doing.

I’ll show you how in: “Adding new variables to existing datafile in Data Desk, Part 2.

7 December 2009

Rural vs. Urban postal codes

Filed under: Data Desk, Derived variables — Tags: , , , — kevinmacdonell @ 11:18 am

If your data set contains primarily individuals residing in Canada, you can create a derived indicator variable in Data Desk to identify who does not live in an urban area or small town. Just one more potential predictor variable to test in your models.

First, your data set needs to contain Postal Code. You’ll also need to include Country, if your data set includes non-Canadian addresses.

The expression below will create an indicator variable with a value of ‘1’ (true) for rural Canadian addresses, and ‘0’ (false) for urban addresses.

Drag your own variables for Postal Code and Country into the expression window.

mid('Postal Code',2,1) = 0 AND textof('Country') = "Canada"

Lazy man’s derived variables: Class decade

Filed under: Data Desk, Derived variables — Tags: , , , , — kevinmacdonell @ 10:59 am

While using Data Desk, I find I need to create certain derived variables expressions over and over again. Instead of composing these anew every time, I save them as text files in a place quickly accessible on my computer.

Class Year is a variable I include in nearly every set of alumni data I work with. So one of the derived variables I create most often is Class Decade.

To use, simply copy this text and paste into the expression window of a new derived variable. Replace each ‘v’ with the Class Year variable, exactly as it is named in your data set. (Best to click and drag ‘Class Year’ into the expression window, instead of typing it every time.)

if len (v) = 0 then "n/a"
else if v < 1950 then "1940s + <"
else if v < 1960 then "1950s"
else if v < 1970 then "1960s"
else if v < 1980 then "1970s"
else if v < 1990 then "1980s"
else if v < 2000 then "1990s"
else if v < 2010 then "2000s"
else •

Blog at WordPress.com.