CoolData blog

16 May 2014

Find out “who knows who” in your database with SQL and Python

Filed under: Fun, Prospect identification, Python, SQL — kevinmacdonell @ 12:30 pm

Figuring out connections is a big part of major gift fundraising. A development office might have a great list of prospective donors in hand, but those prospects will languish untouched if the right people who can make introductions are never found. While cultivating relationships with well-connected people is clearly important, research also plays a role in suggesting possible go-betweens. And to the extent that research involves sifting through data to find connections, it’s possible to automate some of the work.

Wouldn’t it be great to have software that asks you to name a prospective donor, and then spits out a list of people in your database who are most likely to be associates of that person? Sure it would!

Social media applications do a version of this, suggesting “people you may know,” based on shared connections and attributes such as schools attended, professional group memberships, employment information, and stated interests. I first encountered a higher-ed advancement application of this concept via Erich Preisendorfer of the University of New Hampshire. A while back, Erich explained his approach in a guest post for CoolData. (Read it here: Finding connections to your major gift prospects in your data.) I was recently in touch with Erich and he reports that the tool is still in use by prospect researchers at UNH and is still turning up valuable connections.

I tried to build something similar using our reporting tools, but it was a slow, lumbering beast, and not very user-friendly for the people most likely to be interested in using it — prospect researchers, mainly. I never gave up on the idea of building something better, and finally, on a quiet weekend a few months ago, I found two or three hours to mess around with a Python script. Given the ID of a person, this script makes multiple queries of the database and joins the results together to produce a ranked list of other people who might be associated with the input ID.

When the script is run, it prompts the user for an ID, then an Oracle database password. I’ll get into more detail later, but here is what the output looks like in the Python shell of IDLE, the development tool that I used to write the script, when I use my own ID as the one to find connections for. (I have blanked out the ID and password.)

(Screenshot: the script’s prompts and output in the IDLE shell.)

The script goes through these steps:

  1. It takes the ID provided by the user and queries the database to display the name associated with that ID (that would be me, in this example).
  2. It checks to see what events I attended, and produces a list (in memory) of all the IDs of constituents who attended those same events.
  3. It checks to see what activity codes I have attached to my record — varsity sports teams, student clubs and so on. I have none, but if I did, the script would return a list of all the people who shared the same activity codes. Again, this is saved in memory for use later on.
  4. The same routine is followed for employers (past and present), all class years, degrees, referring high school, cities extracted from mailing addresses (again, all past and present addresses), age, and database cross-references.
  5. All of these lists are combined into a single “object” in memory. The number of instances of each distinct ID is counted up. Every ID retrieved is a “hit”. The idea is, the more hits for any one retrieved ID, the more likely that ID belongs to someone who knows me (or whoever is being searched on).
  6. Because the resulting list is huge (more than 31,000 records just for me — and I have very little data attached to my record), the script asks the user to limit by a minimum number of hits. I’ve entered 3 here.
  7. The result is a file with 47 IDs. The script then does a final query on the database to retrieve the names and other details for these people, and saves the results as a .csv file that can be opened in Excel.
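The counting in steps 5 and 6 can be sketched in a few lines of plain Python. This is a toy illustration with invented IDs, not the actual script (which uses pandas for this job):

```python
from collections import Counter

# Invented ID lists standing in for the per-category query results
events = [101, 102, 103]
activities = [102, 104]
employers = [101, 102, 105]

# Pool every retrieved ID and count the "hits" per ID
hits = Counter(events + activities + employers)

# Keep only IDs meeting a minimum hit count (3, as in the example run)
likely = {pidm: n for pidm, n in hits.items() if n >= 3}
print(likely)  # only ID 102 turned up in all three lists
```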

Here are the top few rows of the output. I have removed some columns for simplicity, and blanked out some personal information. The count_sum column is the “hit count”, and the list is sorted in descending order by count_sum. The person being searched on is always at the top of the list. This is not an especially impressive example, because I am not an alumnus. My list of likely connections is made up of co-workers and people from my hometown who happen to be the same age as me. The woman in the third row of the data was someone I had a crush on in junior high school but haven’t seen in 27 years … If I were a prospective donor, this search would be a bust. Clearly this method requires quality alumni data in order to be useful.



That’s how it functions. We will dig into the actual code shortly. But first, let me say that my primary aim is to share this with you as an idea. I can’t help you with installing Python or debugging or any of that. All I can do is show you my code and hope you can modify it to make it work for you. If you think this is cool and useful but don’t want to learn any code, I suggest you team up with someone from IT or your Advancement Services team, or a student or prof in the Computing Science department.

As I said, I’m using Python and a code library for handling data called pandas. (For some background, see my earlier posts: Getting bitten by Python and Automate RFM scoring of your donors with this Python script.) The great thing about pandas is that you can load data into in-memory structures called DataFrames, which are like data tables in that you can perform joins and sort, rank, de-dupe, add calculated columns, and so on.
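As a minimal, self-contained illustration of the DataFrame operations the script relies on (concatenate, count hits per ID, de-dupe, sort), here is the pattern with invented PIDMs and current pandas method names:

```python
import pandas as pd

# Two toy query results; in the real script these come back from Oracle
events = pd.DataFrame({'PIDM': [1001, 1002, 1003]})
cities = pd.DataFrame({'PIDM': [1002, 1003, 1002]})

# Stack the results, then count hits per PIDM
results = pd.concat([events, cities], ignore_index=True)
results['count'] = 1
results = results.join(results.groupby('PIDM')['count'].sum(),
                       on='PIDM', rsuffix='_sum')

# One row per PIDM, best-connected first
results = results.drop_duplicates().sort_values('count_sum', ascending=False)
print(results[['PIDM', 'count_sum']])
```

PIDM 1002 appears in both toy lists (three times in all), so it floats to the top with a count_sum of 3.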

A few closing comments before I show you the code:

  • Erich at UNH goes the extra step of matching on date range along with employer. I think a shared corporate connection is worth a point, but being at the same place of employment at the same time is obviously worth a lot more. If you have other ideas for things to match on, I’d love to hear them.
  • Those of you who aren’t into Python but know SQL can probably make an all-SQL version of this. I just like Python.
  • In the code, I have inserted comments to help guide you through. Everything to the right of a “#” is a comment.
  • The script includes multiple individual queries. You may wish to lump them together using UNION ALL, appending all the result sets together right in the body of the SQL. Up to you.
  • The queries that pull records based on shared events, degrees and so on all have the same basic structure: First, a WITH clause gathers the codes for the events (or degrees or cities) that the target ID has attached to his or her record, then a SELECT statement finds all the other IDs who have those same codes attached to their records.
  • In order to break the SQL up into multiple lines, I have enclosed the string between groups of three single quotes. You’ll see that whenever I have to insert a variable into the SQL (such as ‘target_pidm’), I have to use what looks like a confusing jumble of single and double quotes — but the ordering is very precise.
  • I was going to simplify the SQL for you, but didn’t see the point. One way or the other, you will still have to modify it. So it’s much the same as what I actually use. You’ll also notice that I tend not to use aliases for tables and views, which makes the queries text-heavy. Sorry.
  • One limitation of this tool is that it will run only on a machine that has a Python installation. For that reason I haven’t been able to simply give it to someone (in Prospect Research, say) and have them run it themselves. The next step might be bundling it as a stand-alone executable file (an .exe) that could be shared with others, but I haven’t gotten that to work yet.
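The triple-quote-and-variable pattern mentioned above, in miniature, with an invented table and ID so you can see how the pieces snap together into one valid SQL string:

```python
target = '0012345'  # invented ID for illustration

SQL = '''SELECT PIDM
FROM SOME_TABLE
WHERE ID=''' + "'" + target + "'" + ''' ORDER BY PIDM'''

# The two "'" pieces wrap the inserted value in single quotes, so the
# finished clause reads: WHERE ID='0012345'
print(SQL)
```

(If the quote-juggling bothers you, cx_Oracle also accepts bind variables, e.g. cursor.execute(query, id=value), which keeps pasted-in literals out of the SQL string entirely; the concatenation above is simply how this script was written.)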

AND, one final, final comment: This is probably the last piece on Python and database querying I’m going to write for a while. In future posts I will tone down the technical stuff and get back to talking about data and predictive modelling.


# Import required code libraries

import pandas as pd
import numpy as np
import pandas.io.sql as psql   # the pandas SQL helpers; provides frame_query()
import cx_Oracle, os



# User prompts, and connect to Oracle database

target = raw_input('Enter ID to find connectors for: ')

user = raw_input('Enter username for database: ')

password = raw_input('Enter password for database: ')

connectstr = user + '/' + password + '@database:0000'

connection = cx_Oracle.connect(connectstr)



# Query to get the database PIDM and name for the entered ID.
# (The body of this query was reconstructed; the select list matches the columns used below.)

SQL = '''
SELECT
BANINST1.AA_ENTITY.PIDM,
BANINST1.AA_ENTITY.FIRST_NAME,
BANINST1.AA_ENTITY.LAST_NAME
FROM BANINST1.AA_ENTITY
WHERE (1=1   -- (any record-status filters you use go here)
AND BANINST1.AA_ENTITY.ID=''' + "'" + target + "'" + ''')'''

df = psql.frame_query(SQL, con=connection)

name = str(df['FIRST_NAME'][0]) + ' ' + str(df['LAST_NAME'][0])

print('Searching for connections to: ' + name)

target_pidm = max(df['PIDM'])    # PIDM, not ID, is the primary key in my database.



# Find events

SQL = '''
WITH events AS (
SELECT BANINST1.AA_EVENTS.EVENT_CRN   -- (reconstructed: substitute your event code column)
FROM BANINST1.AA_EVENTS
WHERE BANINST1.AA_EVENTS.PIDM = ''' +str(target_pidm)+ ''')
SELECT BANINST1.AA_EVENTS.PIDM
FROM events, BANINST1.AA_EVENTS
WHERE BANINST1.AA_EVENTS.EVENT_CRN = events.EVENT_CRN'''

events = psql.frame_query(SQL, con=connection)

print("Found " + str(len(events)) + " people who attended the same events.")



# Find activities

SQL = '''WITH activities AS (
SELECT BANINST1.AA_ACTIVITY_AND_LEADERS.ACTIVITY   -- (reconstructed: substitute your activity code column)
FROM BANINST1.AA_ACTIVITY_AND_LEADERS
WHERE BANINST1.AA_ACTIVITY_AND_LEADERS.PIDM = '''+str(target_pidm)+''')
SELECT BANINST1.AA_ACTIVITY_AND_LEADERS.PIDM
FROM activities,
BANINST1.AA_ACTIVITY_AND_LEADERS
WHERE BANINST1.AA_ACTIVITY_AND_LEADERS.ACTIVITY = activities.ACTIVITY'''

activities = psql.frame_query(SQL, con=connection)

print("Found " + str(len(activities)) + " people with the same Activity codes.")



# Find employers

SQL = '''WITH employers AS (
SELECT BANINST1.AA_ALL_EMPLOYMENT.EMPLOYER   -- (reconstructed: substitute your employer name/code column)
FROM BANINST1.AA_ALL_EMPLOYMENT
WHERE BANINST1.AA_ALL_EMPLOYMENT.PIDM = '''+str(target_pidm)+''')
SELECT BANINST1.AA_ALL_EMPLOYMENT.PIDM
FROM employers, BANINST1.AA_ALL_EMPLOYMENT
WHERE BANINST1.AA_ALL_EMPLOYMENT.EMPLOYER = employers.EMPLOYER'''

employers = psql.frame_query(SQL, con=connection)

print("Found " + str(len(employers)) + " people with the same Employers.")



# Find class years

SQL = '''WITH classes AS (
SELECT BANINST1.AA_DEGREE.CLASS_YEAR   -- (reconstructed: substitute your class year column)
FROM BANINST1.AA_DEGREE
WHERE BANINST1.AA_DEGREE.PIDM = '''+str(target_pidm)+''' )
SELECT BANINST1.AA_DEGREE.PIDM
FROM classes, BANINST1.AA_DEGREE
WHERE BANINST1.AA_DEGREE.CLASS_YEAR = classes.CLASS_YEAR'''

classes = psql.frame_query(SQL, con=connection)

print("Found " + str(len(classes)) + " people with the same class year(s).")



# Find degrees

SQL = '''
WITH degrees AS (
SELECT BANINST1.AA_DEGREE.DEGREE   -- (reconstructed: substitute your degree code column)
FROM BANINST1.AA_DEGREE
WHERE BANINST1.AA_DEGREE.PIDM = '''+str(target_pidm)+''')
SELECT BANINST1.AA_DEGREE.PIDM
FROM degrees, BANINST1.AA_DEGREE
WHERE BANINST1.AA_DEGREE.DEGREE = degrees.DEGREE'''

degrees = psql.frame_query(SQL, con=connection)

print("Found " + str(len(degrees)) + " people with the same degree(s).")



# Find high school

SQL = '''
WITH high_school AS (
SELECT BANINST1.AA_HIGH_SCHOOL.HIGH_SCHOOL_CODE   -- (reconstructed: substitute your high school code column)
FROM BANINST1.AA_HIGH_SCHOOL
WHERE BANINST1.AA_HIGH_SCHOOL.PIDM = '''+str(target_pidm)+''')
SELECT BANINST1.AA_HIGH_SCHOOL.PIDM
FROM high_school, BANINST1.AA_HIGH_SCHOOL
WHERE BANINST1.AA_HIGH_SCHOOL.HIGH_SCHOOL_CODE = high_school.HIGH_SCHOOL_CODE'''

hs = psql.frame_query(SQL, con=connection)

print("Found " + str(len(hs)) + " people from the same high school.")




# Find cities

SQL = '''
WITH cities AS (
SELECT SATURN.SPRADDR.SPRADDR_CITY   -- (reconstructed: city from all past and present addresses)
FROM SATURN.SPRADDR
WHERE SATURN.SPRADDR.SPRADDR_PIDM = '''+str(target_pidm)+''')
SELECT SATURN.SPRADDR.SPRADDR_PIDM AS PIDM
FROM cities, SATURN.SPRADDR
WHERE SATURN.SPRADDR.SPRADDR_CITY = cities.SPRADDR_CITY'''

cities = psql.frame_query(SQL, con=connection)

print("Found " + str(len(cities)) + " people with addresses in the same city/cities.")



# Find age

SQL = '''
WITH age AS (
SELECT BANINST1.AA_ENTITY.AGE   -- (reconstructed: substitute your age or birth-year column)
FROM BANINST1.AA_ENTITY
WHERE BANINST1.AA_ENTITY.PIDM = '''+str(target_pidm)+''')
SELECT BANINST1.AA_ENTITY.PIDM
FROM age, BANINST1.AA_ENTITY
WHERE BANINST1.AA_ENTITY.AGE = age.AGE'''

age = psql.frame_query(SQL, con=connection)

print("Found " + str(len(age)) + " people of the same age.")



# Find cross-references

SQL = '''
WITH xref AS (
SELECT ALUMNI.APRXREF.APRXREF_XREF_PIDM   -- (reconstructed: the PIDMs cross-referenced to the target)
FROM ALUMNI.APRXREF
WHERE ALUMNI.APRXREF.APRXREF_PIDM = '''+str(target_pidm)+''')
SELECT ALUMNI.APRXREF.APRXREF_PIDM AS PIDM
FROM xref, ALUMNI.APRXREF
WHERE ALUMNI.APRXREF.APRXREF_XREF_PIDM = xref.APRXREF_XREF_PIDM'''

xref = psql.frame_query(SQL, con=connection)

print("Found " + str(len(xref)) + " people who share the same cross-references.")




# Concatenate all results

print("Joining results.")

results = pd.concat([events, activities, employers, classes, degrees, hs, cities, age, xref])

# Remove unneeded columns, keeping just the PIDM. (Reconstructed line.)
results = results[['PIDM']]

print("Found " + str(len(results)) + " total results.")


## Count up the number of times each PIDM appears in the results. (A few steps here.)

# Add a column called 'count' and populate every row with a '1'
results['count'] = 1

# Group by PIDM and sum on 'count' to get 'hits per PIDM'.
results = results.join(results.groupby(results['PIDM'])['count'].sum(), on=['PIDM'], rsuffix='_sum')

# De-dupe rows in order to get one result per PIDM.
results = results.drop_duplicates()

# Sort results in descending order by the number of hits.
results.sort(columns='count_sum', axis=0, ascending=False, inplace=True)

# The file is potentially huge, so prompt the user to limit it by hit count.
cutoff = raw_input('How many hits would you like to limit by? ')

# Keep rows with minimum number of hits.
results = results[results['count_sum'] >= int(cutoff)]

# Get rid of the 'count' column - not needed.
results = results.drop(['count'], axis=1)

print("Found " + str(len(results)) + " unique results, with at least " + cutoff + " hits.")



## Join names to results

print('Preparing results and saving.')

# (Reconstructed query: select whatever identifying details you want in the output file.)
SQL = '''
SELECT
BANINST1.AA_ENTITY.PIDM,
BANINST1.AA_ENTITY.ID,
BANINST1.AA_ENTITY.FIRST_NAME,
BANINST1.AA_ENTITY.LAST_NAME
FROM BANINST1.AA_ENTITY'''

results = pd.merge(results, psql.frame_query(SQL, con=connection), how='left', on=['PIDM'], sort=False)

results.to_csv(target + ' - ' + name + '.csv')

print('All done!')





18 February 2014

Save our planet

Filed under: Annual Giving, Why predictive modeling? — kevinmacdonell @ 9:09 pm

You’ve seen those little signs — they’re in every hotel room these days. “Dear Guest,” they say, “Bed sheets that are washed daily in thousands of hotels around the world use millions of gallons of water and a lot of detergent.” The card then goes on to urge you to give some indication that you don’t want your bedding or towels taken away to be laundered.

Presumably millions of small gestures by hotel guests have by now added up to a staggering amount of savings in water, energy and detergent.

It reminds me of what predictive analytics does for a mass-contact area of operation such as Annual Giving. If we all trimmed down the amount of acquisition contacts we make — expending the same amount of effort but only on the people with highest propensity to give, or likelihood to pick up the phone, or greatest chance of opening our email or what-have-you — we’d be doing our bit to collectively conserve a whole lot of human energy, and not a few trees.

With many advancement leaders questioning whether they can continue to justify an expensive Phonathon program that is losing more ground every year, getting serious about focusing resources might just be the saviour of a key acquisition program, to boot.

2 December 2010

Call attempt limits? You need propensity scores

Filed under: Annual Giving, Phonathon, Predictive scores — kevinmacdonell @ 4:51 pm

A couple of weeks ago I shared some early results from our calling program that showed how very high-scoring alumni (for propensity to give by phone) can be counted on to give, and give generously, even after multiple attempts to reach them. If they have a high score, keep calling them! Yes, contact rates will decline, for sure. But these prospects are still likely to give if you can get them on the phone, making the extra effort worthwhile.

For the other three-quarters of your prospects, it’s a different story. You may still want to call them, but keeping those phones ringing all year long is not going to pay off, even if you have the luxury of being able to do so.

This is ground I’ve already covered, but I think it bears repeating, and I’ve created some charts that illustrate the point in a different way. Have a look at this chart, which shows pledge percentage rates for the 6th, 7th, 8th, 9th and 10th decile score, at four stages of call attempts:

This chart is based on data from more than 6,600 phone conversations. How are we to interpret it? Let’s start with the top line, in blue, which represents prospects in the top 10% (decile) of alumni for propensity to give by phone, as determined by the predictive model:

  • Almost 38% of 10th-decile alumni who were contacted on the very first call attempt made a pledge.
  • Moving to the next dot on the blue line, we see that almost 37% of the 10th-decile alumni who were contacted on the 2nd or 3rd attempt made a pledge.
  • The pledge rate slips a little more, to 36%, if the prospect picked up the phone on attempts 4 through 7.
  • And finally, almost 26% of them pledged if it took more than 7 attempts to reach them.

That’s the first line. The other lines take different paths. The 9s and 8s start much lower than the 10s, but pledge percentages actually rise with the number of call attempts. They will fall back to earth — just not yet! As for the lower deciles, the 7s and 6s, they start relatively low and dwindle to zero.

So what does all this tell me? I am less interested in how each decile ranks at the start of calling (one or two attempts), because it’s no surprise to me that the 10th decile gives at twice the rate as the 9th decile, and that pledge rates fall with each step down in the score. I’ve seen that before.

What really interests me is what happens when we’ve made many repeated attempts to call. That the 8s and 9s have pledge rates that increase with the number of call attempts is pretty strange stuff, but the fact is: 26 alumni with a score of 9 made a pledge only after we called them 8 or 9 or maybe 15 times.

Whether it’s worth it to make that many call attempts is up to you. It depends on contact rates, and what it costs to make all those calls. But one thing is certain: If I’m going to call repeatedly, I’d better be calling the top three deciles, because if I keep on flogging the segments with a score of 6, I’m not going to do very well.

So what about contact rates?

Here’s another chart that shows what percentage of each score decile’s prospects have been successfully reached at the same four levels of call attempts.

What does it mean? Compare the lowest decile called so far (Decile 6) with the highest decile (10). About 14% of 6s answered the phone on the first try, compared with about 19% of the 10s. That’s not a big difference: In fact, contact rates are similar across the scores for the first attempt. But the similarity ends there. After the first attempt, the lower scoring alumni have steadily decreasing rates of contact. The same is true of the higher-scoring alumni, but the difference is that some of them are still answering their phones on the 8th call. More than 4% of 10s were reached on the 8th call or greater.

The bottom line is, the propensity score is your biggest asset in setting appropriate call attempt limits. Yes, Renewal prospects are more likely to give than Acquisition prospects. But that’s not enough to go by. Are you going to call every last Renewal prospect before thinking about acquiring new donors? I wouldn’t recommend it — not if you care about long-term growth and not just this year’s totals. And because contact rates decline as attempts increase (regardless of score), you’re going to end up making a LOT of phone calls to find those gifts that will make up your goal.

My earlier post on the same subject is here. I am spending a lot of time on this, because I don’t see any of this being written about by the well-known experts in Phonathon fundraising. Why that is, I do not know.

25 November 2010

Turning people into numbers?

Filed under: Front-line fundraisers, skeptics — kevinmacdonell @ 1:18 pm


I tend to hear the same objections from presentation audiences, my own and others’. They’re not objections so much as questions, and very good questions, and always welcome. But no one yet has voiced a reservation that I know some must be thinking: This predictive modeling stuff, it’s all so … impersonal.

We already work in a profession that refers to human beings as “prospects” and “suspects”. Doesn’t sticking scores and labels on people perpetuate a certain clinical coolness underlying how fundraising is carried out today? Predictive modeling sounds like bar-coding, profiling, and commodifying people as if they were cattle destined for the table. Maybe we can be so busy studying our numbers and charts that we lose our connection with the donor, and with our mission.

Apologies in advance for setting up a straw man argument. But sometimes I imagine I see the thought forming behind someone’s furrowed brow, and wish it would be brought into the open so we can discuss it. So here we go.

(First of all, how many fundraising offices do you know that carry out their work with “clinical coolness”? We should be so lucky!)

More seriously: Data mining and predictive modeling will never interfere with the human-to-human relationship of asking for a gift, whether it’s a student Phonathon caller seeking an annual gift, or a Planned Giving Officer discussing someone’s ultimate wishes for the fruit of a lifetime of work. It’s a data-free zone.

What predictive modeling does is help bring fundraiser and would-be donor together, by increasing the odds (sometimes dramatically increasing the odds) that the meeting of minds will successfully converge on a gift.

Here’s how I frame it when I talk about predictive modeling to an audience that knows nothing about it. If all we know about a constituent is their giving history (or lack of it), we’re treating everyone the same. Is one non-donor just as likely as another to be convinced to make an annual gift? Is one $50-a-year donor just as likely as another to respond to an appeal to double their pledge this year, or be receptive to having a conversation with a Planned Giving Officer?

The answers are No, No, and NO!

What I say is, “Everyone is an individual.” If they played sports as a student, if they lived on campus, if they attended an event — we can know these things and act accordingly, based on what they tell us about their engagement with our institution. We just have to tune in and listen.

“Everyone is an individual.” Catchy, eh? Well, it’s trite, but it’s true — and it’s no less true for data miners than it is for anyone else.

2 November 2010

Finding prospects flying under the radar: A nuts and bolts approach

Guest post by Peter Wylie and John Sammis

(Downloadable/printable version available here as a PDF: Flying Under the Radar)

Let’s say you’re a prospect researcher in higher education.  You’re getting some pressure – from your boss, from some of the gift officers you work with, maybe the campaign director – to come up with a list of new prospects. They use different words, but their message is clear:

“We’ve picked the low hanging fruit. We don’t want to keep going back to the same alums who’ve been helping us out in a big way for a long time. We need to find some new people who have the capacity and willingness to make a nice gift. Maybe not a huge gift, but a nice gift.”

If you’ve been working in the field awhile, this isn’t the first time you’ve faced this problem, nor is it the first time somebody has offered advice on how to solve it. Truth be told, you may have gotten too much advice:

  • “You haven’t done a screening for five years. You need to do a new one.”
  • “Our company has gotten very sophisticated about predictive modeling as well as gift capacity ratings. Use us.”
  • “You’re not using social media resources effectively. Facebook and MySpace are great places to find out about alums who have lots of financial resources and are philanthropically inclined.”
  • “You need to learn how to do data mining and predictive modeling or add somebody to your staff who already knows how to do it.”

It’s not that any of this advice is bad, even if it comes from a vendor whose goal is to get some of your business. The problem is that you, or the person you report to, has to sift through this advice and make some kind of decision — even if that decision is to do nothing different from what you’re currently doing.

Since John Sammis and I are some of the people out there offering this kind of advice to advancement folks, we often ask ourselves: “Are we making things too complicated for the people we’re trying to help?” Often the answer we come up with is, “Probably.” Why? That’s a whole can of worms we’d rather not get into. The short answer is that both of us grew up in an educational system where precious few of our teachers and authors of our textbooks were good at making things simple and clear. And like it or not, we’ve inherited some of their tendencies to obfuscate rather than elucidate. But we fight against it as best we can.

Hopefully we’ve won that battle in this piece. (You’ll decide if we have.) Anyway, what we’ve done here is use some data from a large public higher education institution to walk you through a simple process for finding new prospects.

Before we do that, let’s start off with three assumptions:

  • You have fairly recent gift capacity ratings for several thousand of your solicitable alums, some of whom you think may be good, untapped prospects.
  • You have ready access to someone who can develop a simple score for all those alums with respect to their affinity to give to your school.
  • You have reasonably good profiles on each of these alums. That is, those profiles include information like lifetime hard credit dollars given; date and amount of last gift; date and amount of first gift; what gift officers have ever been assigned to those alums and when; the most recent occupation of the alum; and so on.

Here are the steps we want to take you through:

1.     Look at the distribution of gift capacity ratings for the alums you’ve recently screened.

2.     Look at the giving data for these alums by gift capacity ratings.

3.     Have someone build you a simple affinity model using some very basic information stored on each alum.

4.     Pick a small group of alums who have a high capacity rating, a high affinity rating, and are not currently assigned to a gift officer.

5.     Look closely at the alums in this small group and identify some who may deserve more scrutiny.

We’ll go through each of these steps in detail:

Look at the distribution of gift capacity ratings for the alums you’ve recently screened.

Whenever you have a field of data (whether it comes from your own database or has been delivered to you by a vendor), it’s a good idea to make a frequency distribution of the field. (In statistics the term is “variable,” not “field,” so from here on out we’ll say “variable.”)

Here are a couple of reasons for doing this:

  • You get a big picture look at the variable. Our experience is that most people in higher education advancement don’t do this for the many variables they have in their alumni databases. For example, let’s say you asked the average associate vice president for advancement in a college or university this question: “What percentage of your solicitable alums have given $100 or less lifetime hard credit?” Our bet is that the vast majority would have no idea of what the correct answer was; moreover, they would be shocked if you told them.
  • You get a chance to see if there is anything out of the ordinary about the data that’s worth further exploration. Here’s a good example. When doing predictive modeling for a school, we look closely at the variable “preferred class year.” It’s a measure of how long alums have been out of school, and it’s a reasonably good measure of age. It’s not at all uncommon for us to encounter thousands of records coded as “0000” or, say, “1700.” Call it a hunch, but we’re pretty sure those folks didn’t graduate the year Christ was born, nor 75 years or so before the Declaration of Independence got signed. When we encounter a problem like this, of course, we ask the advancement services people we’re working with what those codes mean. The answers vary. Sometimes such codes indicate alums who are non-degreed. Sometimes they indicate alums who simply received a certain kind of certificate. Or they indicate something else. The important thing is that we ask; we clear up the mystery.
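Producing such a frequency distribution is a one-liner in most stats tools. For instance, in Python with pandas (the values below are invented for illustration):

```python
import pandas as pd

# An invented sample of "preferred class year" values, including suspect codes
class_year = pd.Series(['1985', '1992', '1992', '0000', '1985', '1700', '1985'])

# The frequency distribution makes odd codes like '0000' and '1700' jump out
print(class_year.value_counts())
```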

All right, Table 1 shows a distribution of the gift capacity ratings for a group of about 22,000 alums in the public higher education institution we mentioned earlier. Figure 1 displays the same distribution graphically. Take a minute or two to look at both of them. Then you can compare what you see with what we noticed.

Table 1: Estimated $Gift Capacity for Over 22,000 Alumni Divided Into 20 Groups of Roughly 5% Each

For us, two things about these data stand out:

1.     Some of the data are a little hard to believe. Let’s take a look at Group 1 in Table 1. There are 1123 records in this group. They comprise alums with the lowest five percent of gift capacity ratings. If you look at the “min” column, you’ll see that the lowest gift capacity rating is one dollar. Really? That alum must be down on his or her luck. You can’t see all the data in this distribution the way we can, but there are a total of 11 alums whose gift capacity is listed as being under $100. Obviously, you should be suspicious of such ratings. Contacting the vendor who generated them is a must. And politely staying after them until you get an acceptable answer is the right thing to do.

2.     The capacity ratings rise slowly until we get to the top ten percent of alums. There’s nothing particularly surprising about this. However, it is interesting (without showing you all the arithmetic) that, of the roughly one billion dollars of total gift capacity for these alums, over half a billion of that gift capacity resides with the top 10% of the alums.

Look at the giving data for these alums by gift capacity ratings.

We’ve taken a look at the distribution of gift capacity ratings for the alums we’ve screened. Now let’s look at how those capacity ratings are related to the money the same alums have given to the school.

We’ll start with Table 2. The two columns on the right of the table (“Total$ given” and “Max$ given”) contain the most important pieces of information in the table. The “Total” column simply shows the total lifetime dollar amount given for the alums at each of the 20 gift capacity levels. The “Max” column shows the maximum amount any one alum has given at each of these levels.

Table 2: Giving Data for Over 22,000 Alumni Divided Into 20 Groups of Roughly 5% Each by Gift Capacity

We see a pattern that emerges from this table, but it’s a little hard to detect. So go ahead and take a look at Table 3 and Figure 2. Then we’ll offer our thoughts.

Table 3: Percentage of Alums Giving $50 or More Lifetime by Gift Capacity Level

When we look at these two tables and this one figure, two conclusions emerge for us:

1.     There is some relationship between gift capacity and giving, but it’s not a strong one.

2.     If we can believe the gift capacity ratings, there is a huge amount of untapped potential for giving, especially at the highest capacity levels.

Let’s start with the first conclusion, that there is not a strong relationship between capacity and giving. How do we arrive at the conclusion? Let’s go back to Table 2. Now if we just look at the five percent of alums with the lowest giving capacity (Group 1) and the five percent of alums with the highest giving capacity (Group 20), we see that the total lifetime giving goes from $34,062 to $2,396,810. That’s a big difference. The wealthiest alums have given about 70 times as much as the least wealthy alums. Also, the most generous alum in the lowest capacity group has given a lifetime total of $2,005 compared to the most generous alum in the highest capacity group who has given a lifetime total of $224,970. Again, we see a big difference.

But look at what happens in between these two extremes. Things bounce around a lot. For example, let’s compare the giving between capacity level 3 and capacity level 12. The total giving amount for the former group is $152,741; the total giving amount for the latter group is $125,477. In other words, alums with a much higher giving capacity have given less than alums with a much lower giving capacity.

Further evidence of this “bouncing around” is apparent when you look at Figure 2 (a graphic version of Table 3). This chart shows the percentage of alums at each of the 20 giving capacity levels who have given $50 or more lifetime to the school. Notice how these percentages dip in the middle of the capacity range.

So let’s go back to our conclusion that there is some relationship between gift capacity and giving, but that it’s not a strong relationship. Yes, the overall trend of giving goes up with gift capacity, but we can in no way conclude that knowledge of an alum’s gift capacity is a good indication of how much he/she has given.

Okay, how about our second conclusion, that there is a huge amount of untapped potential for giving, especially at the highest capacity levels? We think Figure 2 provides plenty of support for that conclusion. Look at the highest gift capacity level. Barely 50% of the alums in this category have given $50 or more lifetime. Not as a single gift. No. Lifetime.

If that doesn’t convince you of the untapped potential for giving among such wealthy alums, we’re not sure anything will.

Have someone build you a simple affinity model using some very basic information stored on each alum.

Now comes the tricky part. Now comes the part where we risk losing you because we get a little too technical. We don’t want to do that. We want to avoid having you end up saying, “Geez, these guys said they were gonna make this simple, but they didn’t. Now I’m more confused than I was before I started reading this thing.”

This is not a perfect solution to the problem, but we think it might work. We’d like you to find someone who works at your school who can help you. Of course, it would be great if you already had someone on your advancement staff who fits that bill – someone whose job is focused on data mining and predictive modeling. Some schools have folks like that, but most don’t. (We’re assuming you don’t, otherwise there wouldn’t be a whole lot of need for you to be reading this piece.)

Anyway, the person you’re looking for is probably a stats professor in the psychology or education department, a graduate student pursuing a degree in that area, or someone who works in what is often called “institutional research.” Ideally, the person you find should be:

  • Someone who is helpful and accommodating. This seems obvious, but, sadly, a lot of people in higher education don’t meet these criteria. Maybe a quick and easy way to decide is to ask yourself: “If I walked into a high-end department store, is this a person I would want to help me?” If your answer is not an unequivocal “yes,” you should keep looking.
  • Someone who is good at explaining things in clear, simple English. Candidly, a lot of people in the technical arena are not good at this. As we said earlier, the two of us (try as we might) struggle with making things clear to the people we work with. What we’d suggest is that you look for someone who is patient with you when you don’t understand something they say. You especially want to avoid someone who acts the least bit impatient and condescending if you don’t “get” something the first time they explain it.
  • Someone who knows at least a little about major giving. The person does not need to be an expert in prospect research. But he or she should have a sense of how an advancement office works and some of the pressure that prospect researchers and development officers endure, especially when they’re scrambling to meet campaign goals.
  • Someone who has good skills with a stats software package. We think this is a must. If the person only knows how to use Excel to analyze data, that’s not good enough. The person needs to be proficient in a package like SPSS, which is widely available on college and university campuses. John and I prefer a package called Data Desk, but the important point is that your person needs to be proficient with an application whose purpose is sophisticated data analysis.
  • Someone who’s had some experience with multiple regression. You may or may not have heard of multiple regression. Don’t worry about that. Just be able to confirm that the person who helps you has a solid working knowledge of the technique. A good way to find out if that’s the case is to ask the person to explain the technique with a simple example drawn from some of your own data.

Let’s assume you’ve found someone to help you. As we said earlier, if you follow our plan, that person will build you a simple affinity model using some very basic information stored on each alum for whom you have a capacity rating.

For the benefit of that person, we’ve described below how we developed the model for the school we’re using as an example. We’ve tried to provide just enough detail to give your person a guide, but not so much that we bog the paper down with too many words.

Enclosed in the boxes below (so you can skip over it if you wish) is a summary of what we did:

We chose lifetime hard credit giving as our dependent variable. To each record we added one dollar of giving to arbitrarily rid the sample of zero givers. We then performed a log to the base 10 transformation on this variable to reduce as much of the positive skewness as possible. 
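If your helper works in Python rather than a dedicated stats package, the transformation described above might look like this sketch (the giving amounts here are made up purely for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical lifetime hard-credit giving totals, including zero givers.
alums = pd.DataFrame({"lifetime_giving": [0.0, 25.0, 150.0, 2005.0, 224970.0]})

# Add one dollar to every record so zero givers don't break the log,
# then take log base 10 to reduce the positive skewness.
alums["log_giving"] = np.log10(alums["lifetime_giving"] + 1)

print(alums)
```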

We chose the following predictors (independent variables) for entry into our multiple regression analysis:

  • MARITAL STATUS MISSING (the alum was given a 1 if there was no marital status listed for him/her in the database, otherwise a 0)
  • MARITAL STATUS SINGLE (the alum was given a 1 if he/she was listed as “single” in the database, otherwise a 0)
  • CLASS YEAR (the alum’s preferred year of graduation)
  • SQUARE OF CLASS YEAR (the square of the alum’s preferred year of graduation)
  • HOME PHONE LISTED (a 1 if a home phone was listed in the database for the alum, otherwise a 0)
  • BUSINESS PHONE LISTED (a 1 if a business phone was listed in the database for the alum, otherwise a 0)
  • EVENT ATTENDED (a 1 if an alum was listed as ever attending an event after graduation, otherwise a 0)
  • E-MAIL LISTED (a 1 if an e-mail address was listed in the database for the alum, otherwise a 0)
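All of these predictors are simple 0/1 flags built from fields that sit in the database anyway. In pandas, constructing a few of them might look like the sketch below (the field names are ours, not from any particular system):

```python
import pandas as pd

# Hypothetical raw fields as they might sit in an alumni database.
raw = pd.DataFrame({
    "marital_status": [None, "single", "married", None],
    "home_phone":     ["555-0101", None, "555-0199", None],
    "email":          [None, "a@x.edu", "b@x.edu", None],
})

predictors = pd.DataFrame({
    # 1 if no marital status is listed, otherwise 0
    "MARITAL_STATUS_MISSING": raw["marital_status"].isna().astype(int),
    # 1 if the alum is listed as single, otherwise 0
    "MARITAL_STATUS_SINGLE": (raw["marital_status"] == "single").astype(int),
    # 1 if a home phone is on file, otherwise 0
    "HOME_PHONE_LISTED": raw["home_phone"].notna().astype(int),
    # 1 if an e-mail address is on file, otherwise 0
    "EMAIL_LISTED": raw["email"].notna().astype(int),
})

print(predictors)
```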

Table 4 summarizes the results of the regression analysis:

Table 4: Regression Analysis Table for the Simple Model Developed for This Paper

R squared = 24.9%     R squared (adjusted) = 24.9%
s = 0.9835 with 22446 - 9 = 22437 degrees of freedom

Source        Sum of Squares   df      Mean Square   F-ratio
Regression    7213.27          8       901.659       932
Residual      21701.5          22437   0.967218

Variable                  Coefficient   s.e. of Coeff   t-ratio   prob
Constant                  -1603.28      218.2           -7.35     ≤ 0.0001
MARITAL STATUS MISSING    -0.333961     0.01845         -18.1     ≤ 0.0001
MARITAL STATUS SINGLE     -0.472877     0.01603         -29.5     ≤ 0.0001
HP LISTED                 0.243367      0.01496         16.3      ≤ 0.0001
BP LISTED                 0.685641      0.03374         20.3      ≤ 0.0001
CLASS YEAR                1.64819       0.2196          7.51      ≤ 0.0001
SQUARE OF CLASS YEAR      -4.23E-04     5.52E-05        -7.66     ≤ 0.0001
EVENT ATTENDED (YES/NO)   0.712603      0.05089         14        ≤ 0.0001
EMAIL LISTED              0.422934      0.01487         28.4      ≤ 0.0001

We divided the predicted scores from the regression analysis for alums with the highest gift capacity into twenty roughly equal-sized groups where 1 was low and 20 was high.
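We ran our regression in Data Desk, but the same kind of fit can be sketched in Python with plain NumPy least squares. Everything below (the predictors, the outcome, and the coefficients that come out) is fabricated purely to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Fabricated 0/1 predictors plus a class year variable.
home_phone = rng.integers(0, 2, n)
single = rng.integers(0, 2, n)
class_year = rng.integers(1950, 2010, n).astype(float)

# Fabricated outcome on the log10(lifetime giving + 1) scale.
y = 0.3 * home_phone - 0.4 * single + rng.normal(0.0, 1.0, n)

# Design matrix: intercept, the predictors, class year, and its square
# (the squared term lets the fit capture a curved class-year effect).
X = np.column_stack([np.ones(n), home_phone, single, class_year, class_year ** 2])

# Ordinary least squares; lstsq is more stable than the normal equations.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef  # these are the affinity scores, one per alum

print(coef)
```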

Okay, where are we here? In the “boxed in” technical suggestion above, the last thing we said was: “We divided the predicted scores from the regression analysis for alums with the highest gift capacity into twenty roughly equal-sized groups where 1 was low and 20 was high.” Well, what does that actually mean?

Let’s start with the specific group of alums we’re most interested in looking at. These are the 1,123 alums who got the highest gift capacity ratings. If you go all the way back to Table 1 (which you don’t really need to do), you’ll see that their total gift capacity is $405,958,000 – a lot of money.

Our regression analysis created a very granular affinity score for this group. It had 408 different levels. The alums with the lowest of these scores (according to the regression analysis) are least likely to give a lot of money to the school; the alums with the highest of these scores are the most likely to give a lot of money to the school.

That’s terrific, but 408 score levels is a lot of levels to get your arms around. So what we did is take those scores and chop them up into 20 roughly equal-sized groups from 1 to 20, where (again) 1 represents the lowest scores and 20 represents the highest. Detailed giving data on all these 1,123 alums is displayed in Table 5 below. We can look at those data in a moment, but let’s move on to our next step.
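Chopping the scores into 20 roughly equal-sized groups is a one-liner in pandas. Here is the idea, with random scores standing in for the 408 regression levels from our run:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Stand-in for the 1,123 predicted affinity scores from the regression.
scores = pd.Series(rng.normal(size=1123))

# Chop into 20 roughly equal-sized groups, labeled 1 (lowest) to 20 (highest).
affinity_group = pd.qcut(scores, q=20, labels=range(1, 21))

print(affinity_group.value_counts().sort_index())
```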

Pick a small group of alums who have a high capacity rating and a high affinity rating.

Table 5 gives us lots of information about where we’re likely to find this small group. Let’s see what looks interesting here. Remember, everyone in this total group of 1,123 alums has a gift capacity rating greater than $116,000. This is a wealthy bunch of folks – no question about that.

We’ll start with the lowest group, group 1. These 56 alums have the lowest affinity scores of the total group, and their giving data confirms that. Look at the value for these alums in the “sum” column: $430. That means that all 56 alums, as a group, have given less than $500 lifetime to the school. That works out to a mean (average) lifetime gift of less than $8 per alum. Our conclusion? These folks may be wealthy, but both their affinity scores and their giving history speak loud and clear: “Our philanthropic interests are aimed at worthy causes other than our alma mater.”

Now let’s jump up to the top group, group 20. Notice that there are exactly the same number of alums in this group as in group 1 (56), but the giving data for this top group is quite different from the bottom group. Most notably, they’ve given a total of $483,789, well over a thousand times as much as the bottom group. So here we have a group of alums who (a) we know are wealthy; (b) have a high affinity rating developed from the regression analysis; and (c) have already given the school quite a bit of money.

Table 5: Giving Data for the 1,123 Very High Gift Capacity Alumni Divided Into 20 Groups of Roughly 5% Each by Affinity Score

Look closely at the alums in this small group and identify some who may deserve more scrutiny.

Now we can take a very close look at this group in Table 6 (below, near the end of this post). It lists the total giving and gift capacity for each of these 56 records. (Remember, each of the 56 alums has a high gift capacity rating, and each has an affinity score that says they really like the school.)

We’ll start off with a couple of alums who have already given a considerable amount to the school. What’s particularly interesting about these two is how different they look from the perspective of the possibility of very large future gifts.

  • Record #1: From the looks of things, this person is probably well known to the research staff and to the gift officers. The person has given more than $100,000 and has a gift capacity that’s not a whole lot more than that amount. We’re pretty sure the school would like to have a lot more alums like this one.
  • Record #7: We find this one pretty interesting. The alum has given a bit over $21,000 lifetime, but their gift capacity is listed as well over $13 million. Since the alum clearly likes the school, and they have considerable wherewithal to give a lot more, why haven’t they given a lot more? Maybe there’s a good reason, maybe not. At the very least this is someone who deserves continued attention both from the prospect research side of the house and the gift officer side of the house.

Now we’ll move down to five alums (Records #15, 17, 18, 20, and 24) all of whom have given less than $6,000 lifetime but whose gift capacity ratings all exceed $400,000. Here we are probably in the neighborhood of prospects who truly are flying under the radar. They may have been assigned to a gift officer. And when a prospect researcher looks at their profiles, the researcher may say, “Yeah, we know about him.” But our experience tells us that alums like these are worth a harder look. For example, we would ask:

  • Is the alum really assigned to a gift officer, or did the last gift officer simply write the alum off as not a “good prospect” with no good documentation as to why that decision was made?
  • What does the alum do for a living? Does that occupation (e.g., investment banker) jibe with the gift capacity rating?
  • Has the alum been an active volunteer or season ticket holder?
  • Is he or she at the age where a sizeable planned gift might be a possibility?

You get the idea. With folks like these we think you should dig a little. Some of them may be at what Malcolm Gladwell calls “the tipping point.” They may be right on the verge of making a much larger gift if you do a little more research on them and send the right gift officer out to meet with them.

By the way, take a look at Record #56. This person is really rich, the internal data says he/she really likes the school, but this person hasn’t given any money. We’d sure like to know the story about this person.

After Table 6 we offer some closing comments. We really appreciate your staying with us up to this point.

Table 6: Giving and Gift Capacity Data for All 56 Alums in the Highest Affinity Group

Some Closing Comments

We’ve put a lot of tables and charts in front of you. That’s a lot of information to absorb. Here are several thoughts that might be helpful:

  • If you found what we’ve said here intriguing but also a bit confusing, put the piece away for a few days. Then take another look at it. It should be clearer the second time around. If it’s not, please feel free to contact John Sammis at and Peter Wylie at
  • Share what we’ve written here with a colleague whose opinion you respect but who disagrees with you about a lot of things. That should make for an “interesting” discussion.
  • Whatever you do, we hope this piece encourages you and others in your advancement group to take a closer look at all the data you store on your alumni. The two of us will never back away from the importance of doing that when you’re trying to save money on appeals and generate more revenue for worthwhile projects.

14 July 2010

Does “Do Not Solicit” mean they won’t give?

Filed under: Alumni, Annual Giving, Peter Wylie — kevinmacdonell @ 8:19 am

Guest post by Peter Wylie

(Click here: Do Not Solicit – 071310 to download Microsoft Word version of this paper.)

I think studying higher education alumni databases is a lot like studying a foreign language. Take French. I’ve been a student of it off and on for over 50 years. I’ve gotten pretty good at it. So much so that native speakers often compliment me on my fluency. When they do, I thank them graciously and quickly add “Plus je sais, plus je ne sais pas.” The more I know, the more I don’t know.

For over a decade I thought I “knew” that alumni who tell their alma maters not to ask them for money meant two things: (1) Don’t ask me, and (2) I’m not going to give you anything. Then a good friend and colleague sent me some data for a project that may end up being pretty cool. Of the 50,000 or so records she sent, over 7,000 had a “do not solicit” tag. Okay. That’s clear. Don’t call ’em. Don’t send ’em letters or e-mails. And definitely don’t go knocking on their doors just because you happen to be in the neighborhood. That’s all pretty clear.

But I’m nosy; I checked to see if any of those 7,000 or so had ever given the school any money. It turns out that a little over a third of them had given some money. I couldn’t tell how much because my friend had not given me dollar amounts. Just whether or not they’d ever given a hard credit gift to the school. (We’d agreed that actual amounts would distract us from the goals of the project.)
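If you want to run the same check on your own data, it’s a short calculation once the two flags are in a data frame. A sketch with toy records (the flags and counts here are invented):

```python
import pandas as pd

# Toy records: one row per alum, two 0/1 flags.
df = pd.DataFrame({
    "do_not_solicit": [1, 1, 1, 0, 0, 1, 0, 1, 1],
    "ever_gave":      [1, 0, 0, 1, 0, 1, 1, 0, 0],
})

# Restrict to the "do not solicit" group and take the giving rate.
dns = df[df["do_not_solicit"] == 1]
share_giving = dns["ever_gave"].mean()

print(f"{len(dns)} do-not-solicit alums, {share_giving:.0%} have ever given")
```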

I couldn’t stop there. I started digging around to see what some of the differences were between the ones who’d given and the ones who hadn’t. I think I found some interesting stuff. Of course, that doesn’t mean what I found applies to other schools. Maybe it does. Maybe it doesn’t.

For me, that’s not so important. What’s important is that young folks (like Kevin MacDonell, the creator of this great blog) who carry the vanguard of data-driven decision making in our field will take this topic farther than I have. And that can’t be a bad thing.

In the rest of this piece I’ll cover three topics:

  • Some of the interesting differences between the givers and non-givers (who say “Don’t solicit.”)
  • The model I used to “predict” the givers
  • Some concluding thoughts

Some Interesting Differences

In a moment we can take a look at Figures 1-6 that show some variables on which “Do not solicit” alums differ markedly when it comes to giving. Before we do that, however, I think it’s important to point out something we don’t know. We don’t know when or under what circumstances any of these alums told the school they did not want to be solicited. For example,

  • Did they do it recently or a long time ago?
  • Did they do it on a permanent basis, or just temporarily? It’s possible that some of them are parents of kids attending the school and their attitude for the next four years is, “Hey, I’m paying an arm and a leg for tuition. Until I’m done with that, please don’t go asking me for money while I’m in hock up to my underwear.”
  • Are they bent out of shape over the beloved football coach who was finally let go after ten consecutive losing seasons – something they may eventually get beyond? Or is it simply a case of, “I’ll give you something, maybe a lot, when I’m ready. In the meantime, don’t bug me”?

Again, we don’t know. But before any advancement person uses the results of a predictive model like the one I lay out here, they should consider these sorts of possibilities. More about that later.

Let’s take a walk through the figures. After each one I’ve made a short comment or two to make it clear what the figure is conveying.

There are huge differences in the giving rates among three types of alums. Undergraduate alums are almost twice as likely to give as graduate alums and more than six times as likely to give as non-degreed alums.

Alums who have attended at least one reunion are two and a half times as likely to give as alums who have never attended a reunion.

Alums who have never attended an event are less than half as likely to give as alums who’ve attended one event and about a third as likely to give as alums who have attended two or more events.

Alums who are members of the online community are twice as likely to give as alums who are not members.

Alums who were members of a Greek organization as undergrads are almost twice as likely to give as alums who were not Greek members.

Alums who are children of alums are two and a half times as likely to give as alums whose parents are not alums.
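Each of these comparisons is just a giving rate computed within the categories of one variable. With the data in a frame, the reunion comparison (for example) reduces to a groupby; the numbers below are invented, not the school's actual rates:

```python
import pandas as pd

# Invented records: reunion attendance flag and ever-gave flag.
df = pd.DataFrame({
    "attended_reunion": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "ever_gave":        [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
})

# Giving rate within each category of the attribute.
rates = df.groupby("attended_reunion")["ever_gave"].mean()

print(rates)
```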

Clearly, these figures (and several I haven’t included) show there are a number of variables in the alumni database at this school that can be used to predict which “do not solicit” alums may be more likely to give in the future. One way to find out if we’re right is to (a) build a model that yields a “likelihood of giving score” for each of these alums and (b) begin testing the model.

Building a predictive model

I have to admit that, here, I was a bit torn about how much technical detail to go into. If I take you minute step by minute step through the model building process, I risk both confusing you and boring you. That wouldn’t be good. On the other hand, if I go too light on the details, you may say, “Come on, Pete, you haven’t given me enough info to see if I can test your results at my own school.”

So … how about this as a compromise?

Using multiple regression (if you know what that is, great; if you don’t, not to worry), I created a score for each alum where EVER GAVE (0/1) was the outcome variable and where the variables you see above as well as the following were the predictor variables:

  • Count of current volunteer activities
  • Count of past volunteer activities
  • Year the alum graduated (or should have if he or she had completed a degree “on time”)

The model generated well over 2,500 different score levels into which the 7,393 alums could fall – way too many for anybody to get their conceptual arms around. The adjusted R squared for the model was about 36%. (Again, don’t worry if you don’t know what R squared means.)

I divided the 2,695 score levels into 10 groups, or “deciles,” containing about 740 alums each. As you look at Table 1, you’ll see these groups varied some in size. In Decile 1 (the lowest scoring 10% of alums) there are 726 people. In Decile 10 (the highest scoring 10% of alums) there are 739 people.

TABLE 1: Frequency Breakdown of “Do Not Solicit” Alums by Score Decile

If the model is to be useful in identifying “Do not solicit” alums who are likely to give, the number and percentage of givers should increase as the deciles increase. A look at Table 2 and Fig. 7 shows that these numbers and percentages do just that. For example, in the first decile, of the 726 alums, only 12 (1.7%) have ever given anything to the school. In the tenth decile, of the 739 alums, 649 (87.8%) have given to the school.
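For anyone who wants to reproduce this kind of decile table on their own data, here is a sketch in Python with simulated scores and outcomes (the real model, of course, was built on actual database fields, and your percentages will differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 7393  # the number of "do not solicit" alums in this study

# Simulated model scores and outcomes; higher scores mean likelier givers.
score = rng.normal(size=n)
ever_gave = (rng.uniform(size=n) < 1 / (1 + np.exp(-2 * score))).astype(int)

df = pd.DataFrame({"score": score, "ever_gave": ever_gave})

# Cut the scores into ten deciles, labeled 1 (lowest) through 10 (highest).
df["decile"] = pd.qcut(df["score"], q=10, labels=range(1, 11))

# Count of alums, number of givers, and giving rate in each decile.
summary = df.groupby("decile", observed=True)["ever_gave"].agg(["count", "sum", "mean"])

print(summary)
```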

TABLE 2: Number and Percentage of “Do Not Solicit” Alums Who Gave by Score Decile

Some Concluding Thoughts

You may have read other stuff I’ve written on data mining and predictive modeling. If you have, could be you’re tired of hearing me say that higher education advancement offices ignore most of the data they have on their alums as they go about the business of raising money from those alums.

Well, no rest for the weary here. This little study is a good example of what I’m talking about. Who would have thought there would be such striking differences between givers and non-givers who ask not to be asked? Not me. I just stumbled onto it because I was playing around with data that had been put together for a totally different reason. And I think that’s my point. When we’re talking alumni databases, there are oceans of data that could help us save a lot of money and generate a lot more revenue for some very worthy missions. But with the drops of analysis we’re currently doing on all that data, we’re not saving all that much money nor generating that much more revenue. We’re not. And that needs to change.

If you’re reading this, I suspect I’m preaching to the choir. It’s probably not you that needs the convincing on this matter. It’s probably the big bosses where you work that need the exhortation and cajoling. So the next time one of ‘em makes noises about spending big money on some product or service that’s designed to “prepare for the campaign” or whatever, you might say something like: “That’s cool. That’s great. But let’s not forget about all that data we’ve got just sitting there waiting to help us identify individuals who can play a major role in this project.” If they look intrigued, start pitching them. If they don’t, don’t give up. Take another run at them in a few months.

Back to the specific topic of this paper. Try to build a predictive giving model for your own “Do not solicit” alums. If you’re not proficient with statistical software, find somebody in your school who is and get them to help you. If you have an Office of Institutional Research (or some similarly titled entity), that’s a good place to look. Just make sure the person you choose grasps the basic idea of what you’re trying to do and has the capacity to explain technical stuff in plain English.

Then do some in-depth research on the high-scoring alums that emerge from your model, especially those who’ve been generous givers over the years. Share the names with some of your colleagues, whether they’re involved in the annual fund, prospect research, or part of your cadre of gift officers. My bet is that at least one alum is going to pop out of the mix who is teed up for a real nice appeal if your group comes up with the right strategy.

Good luck and let us know how it turns out.
