I’ve got baseball analytics on my mind. I don’t know if it’s because of the onset of July or because of a recent mention of CoolData on Nate Silver’s FiveThirtyEight blog, but I have been deeply absorbed in an analysis of donor giving behaviours inspired by Silver’s book, “The Signal and the Noise.” It might give you some ideas for things to try with your own database.
Back in 2003, Silver designed a system to predict the performance of Major League Baseball players. The system, called PECOTA, attempts to understand how a player’s performance evolves as he ages. As Silver writes in his book, its forecasts were probabilistic, offering a range of possible outcomes for each player. From the previous work of others, Silver was aware that hitters reach their peak performance at age 27, on average. Page 81 of his book shows the “aging curve” for major league hitters, a parabola starting at age 18, arcing smoothly upwards through the 20s, peaking at 27, and then just as smoothly arcing downwards to age 39.
My immediate thought on reading about this was, what about donors? Can we visualize the trajectory of various types of donors (major donors, bequest donors, leadership annual fund donors) from their first ten bucks right after graduating, then on into their peak earning years? What would we learn by doing that?
In baseball, the aging curve presents a problem for teams acquiring players with proven track records. By the time they become free agents, their peak years will have passed. However, if the early exploits of a young prospect cause him to resemble one of the greats from the past, perhaps he is worth investing in. The curve, as Silver notes, is only an average. Some players will peak far earlier, and some far later, than the average age of 27. There are different types of players, and different types of curves, and every individual career is different. But lurking in all that noise, there is a signal.
Silver’s PECOTA system takes things further than that — I will return to that later — but now I want to turn to how we can visualize a sort of aging curve for our donors.
What’s the payoff? Well, to cut to the chase: It appears that some donors who go on to give in six figures (lifetime total) can be distinguished from the majority of lower-level donors at a very early age. Above-average giving ($200 or $250, say) in any one year during one’s late 20s or early 30s is a predictor of very high lifetime value. I have found that when big donors have started their giving young, they have started big. That is, “big” in relation to their similarly-aged peers – not at thousands of dollars, but at $100 or $500, depending on how old they were at the time.
Call it “precocious giving”.
Granted, it sounds a bit like plain common sense. But being analytical about it opens up the possibility of identifying a donor with high lifetime value when they’re still in their late 30s or early 40s. You can judge for yourself, but the idea appealed to me.
I’m getting ahead of myself. At the start of this, I was only interested in getting the data prepared and plotting the curve to see what it would look like. To the extent that it resembled a curve at all, I figured that “peak age” — the counterpart to baseball player performance — would be the precise age at which a donor gave the most in any given year while they were alive.
I wrote a query to pull all donors from the database (persons only), with a row for each year of giving, summing on total giving in that year. Along with Year of Gift, I pulled down Year of Birth for each donor — excluding anyone for whom we had no birthdate. I included only amounts given while the donor was living; bequests were excluded.
The next step was to calculate the age the donor was at the time of the gift. I added a column to the data, defined as the Year of Gift minus Year of Birth. That gave me a close-enough figure for age at time of giving.
As I worked on the analysis, I kept going back to the query to add things I needed, such as certain donor attributes that I wanted to examine. Here are most of the variables I ended up pulling from the database for each unique combination of Donor ID and Age at Time of Gift:
The result was a data set with more than 200,000 rows. Notice, of course, that a donor ID can appear on multiple rows — one for each value of Age at Gift. The key thing to remember is that I didn’t care what year giving occurred; I only wanted to know how old someone was when they gave. So in my results, a donor who gave in 1963 when she was 42 is much the same as a donor who gave in 2013 when he was the same age.
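The query and the age calculation can be sketched in pandas. (A minimal sketch only — the column names and the toy data here are hypothetical, not my actual field names.)

```python
import pandas as pd

# Hypothetical mini-extract: one row per donor per gift year, as in my query.
gifts = pd.DataFrame({
    'donor_id':   [1, 1, 2, 2, 2],
    'gift_year':  [1963, 1970, 2010, 2010, 2013],
    'birth_year': [1921, 1921, 1968, 1968, 1968],
    'amount':     [50.0, 100.0, 25.0, 25.0, 200.0],
})

# Age at Time of Gift: Year of Gift minus Year of Birth — close enough.
gifts['age_at_gift'] = gifts['gift_year'] - gifts['birth_year']

# One row per unique combination of donor and age, summing giving at that age.
by_age = (gifts.groupby(['donor_id', 'age_at_gift'], as_index=False)['amount']
               .sum())
```

The calendar year drops out entirely: the donor who gave at 42 in 1963 and the donor who gave at 42 in 2010 end up in the same bucket.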
Now it was time to visualize this data, and for that I used Tableau. I connected directly to the database and loaded the data into Tableau using custom SQL. ‘Age at Gift’ is numerical, so Tableau automatically listed that variable in the Measures panel. For this analysis, I wanted to treat it as a category instead, so I dragged it into the Dimensions panel. (If you’re not familiar with Tableau, don’t worry about these application-specific steps — once you get the general idea, you can replicate this using your tool of choice.)
The first (and easiest) thing to visualize was simply the number of donors at each age. Click on the image below to see a full-size version. Every part of the shape of this curve says something interesting, I think, but the one thing I have annotated here is the age at which the largest number of people chose to make a gift.
This chart lumps in everyone — alumni and non-alumni, living donors and deceased donors — so I wanted to go a little deeper. I would expect to see a difference between alumni and non-alumni, for example, so I put all degree and non-degree alumni into one category (Alumni), and all other donor constituents into another (Non-alumni). The curve does not change dramatically, but we can see that the number of non-alumni donors peaks later than the number of alumni donors.
There are a number of reasons for analyzing alumni and non-alumni separately, so from this point on, I decided to exclude non-alumni.
The fact that 46 seems to be an important age is interesting, but this probably says as much about the age composition of our alumni and our fundraising effort over the years as it does about donor behaviour. To get a sense of how this might be true, I divided all alumni donors into quartiles (four bins containing roughly equal numbers of alumni), by Birth Year. Alumni donors broke down this way:
Clearly these are very different cohorts! The donors in the middle two quartiles were born in a span of only a decade each, while the span of the youngest quartile is 24 years, and the span of the oldest quartile is 71 years! When I charted each age group separately, they split into distinct phases. (Reminder: click on the image for a full-size version.)
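Binning donors into quartiles by birth year is a one-liner with pandas. (The birth years below are invented for illustration; `pd.qcut` gives bins with roughly equal counts, not equal spans of years — which is exactly why the spans of the oldest and youngest quartiles came out so wide.)

```python
import pandas as pd

# Hypothetical birth years for a handful of alumni donors.
donors = pd.DataFrame({'donor_id': range(8),
                       'birth_year': [1900, 1938, 1950, 1955,
                                      1962, 1965, 1975, 1990]})

# Four bins with roughly equal numbers of donors in each.
donors['birth_quartile'] = pd.qcut(donors['birth_year'], 4,
                                   labels=['oldest', 'older',
                                           'younger', 'youngest'])
```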
This chart highlights a significant problem with visualizing the life cycle of donors: Many of the donors in the data haven’t finished their giving careers yet. When Nate Silver talks about the aging curves of baseball players, he means players whose careers are behind them. How else to see their rise, peak, and eventual decline? According to the chart above, the youngest quartile peaks (in terms of number of donors) at age 26. However, most of these donors are still alive and have many years of giving ahead of them. We will turn to them to identify up-and-coming donors, but as long as we are trying to map out what a lifetime of giving looks like, we need to focus on the oldest donors.
An additional problem is that our donor database doesn’t go back as far as baseball stats do. Sure, we’ve got people in the database who were born more than 140 years ago, but our giving records are very sparse for years before the early 1970s. If a donor was very mature at that time, his apparent lack of giving history might cause us to make erroneous observations.
I decided to limit the data set to donors born between 1920 and 1944. This excludes the following donors who are likely to have incomplete giving histories:
This is a bit arbitrary, but reasonable. It trims off the donors who could never have had a chance to have a lifetime of giving recorded in the data, without unduly reducing the size of my data set. I was left with only 20% of my original data, but still, that’s more than 6,000 individuals. I could have gotten fussier with this, removing anyone who died at a relatively young age, but I figured the data was good enough to provide some insights.
The dramatic difference made by this trimming is evident in the following two charts. Both charts show a line for the number of donors by age at time of gift, for each of three lifetime giving levels: Under $1,000 in blue, $1,000 to $10,000 in orange, and over $10,000 in purple. What this means is that all the donors represented by the purple line (for example) gave at least $10,000 cumulatively over the course of their lifetime.
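Assigning donors to those three lifetime-giving tiers is straightforward with `pd.cut`. (Again a sketch on made-up numbers; only the tier boundaries come from the charts.)

```python
import pandas as pd

# Hypothetical lifetime giving totals, one row per donor.
lifetime = pd.DataFrame({'donor_id': [1, 2, 3],
                         'lifetime_giving': [350.0, 4200.0, 95000.0]})

# The three tiers used in the charts.
bins = [0, 1000, 10000, float('inf')]
labels = ['Under $1,000', '$1,000 to $10,000', 'Over $10,000']
lifetime['tier'] = pd.cut(lifetime['lifetime_giving'], bins=bins, labels=labels)
```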
The first chart is based on ALL the data, before trimming according to birth year. The second chart is based on the 6,000 or so records I was left with after trimming. The first chart seems to offer an interesting insight: The higher the lifetime value of donors, the later in life they tend to show up in great numbers. But of course this just isn’t true. Although the number of donors with lower lifetime giving peaks at earlier ages, that’s only because that whole group of donors is younger: They’re not done giving yet. (I have added ‘Median Current Age’ to the high point of each curve to illustrate this.) Remember, this chart includes everyone — it’s the “untrimmed” data:
Contrast that three-phase chart with this next one, based on “trimmed” data. The curves are more aligned, presumably because we are now looking at a better-defined cohort of donors (those born 1920 to 1944). The oldest donor is 24 years older than the youngest donor, but that’s okay: The most important concern is having good data for that range of ages. Because the tops of these curves are flatter, I have annotated more points, for the sake of interest.
These curves are pretty, but they aren’t analogous to “performance curves” for baseball players — we haven’t yet looked at how MUCH donors give, on average, at each age. However, what general observations can we make from the last chart? Some that come to my mind:
Here is where things really get interesting. The whole point of this exercise was to see if we can spot the telltale signs of a future major donor while they are still relatively young, just as a baseball scout looks for young prospects who haven’t peaked yet. Do donors signal unusual generosity even when they are still in their 20s and 30s? Let’s have a look.
I zoomed in on a very small part of the chart, to show giving activity up until age 35. Are there differences between the various levels of donors? You bet there are.
As soon as a high-lifetime-value donor starts to give, the gifts are higher, relative to same-age peers who will end up giving less. The number of donors at these early ages is minuscule, so take this with a grain of salt, but a trend seems unmistakable: Up to the age of 30, donors who will end up giving in five figures and higher give about 2.5 to 3.5 times as much per year as other donors their age who end up giving $1,000 to $10,000 lifetime. AND, they give FIVE TIMES as much per year as other donors their age who end up giving less than $1,000 lifetime.
Later on, at ages 35 and 40, donors who will finish their giving careers at the high end are giving two to three times as much per year as donors in the middle range, and 5.6 to 7 times as much per year (respectively) as donors who will finish on the lowest end.
It might be less confusing to chart each group of donors by average giving per year, rather than by number of donors. This chart shows average giving per year up until age 65. Naturally, the averages get very spiky, as donors start making large gifts.
To temper the effect of extreme values, I log-transformed the giving amounts. This made it easier to visualize how these three tiers of donors differ from each other over a lifetime of giving:
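The transformation itself is nothing exotic — a base-10 log of the yearly averages. (Toy numbers below; if your averages include zeros, `np.log1p` is the safer choice.)

```python
import numpy as np
import pandas as pd

# Hypothetical average yearly giving for one tier, with one spiky large value.
avg_by_age = pd.Series([50.0, 75.0, 120.0, 25000.0, 200.0])

# log10 compresses the extreme value so the curves can be compared by eye.
log_avg = np.log10(avg_by_age)
```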
What do I see from this? These are generalizations based on averages, but potentially useful generalizations:
What’s the bottom line here? I think it’s this: Hundreds of donors were well on their way to being exceptional by the tender age of 40, and a few were signaling long before that.
Information like this would be interesting to Annual Fund as they work to identify prospects for leadership-level giving. But $10,000 in a lifetime is a little too low to make the Major Gifts folks take notice. Can we carve out the really big donors from the $10K-plus crowd? And can we also identify them before they hit 40? Have a look at this chart. For this one, I removed all the donors who gave less than $10,000 lifetime, and then I divided the high-end donors into those who gave less than $100,000 lifetime (green line) and those who gave more than $100,000 (red line).
The lines get a bit jagged, but it looks to me like the six-figure lifetime donors pull away from the five-figure donors while still in their 40s. And notice as well that they increase their giving after age 65, which is very unusual behaviour: By 65, the vast majority of donors have either long plateaued or are starting to wind down. (You can’t see this in the chart, but that post-65 group of very generous donors numbers about 50 individuals, with yearly average giving ranging around $25,000 to $50,000.)
When I drill down, I can see about a hundred donors sitting along the red line between the ages of 30 and 45, whom we might have identified as exceptional, had we known what to look for.
With the benefit of hindsight, we are now able to look at current donors who were born more recently (after 1969, say), and identify who’s sending out early signals. I have those charts, but I think you’ve seen enough, and as I have said many times in the past: My data is not your data. So while I can propose the following “rules” for identifying an up-and-comer, I don’t recommend you try applying them to your own situation without running your own analysis:
Does this mean I think we can ask a 32-year-old for $10,000 this year? No. It means that this 32-year-old is someone to watch out for and to keep engaged as an alum. It’s the donors over 50 or so who have exhibited these telltale patterns in their early giving that might belong in a major gift prospect portfolio.
Precocious giving certainly isn’t the only indicator of a good prospect, but along with a few other unusual traits, it is a good start. (See: Odd but true findings? Upgrading annual donors are “erratic” and “volatile”.)
Where do you go from here? That is completely up to you. I am still in the process of figuring out how to best use these insights.
Coming up with some rules of thumb, as above, is one way to proceed. Another is rolling up all of a donor’s early giving into a single score — a Precocity Score — that takes into account both how much a donor gave, and how young she was when she gave it. I experimented with a formula that gave progressively higher weights to the number of dollars given for younger ages. For example, $100 given at age 26 might be worth several times more than $200 given at age 44.
Using my data set of donors with a full life cycle of giving, I tested whether this score was predictive of lifetime value. It certainly was. However, I also found that a simple cumulative sum of a donor’s giving up to age 35 or 40 was just as predictive. There didn’t seem to be any additional benefit to giving extra weight to very early giving.
I am shying away from using giving history as an input in a predictive model. I see people do this all the time, but I have always avoided the practice. My preference is to use some version of the rules above as just one more tool to use in prospect identification, distinct from other efforts such as predictive modelling.
That’s as far as I have gotten. If this discussion has given you some ideas to explore, then wonderful. I doubt I’m breaking new ground here, so if you’ve already analyzed giving-by-age, I’d be interested in hearing how you’ve applied what you’ve learned.
Incidentally, Nate Silver went on to produce “similarity scores” for pairs of hitters. Using baseball’s rich trove of data, he compared players using a nearest-neighbour analysis, which took into account a wide range of data points, from player height and weight to all the game stats that baseball is famous for. A young prospect in the minor leagues with a score that indicates a high degree of similarity with a known star might be expected to “age” in a similar way. That was the theory, anyway.
One can imagine how this might translate to the fundraising arena. If you identified groups of your best donors, with a high degree of similarity among the members of each group, you could then identify younger donors with characteristics that are similar to the members of each group. After all, major gift donors are not all alike, so why not try to fit multiple “types”?
I would guess that the relatively small size of each group would cause any signal to get drowned out in the noise. I am a little skeptical that we can parse things that finely. It would, however, be an interesting project.
A final note. The PECOTA system had some successes and for a time was an improvement on existing predictive tools. Over time, however, pure statistics were not a match for the combination of quantitative methods and the experience and knowledge of talent scouts. In the same way, identifying the best prospects for fundraising relies on the combined wisdom of data analysts, researchers and fundraisers themselves.
Figuring out connections is a big part of major gift fundraising. A development office might have a great list of prospective donors in hand, but those prospects will languish untouched if the right people who can make introductions are never found. While cultivating relationships with well-connected people is clearly important, research also plays a role in suggesting possible go-betweens. And to the extent that research involves sifting through data to find connections, it’s possible to automate some of the work.
Wouldn’t it be great to have software that asks you to name a prospective donor, and then spits out a list of people in your database who are most likely to be associates of that person? Sure it would!
Social media applications do a version of this, suggesting “people you may know,” based on shared connections and attributes such as schools attended, professional group memberships, employment information, and stated interests. I first encountered a higher-ed advancement application of this concept via Erich Preisendorfer of the University of New Hampshire. A while back, Erich explained his approach in a guest post for CoolData. (Read it here: Finding connections to your major gift prospects in your data.) I was recently in touch with Erich and he reports that the tool is still in use by prospect researchers at UNH and is still turning up valuable connections.
I tried to build something similar using our reporting tools, but it was a slow, lumbering beast, and not very user-friendly for the people most likely to be interested in using it — prospect researchers, mainly. I never gave up on the idea of building something better, and finally on a quiet weekend a few months ago, I found two or three hours to mess around with a Python script. Given the ID of a person, this script makes multiple queries of the database and joins the results together to produce a ranked list of other people who might be associated with the input ID.
When the script is run, it prompts the user for an ID, then an Oracle database password. I’ll get into more detail later, but here is what the output looks like in the Python shell of IDLE, the development tool that I used to write the script, when I use my own ID as the one to find connections for. (I have blanked out the ID and password.)
The script goes through these steps:
Here are the top few rows of the output. I have removed some columns for simplicity, and blanked out some personal information. The count_sum column is the “hit count”, and the list is sorted in descending order by count_sum. The person being searched on is always at the top of the list. This is not an especially impressive example, because I am not an alumnus. My list of likely connections is made up of co-workers and people from my hometown who happen to be the same age as me. The woman in the third row of the data was someone I had a crush on in junior high school but haven’t seen in 27 years … If I were a prospective donor, this search would be a bust. Clearly this method requires quality alumni data in order to be useful.
That’s how it functions. We will dig into the actual code shortly. But first, let me say that my primary aim is to share this with you as an idea. I can’t help you with installing Python or debugging or any of that. All I can do is show you my code and hope you can modify it to make it work for you. If you think this is cool and useful but don’t want to learn any code, I suggest you team up with someone from IT or your Advancement Services team, or a student or prof in the Computing Science department.
As I said, I’m using Python and a code library for handling data called pandas. (For some background, see my earlier posts: Getting bitten by Python and Automate RFM scoring of your donors with this Python script.) The great thing about pandas is that you can load data into in-memory structures called DataFrames, which are like data tables in that you can perform joins and sort, rank, de-dupe, add calculated columns, and so on.
A few closing comments before I show you the code:
AND, one final, final comment: This is probably the last piece on Python and database querying I’m going to write for a while. In future posts I will tone down the technical stuff and get back to talking about data and predictive modelling.
# Import required code libraries
import pandas as pd
import numpy as np
import pandas.io.sql as psql
import cx_Oracle, os

# User prompts, and connect to Oracle database
ID_input = raw_input('Enter ID to find connectors for: ')
user = raw_input('Enter username for database: ')
password = raw_input('Enter password for database: ')
connectstr = user + '/' + password + '@database:0000'
connection = cx_Oracle.connect(connectstr)

# Query to get the database PIDM and name for the entered ID.
SQL = '''
SELECT
  BANINST1.AA_ENTITY.PIDM,
  BANINST1.AA_ENTITY.ID,
  BANINST1.AA_ENTITY.FIRST_NAME,
  BANINST1.AA_ENTITY.MI,
  BANINST1.AA_ENTITY.LAST_NAME
FROM BANINST1.AA_ENTITY
WHERE (BANINST1.AA_ENTITY.CHANGE_IND IS NULL
  AND BANINST1.AA_ENTITY.ID=''' + "'" + ID_input + "'" + ''')'''
df = psql.frame_query(SQL, con=connection)
name = str(df['FIRST_NAME'] + ' ' + df['LAST_NAME'])
print('Searching for connections to: ' + name)
target_pidm = max(df['PIDM'])  # PIDM, not ID, is the primary key in my database.

# Find events
SQL = '''
WITH events AS (
  SELECT BANINST1.AA_EVENTS.EVNT_CRN
  FROM BANINST1.AA_EVENTS
  WHERE BANINST1.AA_EVENTS.PIDM = ''' + str(target_pidm) + ''')
SELECT events.EVNT_CRN, BANINST1.AA_EVENTS.PIDM
FROM events, BANINST1.AA_EVENTS
WHERE events.EVNT_CRN = AA_EVENTS.EVNT_CRN (+)'''
events = psql.frame_query(SQL, con=connection)
print("Found " + str(len(events)) + " people who attended the same events.")

# Find activities
SQL = '''
WITH activities AS (
  SELECT BANINST1.AA_ACTIVITY_AND_LEADERS.ACTP_CODE
  FROM BANINST1.AA_ACTIVITY_AND_LEADERS
  WHERE BANINST1.AA_ACTIVITY_AND_LEADERS.PIDM = ''' + str(target_pidm) + ''')
SELECT activities.ACTP_CODE, BANINST1.AA_ACTIVITY_AND_LEADERS.PIDM
FROM activities, BANINST1.AA_ACTIVITY_AND_LEADERS
WHERE activities.ACTP_CODE = BANINST1.AA_ACTIVITY_AND_LEADERS.ACTP_CODE (+)'''
activities = psql.frame_query(SQL, con=connection)
print("Found " + str(len(activities)) + " people with the same Activity codes.")

# Find employers
SQL = '''
WITH employers AS (
  SELECT BANINST1.AA_ALL_EMPLOYMENT.EMPR_NAME
  FROM BANINST1.AA_ALL_EMPLOYMENT
  WHERE BANINST1.AA_ALL_EMPLOYMENT.PIDM = ''' + str(target_pidm) + ''')
SELECT employers.EMPR_NAME, BANINST1.AA_ALL_EMPLOYMENT.PIDM
FROM employers, BANINST1.AA_ALL_EMPLOYMENT
WHERE employers.EMPR_NAME = BANINST1.AA_ALL_EMPLOYMENT.EMPR_NAME (+)'''
employers = psql.frame_query(SQL, con=connection)
print("Found " + str(len(employers)) + " people with the same Employers.")

# Find class years
SQL = '''
WITH classes AS (
  SELECT BANINST1.AA_DEGREE.ACYR_CODE
  FROM BANINST1.AA_DEGREE
  WHERE BANINST1.AA_DEGREE.PIDM = ''' + str(target_pidm) + ''')
SELECT classes.ACYR_CODE, BANINST1.AA_DEGREE.PIDM
FROM classes, BANINST1.AA_DEGREE
WHERE classes.ACYR_CODE = BANINST1.AA_DEGREE.ACYR_CODE (+)'''
classes = psql.frame_query(SQL, con=connection)
print("Found " + str(len(classes)) + " people with the same class year(s).")

# Find degrees
SQL = '''
WITH degrees AS (
  SELECT BANINST1.AA_DEGREE.DEGC_CODE
  FROM BANINST1.AA_DEGREE
  WHERE BANINST1.AA_DEGREE.PIDM = ''' + str(target_pidm) + ''')
SELECT degrees.DEGC_CODE, BANINST1.AA_DEGREE.PIDM
FROM degrees, BANINST1.AA_DEGREE
WHERE degrees.DEGC_CODE = BANINST1.AA_DEGREE.DEGC_CODE (+)'''
degrees = psql.frame_query(SQL, con=connection)
print("Found " + str(len(degrees)) + " people with the same degree(s).")

# Find high school
SQL = '''
WITH high_school AS (
  SELECT BANINST1.AA_HIGH_SCHOOL.HIGH_SCHOOL_CODE
  FROM BANINST1.AA_HIGH_SCHOOL
  WHERE BANINST1.AA_HIGH_SCHOOL.PIDM = ''' + str(target_pidm) + ''')
SELECT high_school.HIGH_SCHOOL_CODE, BANINST1.AA_HIGH_SCHOOL.PIDM
FROM high_school, BANINST1.AA_HIGH_SCHOOL
WHERE high_school.HIGH_SCHOOL_CODE = BANINST1.AA_HIGH_SCHOOL.HIGH_SCHOOL_CODE (+)'''
hs = psql.frame_query(SQL, con=connection)
print("Found " + str(len(hs)) + " people from the same high school.")

# Find cities
SQL = '''
WITH cities AS (
  SELECT SATURN.SPRADDR.SPRADDR_CITY
  FROM SATURN.SPRADDR
  WHERE SATURN.SPRADDR.SPRADDR_PIDM = ''' + str(target_pidm) + ''')
SELECT DISTINCT cities.SPRADDR_CITY, SATURN.SPRADDR.SPRADDR_PIDM AS PIDM
FROM cities, SATURN.SPRADDR
WHERE cities.SPRADDR_CITY = SATURN.SPRADDR.SPRADDR_CITY (+)'''
cities = psql.frame_query(SQL, con=connection)
print("Found " + str(len(cities)) + " people with addresses in the same city/cities.")

# Find age
SQL = '''
WITH age AS (
  SELECT BANINST1.AA_ENTITY.AGE
  FROM BANINST1.AA_ENTITY
  WHERE BANINST1.AA_ENTITY.PIDM = ''' + str(target_pidm) + ''')
SELECT age.AGE, BANINST1.AA_ENTITY.PIDM
FROM age, BANINST1.AA_ENTITY
WHERE age.AGE = BANINST1.AA_ENTITY.AGE (+)'''
age = psql.frame_query(SQL, con=connection)
print("Found " + str(len(age)) + " people of the same age.")

# Find cross-references
SQL = '''
WITH xref AS (
  SELECT ALUMNI.APRXREF.APRXREF_XREF_PIDM
  FROM ALUMNI.APRXREF
  WHERE ALUMNI.APRXREF.APRXREF_PIDM = ''' + str(target_pidm) + ''')
SELECT ALUMNI.APRXREF.APRXREF_XREF_PIDM, ALUMNI.APRXREF.APRXREF_PIDM AS PIDM
FROM xref, ALUMNI.APRXREF
WHERE xref.APRXREF_XREF_PIDM = ALUMNI.APRXREF.APRXREF_XREF_PIDM (+)'''
xref = psql.frame_query(SQL, con=connection)
print("Found " + str(len(xref)) + " people who share the same cross-references.")

# Concatenate all results
print("Joining results.")
results = pd.concat([events, activities, employers, classes, degrees,
                     hs, cities, age, xref])

# Remove unneeded columns
results = results.drop(['ACTP_CODE', 'ACYR_CODE', 'AGE', 'DEGC_CODE',
                        'EMPR_NAME', 'EVNT_CRN', 'HIGH_SCHOOL_CODE',
                        'SPRADDR_CITY', 'APRXREF_XREF_PIDM'], axis=1)
print("Found " + str(len(results)) + " total results.")

## Count up the number of times each PIDM appears in the results. (A few steps here.)
# Add a column called 'count' and populate every row with a '1'
results['count'] = 1
# Group by PIDM and sum on 'count' to get 'hits per PIDM'.
results = results.join(results.groupby(results['PIDM'])['count'].sum(),
                       on=['PIDM'], rsuffix='_sum')
# De-dupe rows in order to get one result per PIDM.
results = results.drop_duplicates()
# Sort results in descending order by the number of hits.
results.sort(columns='count_sum', axis=0, ascending=False, inplace=True)

# The file is potentially huge, so prompt the user to limit it by hit count.
cutoff = raw_input('How many hits would you like to limit by? ')
# Keep rows with minimum number of hits.
results = results[results['count_sum'] >= int(cutoff)]
# Get rid of the 'count' column - not needed.
results = results.drop(['count'], axis=1)
print("Found " + str(len(results)) + " unique results, with at least " + cutoff + " hits.")

## Join names to results
print('Preparing results and saving.')
SQL = '''
SELECT
  BANINST1.AA_ENTITY.PIDM,
  BANINST1.AA_ENTITY.ID,
  BANINST1.AA_ENTITY.NAME_PREFIX,
  BANINST1.AA_ENTITY.FIRST_NAME,
  BANINST1.AA_ENTITY.MI,
  BANINST1.AA_ENTITY.LAST_NAME,
  BANINST1.AA_ENTITY.NAME_SUFFIX,
  BANINST1.AA_ENTITY.AGE,
  BANINST1.AA_ENTITY.DEAD_IND,
  BANINST1.AA_DONOR_TYPE_HIGHEST.DONR_CODE
FROM BANINST1.AA_ENTITY, BANINST1.AA_DONOR_TYPE_HIGHEST
WHERE BANINST1.AA_ENTITY.PIDM = BANINST1.AA_DONOR_TYPE_HIGHEST.PIDM (+)
  AND BANINST1.AA_ENTITY.CHANGE_IND IS NULL'''
results = pd.merge(results, psql.frame_query(SQL, con=connection),
                   how='left', on=['PIDM'], sort=False)
results.to_csv(ID_input + ' - ' + name + '.csv')
print('All done!')
connection.close()
Thanks to data available from the 2010 US Census, for any educational institution that provides us with ZIP codes for the alums in its advancement database, we can compute such things as the median income and the median house value of the ZIP code in which each alum lives.
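The mechanics of attaching Census medians to an alumni file amount to a join on ZIP code. A minimal sketch in pandas, with invented data and column names:

```python
import pandas as pd

# Hypothetical extracts: alums with ZIP codes, and Census medians by ZIP.
alums = pd.DataFrame({'alum_id': [1, 2, 3],
                      'zip': ['02138', '10001', '02138']})
census = pd.DataFrame({'zip': ['02138', '10001'],
                       'median_income': [85000, 62000],
                       'median_house_value': [750000, 910000]})

# A left join keeps every alum, even those whose ZIP has no Census match.
enriched = alums.merge(census, on='zip', how='left')
```

One caution worth noting: ZIP codes should be handled as strings, not numbers, or leading zeros get silently dropped.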
Now, we tend to focus on internal data rather than external data. For a very long time the two of us have been harping on something that may be getting a bit tiresome: the overemphasis on finding outside wealth data in major giving, and the underemphasis on looking at internal data. Our problem has been that we’ve never had a solid way to systematically compare these two sources of data as they relate to the prediction of giving in higher education.
John Sammis has done a yeoman’s job of finding a very reasonably priced source for this Census data as well as building some add-ons to our statistical software package – add-ons that allow us to manipulate the data in interesting ways. All this has happened within the last six months or so, and I’ve been having a ball playing around with this data, getting John’s opinions on what I’ve done, and then playing with the data some more.
The data for this piece come from four private, small to medium sized higher education institutions in the eastern half of the United States. We’ll show you a smidgeon of some of the things we’ve uncovered. We hope you’ll find it interesting, and we hope you’ll decide to do some playing of your own.
Download the full, printer-friendly PDF of our study here (free, no registration required): Census ZIP data Wylie & Sammis.
No, this is not the last time I’ll write about Phonathon, but after today I promise to give it a rest and talk about something else. I just wanted to round out my post on the waste I see happening in donor acquisition via phone programs with some recent findings of mine. Your mileage may vary, or “YMMV” as they say on the listservs, so as usual don’t just accept what I say. I suggest questions that you might ask of your own data — nothing more.
I’ve been doing a thorough analysis of our acquisition efforts this past year. (The technical term for this is a WTHH analysis … as in “What The Heck Happened??”) I found that getting high phone contact rates seemed to be linked with making a sufficient number of call attempts per prospect. For us, any fewer than three attempts per prospect is too few to acquire new donors in any great number. In general, contact rates improve with call attempt numbers above three, and after that, the more the better.
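The core of that analysis is nothing more than tabulating contact rate against number of call attempts. A sketch, on invented call-result data (one row per prospect):

```python
import pandas as pd

# Hypothetical call results: attempts made per prospect, and whether
# the prospect was ever contacted (1) or not (0).
calls = pd.DataFrame({'attempts':  [1, 1, 2, 3, 3, 4, 5, 5],
                      'contacted': [0, 0, 1, 1, 0, 1, 1, 1]})

# Contact rate at each level of calling effort.
rate_by_attempts = calls.groupby('attempts')['contacted'].mean()
```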
“Whoa!”, I hear you protest. “Didn’t you just say in your first post that it makes no sense to have a set number of call attempts for all prospects?”
You’re right — I did. It doesn’t make sense to have a limit. But it might make sense to have a minimum.
To get anything from an acquisition segment, more calling is better. However, by “call more” I don’t mean call more people. I mean make more calls per prospect. The RIGHT prospects. Call the right people, and eventually many or most of them will pick up the phone. Call the wrong people, and you can ring them up 20, 30, 50 times and you won’t make a dent. That’s why I think there’s no reason to set a maximum number of call attempts. If you’re calling the right people, then just keep calling.
What’s new here is that three attempts looks like a solid minimum. This is higher than what I see some people reporting on the listservs, and well beyond the capacity of many programs as they are currently run — the ones that call every single person with a phone number in the database. To attain the required amount of per-prospect effort, those schools would have to increase phone capacity (more students, more nights), or load fewer prospects. The latter option is the only one that makes sense.
Reducing the number of people we’re trying to reach to acquire as new donors means using a predictive model or at least some basic data mining and scoring to figure out who is most likely to pick up the phone. I’ve built models that do that for two years now, and after evaluating their performance I can say that they work okay. Not super fantastic, but okay. I can live with okay … in the past five years our program has made close to one million call attempts. Even a marginal improvement in focus at that scale of activity makes a significant difference.
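One simple way to build that kind of likelihood-to-answer score is a logistic regression on engagement indicators. The sketch below is purely illustrative: the features (years since graduation, email on file, event attendance) and the synthetic outcome are invented for the example, not a description of the models I actually built.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Invented prospect features.
X = np.column_stack([
    rng.integers(1, 40, 500),   # years since graduation
    rng.integers(0, 2, 500),    # email on file (1/0)
    rng.integers(0, 2, 500),    # ever attended an event (1/0)
])
# Invented outcome: did the prospect ever answer the phone?
y = (X[:, 1] + X[:, 2] + rng.normal(0, 1, 500) > 1).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]   # likelihood-to-answer per prospect

# Load only the top-scoring prospects into the calling pool.
top = np.argsort(scores)[::-1][:200]
```

In practice you would train on last year's calling results and score this year's pool, so the model is evaluated on prospects it hasn't seen.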
You don’t need to hack your acquisition segment in half today. I’m not saying that. To get new donors you still need lots and lots of prospects. Maybe someday you’ll be calling only a fraction of the people you once did, but there’s no reason you can’t take a gradual approach to getting more focused in the meantime. Trim things down a bit in the first year, evaluate the results, and fold what you learned into trimming a bit more the next year.
(Click here to download post as a print-friendly PDF: Making a Case for Modeling – Wylie Sammis)
Before you wade too far into this piece, let’s be sure we’re talking to the right person. Here are some assumptions we’re making about you:
If we’ve made some accurate assumptions here, great. If we haven’t, we’d still like you to keep reading. But if you want to slip out the back of the seminar room, not to worry. We’ve done it ourselves more times than you can count.
Okay, here’s something you can try:
1. Divide the alums at your school into ten roughly equal size groups (deciles) by class year. Table 1 is an example from a medium sized four year college.
Table 1: Class Years and Counts for Ten Roughly Equal Size Groups (Deciles) of Alumni at School A
2. Create a very simple score:
EMAIL LISTED(1/0) + HOME PHONE LISTED(1/0)
This score can assume three values: "0", "1", or "2". A "0" means the alum has neither an email nor a home phone listed in the database. A "1" means the alum has either an email listed in the database or a home phone listed in the database, but not both. A "2" means the alum has both an email and a home phone listed in the database.
3. Create a table that contains the percentage of alums who have contributed at least $1,000 lifetime to your school for each score level for each class year decile. Table 2 is an example of such a table for School A.
Table 2: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School A
4. Create a three dimensional chart that conveys the same information contained in the table. Figure 1 is an example of such a chart for School A.
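The four steps above can be sketched in a few lines of pandas. This is a minimal illustration on made-up records, not our actual workflow; the column names (`class_year`, `email`, `home_phone`, `lifetime_giving`) are assumptions you would swap for your own database fields.

```python
import numpy as np
import pandas as pd

# Hypothetical alumni records; in practice this comes from your database.
rng = np.random.default_rng(0)
n = 1000
alums = pd.DataFrame({
    "class_year":      rng.integers(1950, 2012, n),
    "email":           rng.choice([True, False], n),
    "home_phone":      rng.choice([True, False], n),
    "lifetime_giving": rng.exponential(400, n).round(2),
})

# Step 1: ten roughly equal-size class-year groups (deciles).
# Ranking first guarantees ten bins even when class years repeat.
alums["decile"] = pd.qcut(alums["class_year"].rank(method="first"),
                          10, labels=range(1, 11))

# Step 2: the simple 0/1/2 score.
alums["score"] = alums["email"].astype(int) + alums["home_phone"].astype(int)

# Step 3: percentage of alums at each score level, in each decile,
# who have given at least $1,000 lifetime.
alums["gave_1k"] = alums["lifetime_giving"] >= 1000
table = alums.pivot_table(index="decile", columns="score",
                          values="gave_1k", aggfunc="mean",
                          observed=True) * 100
print(table.round(1))
```

Step 4 is then just a matter of feeding `table` to whatever charting tool you prefer.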
In the rest of this piece we’ll be showing tables and charts from seven other very diverse schools that look quite similar to the ones you’ve just seen. At the end, we’ll step back and talk about the importance of what emerges from these charts. We’ll also offer advice on how to explain your own tables and charts to colleagues and bosses.
If you think the above table and chart are clear, go ahead and start browsing through what we’ve laid out for the other seven schools. However, if you’re not completely sure you understand the table and the chart, see if the following hypothetical questions and answers help:
Question. “Okay, I’m looking at Table 2 where it shows 53% for alums in Decile 1 who have a score of 2. Could you just clarify what that means?”
Answer. “That means that 53% of the oldest alums at the school who have both a home phone and an email listed in the database have given at least $1,000 lifetime to the school.”
Question. “Then … that means if I look to the far left in that same row where it shows 29% … that means that 29% of the oldest alums at the school who have neither a home phone nor an email listed in the database have given at least $1,000 lifetime to the school?”
Answer. “Exactly right.”
Question. “So those older alums who have a score of 2 are way better givers than those older alums who have a score of 0?”
Answer. “That’s how we see it.”
Question. “I notice that in the younger deciles, regardless of the score, there are a lot of 0 percentages or very low percentages. What’s going on there?”
Answer. “Two things. One, most younger alums don’t have the wherewithal to make big gifts. They need years, sometimes many years, to get their financial legs under them. The second thing? Over the last seven years or so, we’ve looked at the lifetime giving rates of hundreds and hundreds of four-year higher education institutions. The news is not good. In many of them, well over half of the solicitable alums have never given their alma maters a penny.”
Question. “So, maybe for my school, it might be good to lower that giving amount to something like ‘has given at least $500 lifetime’ rather than $1,000 lifetime?”
Answer. “Absolutely. There’s nothing sacrosanct about the thousand dollar level that we chose for this piece. You can certainly lower the amount, but you can also raise the amount. In fact, if you told us you were going to try several different amounts, we’d say, ‘Fantastic!’”
Okay, let’s go ahead and have you browse through the rest of the tables and charts for the seven schools we mentioned earlier. Then you can compare your thoughts on what you’ve seen with what we think is going on here.
(Note: After looking at a few of the tables and charts, you may find yourself saying, “Okay, guys. Think I got the idea here.” If so, go ahead and fast forward to our comments.)
Table 3: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School B
Table 4: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School C
Table 5: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School D
Table 6: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School E
Table 7: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School F
Table 8: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School G
Table 9: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School H
Definitely a lot of tables and charts. Here’s what we see in them:
Now we’d like to deal with an often advanced argument against what you see here. It’s not at all uncommon for us to hear skeptics say: “Well, of course alumni on whom we have more personal information are going to be better givers. In fact we often get that information when they make a gift. You could even say that amount of giving and amount of personal information are pretty much the same thing.”
We disagree for at least two reasons:
Amount of personal information and giving in any alumni database are never the same thing. If you have doubts about our assertion, the best way to dispel those doubts is to look in your own alumni database. Create the same simple score we have for this piece. Then look at the percentage of alums for each of the three levels of the score. You will find plenty of alums who have a score of 0 who have given you something, and you will find plenty of alums with a score of 2 who have given you nothing at all.
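You can run that check directly with a cross-tabulation of the 0/1/2 score against a "has ever given" flag. The data below is invented for illustration, and `gave_ever` is an assumed column name; the point is that the off-diagonal cells, score-0 donors and score-2 non-donors, will not be empty in a real database either.

```python
import pandas as pd

# Hypothetical alumni contact and giving flags.
alums = pd.DataFrame({
    "email":      [1, 1, 0, 0, 1, 0, 1, 0],
    "home_phone": [1, 0, 1, 0, 0, 0, 1, 1],
    "gave_ever":  [0, 1, 1, 1, 0, 1, 1, 0],
})
alums["score"] = alums["email"] + alums["home_phone"]

# If personal information and giving were "the same thing," every
# score-0 alum would be a non-donor and every score-2 alum a donor.
ct = pd.crosstab(alums["score"], alums["gave_ever"])
print(ct)
```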
We have yet to encounter a school where the IT folks can definitively say how an email address or a home phone number got into the database for every alum. Why is that the case? Because email addresses and home phone numbers find their way into alumni databases in a variety of ways. Yes, sometimes they are provided by the alum when he or she makes a gift. But there are other ways. To name a few:
Now here’s the kicker. Your reactions to everything you’ve seen in this piece are critical. If you’re going to go to a skeptical boss to try to make a case for scouring your alumni database for new candidates for major giving, we think you need to have several reactions to what we’ve laid out here:
1. “WOW!” Not, “Oh, that’s interesting.” It’s gotta be, “WOW!” Trust us on this one.
2. You have to be champing at the bit to create the same kinds of tables and charts that you’ve seen here for your own data.
3. You have to look at Table 2 (that we’ve recreated below) and imagine it represents your own data.
Table 2: Percentage of Alumni at Each Simple Score Level at Each Class Year Decile Who Have Contributed at Least $1,000 Lifetime to School A
Then you have to start saying things like:
“Okay, I’m looking at the third class year decile. These are alums who graduated between 1977 and 1983. Twenty-five percent of them with a score of 2 have given us at least $1,000 lifetime. But what about the 75% who haven’t yet reached that level? Aren’t they going to be much better bets for bigger giving than the 94% of those with a score of 0 who haven’t yet reached the $1,000 level?”
“A score that goes from 0 to 2? Really? What about a much more sophisticated score that’s based on lots more information than just email listed and home phone listed? Wouldn’t it make sense to build a score like that and look at the giving levels for that more sophisticated score across the class year deciles?”
If your reactions have been similar to the ones we’ve just presented, you’re probably getting very close to trying to make your case to the higher-ups. Of course, how you make that case will depend on who you’ll be talking to, who you are, and situational factors that you’re aware of and we’re not. But here are a few general suggestions:
Your first step should be making up the charts and figures for your own data. Maybe you have the skills to do this on your own. If not, find a technical person to do it for you. In addition to having the right skills, this person should think the project is cool and shouldn’t take forever to finish it.
Choose the right person to show our stuff and your stuff to. More and more we’re hearing people in advancement say, “We just got a new VP who really believes in analytics. We think she may be really receptive to this kind of approach.” Obviously, that’s the kind of person you want to approach. If you have a stodgy boss in between you and that VP, find a way around your boss. There are lots of ways to do that.
Do what mystery writers do: use the weapon of surprise. Whichever boss you go to, we’d recommend that you show them this piece first. After you know they’ve read it, ask them what they thought of it. If they say anything remotely similar to “I wonder what our data looks like,” you say, “Funny you should ask.”
Whatever your reactions to this piece have been, we’d love to hear them.