CoolData blog

7 July 2014

Mine your donor data with this baseball-inspired analysis

I’ve got baseball analytics on my mind. I don’t know if it’s because of the onset of July or because of a recent mention of CoolData on Nate Silver’s FiveThirtyEight blog, but I have been deeply absorbed in an analysis of donor giving behaviours inspired by Silver’s book, “The Signal and the Noise.” It might give you some ideas for things to try with your own database.

Back in 2003, Silver designed a system to predict the performance of Major League Baseball players. The system, called PECOTA, attempts to understand how a player’s performance evolves as he ages. As Silver writes in his book, its forecasts were probabilistic, offering a range of possible outcomes for each player. From the previous work of others, Silver was aware that hitters reach their peak performance at age 27, on average. Page 81 of his book shows the “aging curve” for major league hitters, a parabola starting at age 18, arcing smoothly upwards through the 20s, peaking at 27, and then just as smoothly arcing downwards to age 39.

My immediate thought on reading about this was, what about donors? Can we visualize the trajectory of various types of donors (major donors, bequest donors, leadership annual fund donors) from their first ten bucks right after graduating, then on into their peak earning years? What would we learn by doing that?

In baseball, the aging curve presents a problem for teams acquiring players with proven track records. By the time they become free agents, their peak years will have passed. However, if the early exploits of a young prospect cause him to resemble one of the greats from the past, perhaps he is worth investing in. The curve, as Silver notes, is only an average. Some players will peak far earlier, and some far later, than the average age of 27. There are different types of players, and different types of curves, and every individual career is different. But lurking in all that noise, there is a signal.

Silver’s PECOTA system takes things further than that — I will return to that later — but now I want to turn to how we can visualize a sort of aging curve for our donors.

What’s the payoff? Well, to cut to the chase: It appears that some donors who go on to give in six figures (lifetime total) can be distinguished from the majority of lower-level donors at a very early age. Above-average giving ($200 or $250, say) in any one year during one’s late 20s or early 30s is a predictor of very high lifetime value. I have found that when big donors have started their giving young, they have started big. That is, “big” in relation to their similarly-aged peers – not at thousands of dollars, but at $100 or $500, depending on how old they were at the time.

Call it “precocious giving”.

Granted, it sounds a bit like plain common sense. But being analytical about it opens up the possibility of identifying a donor with high lifetime value when they’re still in their late 30s or early 40s. You can judge for yourself, but the idea appealed to me.

I’m getting ahead of myself. At the start of this, I was only interested in getting the data prepared and plotting the curve to see what it would look like. To the extent that it resembled a curve at all, I figured that “peak age” — the counterpart to baseball player performance — would be the precise age at which a donor gave the most in any given year while they were alive.

~~~~~~~

I wrote a query to pull all donors from the database (persons only), with a row for each year of giving, summing on total giving in that year. Along with Year of Gift, I pulled down Year of Birth for each donor — excluding anyone for whom we had no birthdate. I included only amounts given while the donor was living; bequests were excluded.

The next step was to calculate the age the donor was at the time of the gift. I added a column to the data, defined as the Year of Gift minus Year of Birth. That gave me a close-enough figure for age at time of giving.

As I worked on the analysis, I kept going back to the query to add things I needed, such as certain donor attributes that I wanted to examine. Here are most of the variables I ended up pulling from the database for each unique combination of Donor ID and Age at Time of Gift:

  • ID
  • Age at Time of Gift (Year of Gift minus Year of Birth)
  • Sum of Giving (total giving for that donor, at that age)
  • Donor Category Code (Alum, Friend, etc.)
  • Total Lifetime Giving (for each donor, without regard to age)
  • Deceased Indicator (are they living or dead as of today)
  • Current Age (if living, how old are they right now)
  • Year of Birth
  • Year of Gift

The result was a data set with more than 200,000 rows. Notice, of course, that a donor ID can appear on multiple rows — one for each value of Age at Gift. The key thing to remember is that I didn’t care what year giving occurred; I only wanted to know how old someone was when they gave. So in my results, a donor who gave in 1963 when she was 42 is much the same as a donor who gave in 2013 when he was the same age.
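
If you would rather prototype this preparation step outside the database, here is a minimal pandas sketch of the same idea. The file and column names are placeholders, not what I actually used:

    import pandas as pd

    # Hypothetical gift extract: one row per gift, with the donor's birth year attached.
    gifts = pd.read_csv("gifts.csv")   # donor_id, birth_year, gift_year, gift_amount

    # Age at time of gift = Year of Gift minus Year of Birth (close enough, as noted above).
    gifts["age_at_gift"] = gifts["gift_year"] - gifts["birth_year"]

    # One row per unique combination of donor and age, summing giving at that age.
    by_age = (gifts.groupby(["donor_id", "age_at_gift"], as_index=False)["gift_amount"]
                   .sum()
                   .rename(columns={"gift_amount": "giving_at_age"}))

    # Total lifetime giving for each donor, without regard to age.
    by_age["lifetime_giving"] = by_age.groupby("donor_id")["giving_at_age"].transform("sum")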

[Image: pecota1]

Now it was time to visualize this data, and for that I used Tableau. I connected directly to the database and loaded the data into Tableau using custom SQL. ‘Age at Gift’ is numerical, so Tableau automatically listed that variable in the Measures panel. For this analysis, I wanted to treat it as a category instead, so I dragged it into the Dimensions panel. (If you’re not familiar with Tableau, don’t worry about these application-specific steps — once you get the general idea, you can replicate this using your tool of choice.)

The first (and easiest) thing to visualize was simply the number of donors at each age. Click on the image below to see a full-size version. Every part of the shape of this curve says something interesting, I think, but the one thing I have annotated here is the age at which the largest number of people chose to make a gift.

[Image: pecota2]

This chart lumps in everyone — alumni and non-alumni, living donors and deceased donors — so I wanted to go a little deeper. I would expect to see a difference between alumni and non-alumni, for example, so I put all degree and non-degree alumni into one category (Alumni), and all other donor constituents into another (Non-alumni). The curve does not change dramatically, but we can see that the number of non-alumni donors peaks later than the number of alumni donors.

[Image: pecota3]

There are a number of reasons for analyzing alumni and non-alumni separately, so from this point on, I decided to exclude non-alumni.

The fact that 46 seems to be an important age is interesting, but this probably says as much about the age composition of our alumni and our fundraising effort over the years as it does about donor behaviour. To get a sense of how this might be true, I divided all alumni donors into quartiles (four bins containing roughly equal numbers of alumni), by Birth Year. Alumni donors broke down this way:

  1. Born 1873 to 1944: 8,101 donors
  2. Born 1945 to 1955: 8,036 donors
  3. Born 1956 to 1966: 8,614 donors
  4. Born 1967 to 1991: 8,172 donors

Clearly these are very different cohorts! The donors in the middle two quartiles were born in a span of only a decade each, while the span of the youngest quartile is 24 years, and the span of the oldest quartile is 71 years! When I charted each age group separately, they split into distinct phases. (Reminder: click on the image for a full-size version.)
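
For anyone following along outside their reporting tool, the quartile split takes a couple of lines of pandas. This is only a sketch, with made-up file and column names:

    import pandas as pd

    # One row per alumni donor, with a birth_year column.
    donors = pd.read_csv("alumni_donors.csv")

    # Four bins containing roughly equal numbers of donors, split on Birth Year.
    donors["birth_quartile"] = pd.qcut(donors["birth_year"], q=4)
    print(donors["birth_quartile"].value_counts().sort_index())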

[Image: pecota4]

This chart highlights a significant problem with visualizing the life cycle of donors: Many of the donors in the data haven’t finished their giving careers yet. When Nate Silver talks about the aging curves of baseball players, he means players whose careers are behind them. How else to see their rise, peak, and eventual decline? According to the chart above, the youngest quartile peaks (in terms of number of donors) at age 26. However, most of these donors are still alive and have many years of giving ahead of them. We will turn to them to identify up-and-coming donors, but as long as we are trying to map out what a lifetime of giving looks like, we need to focus on the oldest donors.

An additional problem is that our donor database doesn’t go back as far as baseball stats do. Sure, we’ve got people in the database who were born more than 140 years ago, but our giving records are very sparse for years before the early 1970s. If a donor was very mature at that time, his apparent lack of giving history might cause us to make erroneous observations.

I decided to limit the data set to donors born between 1920 and 1944. This excludes the following donors who are likely to have incomplete giving histories:

  • Anyone who was older than 50 in 1970, when giving records really started to get tracked, and
  • Anyone who is currently younger than 70, and may have many years of giving left.

This is a bit arbitrary, but reasonable. It trims off the donors who could never have had a chance to have a lifetime of giving recorded in the data, without unduly reducing the size of my data set. I was left with only 20% of my original data, but still, that’s more than 6,000 individuals. I could have gotten fussier with this, removing anyone who died at a relatively young age, but I figured the data was good enough to provide some insights.
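
Here is what that trimming might look like as a pandas sketch, along with the three lifetime giving levels used in the charts that follow (again, the column names are placeholders):

    import pandas as pd

    # The donor/age table built earlier, with birth_year and lifetime_giving carried along.
    by_age = pd.read_csv("giving_by_age.csv")

    # Keep donors born 1920 to 1944: not so old that their early giving predates our records,
    # and not so young that they likely have many years of giving left.
    trimmed = by_age[by_age["birth_year"].between(1920, 1944)].copy()

    # The three lifetime giving levels used in the next charts.
    trimmed["tier"] = pd.cut(trimmed["lifetime_giving"],
                             bins=[0, 1_000, 10_000, float("inf")],
                             right=False,
                             labels=["Under $1K", "$1K to $10K", "$10K and up"])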

The dramatic difference made by this trimming is evident in the following two charts. Both charts show a line for the number of donors by age at time of gift, for each of three lifetime giving levels: Under $1,000 in blue, $1,000 to $10,000 in orange, and over $10,000 in purple. What this means is that all the donors represented by the purple line (for example) gave at least $10,000 cumulatively over the course of their lifetime.

The first chart is based on ALL the data, before trimming according to birth year. The second chart is based on the 6,000 or so records I was left with after trimming. The first chart seems to offer an interesting insight: The higher the lifetime value of donors, the later in life they tend to show up in great numbers. But of course this just isn’t true. Although the number of donors with lower lifetime giving peaks at earlier ages, that’s only because that whole group of donors is younger: They’re not done giving yet. (I have added ‘Median Current Age’ to the high point of each curve to illustrate this.) Remember, this chart includes everyone — it’s the “untrimmed” data:

[Image: pecota5]

Contrast that three-phase chart with this next one, based on “trimmed” data. The curves are more aligned, presumably because we are now looking at a better-defined cohort of donors (those born 1920 to 1944). The oldest donor is 24 years older than the youngest donor, but that’s okay: The most important concern is having good data for that range of ages. Because the tops of these curves are flatter, I have annotated more points, for the sake of interest.

[Image: pecota6]

These curves are pretty, but they aren’t analogous to “performance curves” for baseball players — we haven’t yet looked at how MUCH donors give, on average, at each age. However, what general observations can we make from the last chart? Some that come to my mind:

  • Regardless of what a donor ultimately gives over a lifetime, there are always a few (a very few) who start giving while they are in their 20s, and a few who are still around to give when they are in their late 80s and early 90s.
  • The number of donors starts to really take off at around age 40, and there is steady growth until about age 50, when the growth in number of donors begins to slow or plateau.
  • Donors start to drop out rapidly at around age 70. This is due to mortality of course, but probably the steepness of the drop is exaggerated by my trimming of the data at the older end.

~~~~~~~

Here is where things really get interesting. The whole point of this exercise was to see if we can spot the telltale signs of a future major donor while they are still relatively young, just as a baseball scout looks for young prospects who haven’t peaked yet. Do donors signal unusual generosity even when they are still in their 20s and 30s? Let’s have a look.

I zoomed in on a very small part of the chart, to show giving activity up until age 35. Are there differences between the various levels of donors? You bet there are.

As soon as a high-lifetime-value donor starts to give, the gifts are higher, relative to same-age peers who will end up giving less. The number of donors at these early ages is minuscule, so take this with a grain of salt, but a trend seems unmistakable: Up to the age of 30, donors who will end up giving in five figures and higher give about 2.5 to 3.5 times as much per year as other donors their age who end up giving $1,000 to $10,000 lifetime. AND, they give FIVE TIMES as much per year as other donors their age who end up giving less than $1,000 lifetime.

[Image: pecota7]

Later on, at ages 35 and 40, donors who will finish their giving careers at the high end are giving two to three times as much per year as donors in the middle range, and 5.6 to 7 times as much per year (respectively) as donors who will finish on the lowest end.

It might be less confusing to chart each group of donors by average giving per year, rather than by number of donors. This chart shows average giving per year up until age 65. Naturally, the averages get very spiky, as donors start making large gifts.
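
If you are following along in pandas rather than Tableau, these averages (and the log-transformed version I describe next) are simple group-bys. Column names are placeholders:

    import numpy as np
    import pandas as pd

    # Donor/age rows with age_at_gift, giving_at_age and the lifetime giving tier.
    trimmed = pd.read_csv("trimmed_giving_by_age.csv")

    # Average giving per donor-year, at each age, for each lifetime giving tier.
    avg_by_age = (trimmed.groupby(["tier", "age_at_gift"])["giving_at_age"]
                         .mean()
                         .unstack("tier"))

    # To temper extreme values, average the log of the amounts instead (amounts are all > 0).
    log_by_age = (trimmed.assign(log_giving=np.log10(trimmed["giving_at_age"]))
                         .groupby(["tier", "age_at_gift"])["log_giving"]
                         .mean()
                         .unstack("tier"))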


[Image: pecota8]

To temper the effect of extreme values, I log-transformed the giving amounts. This made it easier to visualize how these three tiers of donors differ from each other over a lifetime of giving:

[Image: pecota9]

What do I see from this? These are generalizations based on averages, but potentially useful generalizations:

  • Upper-end donors start strong relative to other donors, accelerate giving after age 40, and continue to increase giving throughout their lifetimes.
  • Middle- and low-range donors start lower. They also increase their yearly giving until their late 40s, but after that, they plateau and stay at the same level for the rest of their lives.

~~~~~~~

What’s the bottom line here? I think it’s this: Hundreds of donors were well on their way to being exceptional by the tender age of 40, and a few were signaling long before that.

Information like this would be interesting to Annual Fund as they work to identify prospects for leadership-level giving. But $10,000 in a lifetime is a little too low to make the Major Gifts folks take notice. Can we carve out the really big donors from the $10K-plus crowd? And can we also identify them before they hit 40? Have a look at this chart. For this one, I removed all the donors who gave less than $10,000 lifetime, and then I divided the high-end donors into those who gave less than $100,000 lifetime (green line) and those who gave more than $100,000 (red line).

[Image: pecota10]


The lines get a bit jagged, but it looks to me like the six-figure lifetime donors pull away from the five-figure donors while still in their 40s. And notice as well that they increase their giving after age 65, which is very unusual behaviour: By 65, the vast majority of donors have either long plateaued or are starting to wind down. (You can’t see this in the chart, but that post-65 group of very generous donors numbers about 50 individuals, with yearly average giving ranging around $25,000 to $50,000.)

When I drill down, I can see about a hundred donors sitting along the red line between the ages of 30 and 45, whom we might have identified as exceptional, had we known what to look for.

With the benefit of hindsight, we are now able to look at current donors who were born more recently (after 1969, say), and identify who’s sending out early signals. I have those charts, but I think you’ve seen enough, and as I have said many times in the past: My data is not your data. So while I can propose the following “rules” for identifying an up-and-comer, I don’t recommend you try applying them to your own situation without running your own analysis:

  • Gave more than $200 in one year, starting around age 28.
  • Gave more than $250 in one year, starting around age 29.
  • Gave more than $500 in one year, starting around age 32.

Does this mean I think we can ask a 32-year-old for $10,000 this year? No. It means that this 32-year-old is someone to watch out for and to keep engaged as an alum. It’s the donors over 50 or so who have exhibited these telltale patterns in their early giving that might belong in a major gift prospect portfolio.
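
If you did want to operationalize rules like the ones above (using your own thresholds, not mine), one way to flag precocious givers might look like this sketch, with made-up file and column names:

    import pandas as pd

    # One row per donor per age, with giving summed for that age.
    by_age = pd.read_csv("giving_by_age.csv")

    # (age, single-year amount) pairs: illustrative only; derive your own from your own data.
    rules = [(28, 200), (29, 250), (32, 500)]

    # Flag a donor if any single year's giving at or before a rule's age exceeds that rule's amount.
    hits = pd.concat([by_age[(by_age["age_at_gift"] <= age) &
                             (by_age["giving_at_age"] > amount)]
                      for age, amount in rules])
    precocious_ids = hits["donor_id"].unique()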

Precocious giving certainly isn’t the only indicator of a good prospect, but along with a few other unusual traits, it is a good start. (See: Odd but true findings? Upgrading annual donors are “erratic” and “volatile”.)

~~~~~~~

Where do you go from here? That is completely up to you. I am still in the process of figuring out how to best use these insights.

Coming up with some rules of thumb, as above, is one way to proceed. Another is rolling up all of a donor’s early giving into a single score — a Precocity Score — that takes into account both how much a donor gave, and how young she was when she gave it. I experimented with a formula that gave progressively higher weights to the number of dollars given for younger ages. For example, $100 given at age 26 might be worth several times more than $200 given at age 44.
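
As a rough illustration only (this is not the exact formula I tested), a weighted Precocity Score could be sketched in pandas like this:

    import pandas as pd

    # One row per donor per age, with giving summed for that age (column names assumed).
    by_age = pd.read_csv("giving_by_age.csv")

    # Score early giving only, weighting younger ages more heavily.
    CUTOFF_AGE = 40
    early = by_age[by_age["age_at_gift"] <= CUTOFF_AGE].copy()

    # One arbitrary weighting: a dollar given at age 25 counts 2.5 times a dollar given at 40.
    early["weight"] = (CUTOFF_AGE + 10 - early["age_at_gift"]) / 10
    precocity = (early["giving_at_age"] * early["weight"]).groupby(early["donor_id"]).sum()

    # For comparison: the simple cumulative sum to the cutoff age, which proved just as predictive.
    simple_sum = early.groupby("donor_id")["giving_at_age"].sum()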

Using my data set of donors with a full life cycle of giving, I tested whether this score was predictive of lifetime value. It certainly was. However, I also found that a simple cumulative sum of a donor’s giving up to age 35 or 40 was just as predictive. There didn’t seem to be any additional benefit to giving extra weight to very early giving.

I am shying away from using giving history as an input in a predictive model. I see people do this all the time, but I have always avoided the practice. My preference is to use some version of the rules above as just one more tool to use in prospect identification, distinct from other efforts such as predictive modelling.

~~~~~~~

That’s as far as I have gotten. If this discussion has given you some ideas to explore, then wonderful. I doubt I’m breaking new ground here, so if you’ve already analyzed giving-by-age, I’d be interested in hearing how you’ve applied what you’ve learned.

Incidentally, Nate Silver went on to produce “similarity scores” for pairs of hitters. Using baseball’s rich trove of data, he compared players using a nearest-neighbour analysis, which took into account a wide range of data points, from player height and weight to all the game stats that baseball is famous for. A young prospect in the minor leagues with a score that indicates a high degree of similarity with a known star might be expected to “age” in a similar way. That was the theory, anyway.

One can imagine how this might translate to the fundraising arena. If you identified groups of your best donors, with a high degree of similarity among the members of each group, you could then identify younger donors with characteristics that are similar to the members of each group. After all, major gift donors are not all alike, so why not try to fit multiple “types”?

I would guess that the relatively small size of each group would cause any signal to get drowned out in the noise. I am a little skeptical that we can parse things that finely. It would, however, be an interesting project.

A final note. The PECOTA system had some successes and for a time was an improvement on existing predictive tools. Over time, however, pure statistics were not a match for the combination of quantitative methods and the experience and knowledge of talent scouts. In the same way, identifying the best prospects for fundraising relies on the combined wisdom of data analysts, researchers and fundraisers themselves.


13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do — a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or 10 years.
  8. Recent alumni volunteers (with no giving)
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes — that’s what we want. A hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement — maybe properly so. Propensity scores could also play a much bigger role.
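
To make the idea concrete, a single weighted score might be sketched like this. The flags, weights and column names below are placeholders rather than a recommended formula; the real weights would come out of the consultation described earlier:

    import pandas as pd

    # One row per lost constituent, with 0/1 flags and a 0-100 propensity score.
    lost = pd.read_csv("lost_constituents.csv")

    weights = {
        "is_mg_prospect":      50,
        "is_mg_donor":         40,
        "is_pg_expectancy":    40,
        "is_leadership_af":    25,
        "gave_last_year":      15,
        "is_recent_volunteer": 10,
    }
    lost["priority"] = sum(lost[col] * w for col, w in weights.items()) + 0.2 * lost["propensity_score"]

    # One ranked research list instead of a stack of separate ones.
    research_list = lost.sort_values("priority", ascending=False)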

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
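
If you do the merge outside of Tableau, it amounts to a single left join. Here is a sketch with made-up file and column names:

    import pandas as pd

    # Alumni with no valid preferred address, plus the geolocation export from the email platform.
    lost_alumni = pd.read_csv("lost_alumni.csv")      # id, total_giving, pg_status, last_city, ...
    geo = pd.read_csv("mailchimp_geolocation.csv")    # id, latitude, longitude

    # Left join on ID: keep every lost alum, attach coordinates where the platform has them.
    mapped = lost_alumni.merge(geo, on="id", how="left")
    mapped.to_csv("lost_alumni_with_coords.csv", index=False)   # feed this file to Tableau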

The result was a smallish data file (less than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information — click on the image to get a readable size.

[Image: map_alums]

The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:

[Image: manhattan]

If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

8 September 2013

Blogging from the Tableau Customer Conference: Why you should care about BI

Filed under: Tableau — kevinmacdonell @ 6:56 pm

I am in Washington DC this week for the Tableau Customer Conference. I’ve mentioned Tableau on CoolData before, and I am a fan of not only the software but also the ethos behind it. Tableau is not a predictive analysis tool, so I don’t write about it much.* But there is a deep and important connection between predictive analytics and business intelligence tools such as Tableau (and Advizor, Spotfire, QlikView …). It’s a connection that has taken me a long time to fully appreciate.

Tableau is a great tool for visualizing your data, and an amazing tool for putting a certain level of analysis into the hands of business users. It can play a role in analysis, for sure, but not statistical analysis or modelling.* So what’s the connection? Well, there’s a connection on two levels.

The first I grasped immediately: Blending predictive model scores with actual results (fundraising results, phone contact results, event attendance results, and so on), for continuous, real-time reporting on model performance post-deployment. End-users wouldn’t get much out of these reports, but I certainly do. (See: Evaluate models with fresh data using Tableau heat maps.)

The deeper connection, the one that has taken me longer to realize, goes like this …

Analytical talent in our sector is as likely, or more likely, to be fostered from within than hired from without. As I see it, the predictive analysts of the future are currently wasting their talents, toiling away at extracting data and reports for end users, often employing Excel in repetitive and error-prone ways. Getting to the point of providing real insights based on data is only a once-in-a-while thing so long as employees have to spend so much time generating the most basic of reports.

Better tools have arrived, and Tableau is one of them. Let’s start freeing up the creativity and ingenuity of our own employees in the higher education and nonprofit fields.

(BTW, any CoolData readers attending the conference can email me at kevin.macdonell@gmail.com. I would love to learn how your institution is using Tableau.)

* Late-breaking update from the conference: Today (September 9) Tableau announced a range of new features and functionality for all its products in versions 8.1 and 8.2, including integration with the powerful, open-source statistical package R. So much for Tableau not serving up statistics and modelling! New viz options such as box-and-whisker plots will also add functionality more often associated with stats software.

12 July 2012

Evaluate models with fresh data using Tableau heat maps

When I build predictive models, I normally don’t build just one for each purpose. Presumably the model is going to be used, so I want it to be the best one possible. Yes, I test the model scores against a holdout data sample, but if I built only one model, I wouldn’t have anything solid on which to base my evaluation of the results. I might reject a lone model if it truly failed against the validation set, but that has never happened to me — even a lackluster performance can be better than nothing, which makes the model flawed but useful. That is true of models in general. So testing results with nothing to compare against is pointless.

I usually produce one multiple linear regression model and one binary logistic regression model using the stats software package Data Desk. Many permutations are possible, though: The sample to be scored can be limited in various ways, and the dependent variable can be formulated any number of ways. The choice of technique (for me, one type of regression or another) is usually determined by the nature of the DV (though not always). Given unlimited time, I would produce multiple models, but doing two at a time is manageable and keeps the task of comparison simple. The model that does the best job of classifying the members of the holdout sample wins the prize, and the loser is discarded.

But there’s a problem. I’ve never had a model bomb when it comes to scoring the validation set, but I HAVE had models fail after deployment. Data that is held out for validation of the model is one thing — the real world outside the model can be a whole OTHER thing. Logically it should not be so: If the model doesn’t “know” anything about the holdout data, then you’d think its performance on it would indicate how it will perform in the future.

Not so. At least, not always.

I am not so quick, then, to discard the loser. I like to evaluate both models on fresh data as it comes in (new gifts, for example). The loser might be the better choice overall, or it might turn out that a combination of the two models performs better than one on its own. Maybe one model works better for a subset of the population (young alumni, say), which suggests that adding interaction terms or even using a multiple-model approach is something to consider in the future. If the models predict slightly different propensities (as a result of how the DVs were formulated), with both of them contributing to a desirable result, then it might be worthwhile keeping both score sets by multiplying them together.

I don’t have an extended period of time for such testing — the model needs to be put into operation before it gets stale. Unfortunately, evaluation has always been a cumbersome process. I need to query the database for fresh results (conversions, upgrades, new planned giving expectancies — whatever) and then match them up by ID and score for each model (scores for untested models are not going to be in the database, obviously), and then produce some charts in Excel to visualize and compare results. It’s not a ton of work, but it takes just long enough to prevent me from doing it more than once before it’s time to commit. Even if I am evaluating the models after the fact, in order to learn for the next iteration of model-building, it’s not an exercise I will want to carry out repeatedly.

There is a better way. Think reports.

What does a report do? A report pulls real-time (or nightly-refreshed) data and assembles it in an interpretable way in a tabular or visual display. It performs this service on a regular or semi-regular basis, or on-demand. (Yeah, okay, maybe I should have said an ideal report). If part of your job consists in report preparation as well as predictive modeling, then you should be building model scores into your reports.

Here’s a tutorial on how to use Tableau to easily create a report that compares the performance of two sets of model scores in a single visualization called a heat map. This visualization can be refreshed with live data as often as desired. If you want, you can add other fields (age, sex, degree, donor status, etc.) and easily filter the data to see how model performance differs depending on the composition of the population. Note that this is probably not a report you’ll be sharing with your vice president. It does look cool, but it is mainly a diagnostic and exploration tool for your own use. The small initial investment of time is worth it if you build multiple models — it can be reused again and again.

This tutorial assumes you’re already somewhat familiar with the basics of Tableau. If you don’t have the software, and you don’t want to download a free trial, stick around anyway — other software packages offer ways to create heat maps, and the basic idea is the same.

In this example, I am comparing percentile scores from two models I developed to predict which alumni are most likely to give at least $1,000 in the current fiscal year. One is a multiple linear regression model with a dependent variable defined as the sum of giving for the past five years (log-transformed). The other is a logistic regression model with a binary dependent variable defined as ‘has giving of at least $1,000 in any one of the past three years’. The exact definitions of the DVs are reasonable but somewhat arbitrary. They are closely related, but different. The techniques and the predictor variables are also different, so we should expect the models to yield different results. Tested against the validation set (which was the same for both models), the logistic model proved superior. But only a test on new gift data will be truly convincing.

I want to take the entire population of alumni whom I have scored (a sample of about 27,000 individuals), and match them up with what they have given since the model was created. In this made-up example, let’s suppose I created my models last August, and I want to see what those 27,000 alumni have given since the day I completed the work. In reality, I would have chosen a winning model months ago and this would be an after-the-fact analysis, but I am doing this in order to enrich the visualization for the purposes of this example. (Cheating, in other words.)

Tableau allows you to combine data from multiple sources. In this case, you will connect to an Excel file to get your model scores (since they’re not in the database), and then connect to your database for giving results since September 1. If you do not connect directly to your database from Tableau, then you can paste your gifts data into a second sheet in your Excel workbook and extract the data via a single connection to that file — no problem. The first worksheet will have three columns: One for unique ID, and one each for the scores from the two models. In this example, the scores were output from Data Desk as percentiles. If you want, you can add columns for key attributes such as age, sex and so on. The second worksheet (or the custom SQL that retrieves data directly from your data warehouse) will provide ID and sum of giving since September 1.

Normally in report creation, Tableau handles all the aggregation of the data — the input is raw transaction data, with each ID potentially appearing on multiple rows. In this example, however, we have aggregated the data already (summing giving by ID), and there is only one row of data for each ID. It doesn’t matter, but it might have implications for some of the specific steps that follow.

You should refer to your Tableau references for connecting to data sources. All I will add is that when you add the table (or worksheet) that contains the giving data, be sure to left-join on ID, because obviously not everyone you have scored has given since Sept. 1. From here on in, I will use Tableau terminology that won’t make any sense if you don’t know the software (specifically, Tableau Desktop version 7.0). Let’s build our first view:

  1. If your data has been extracted correctly, ‘ID’ will be listed under Dimensions, and your two model score sets will be listed under Measures. In this example, I will from now on refer to them as MLR (for Multiple Linear Regression) and Logistic. Obviously I’m referring to my own data — just try to replicate what I’m talking about using your own data file.
  2. For now, pause Auto Updates (or turn off automatic updates).
  3. Right-click on Logistic and select “Create bins …” This will bin the percentile score into whatever size we desire. Change the default bin size to 5 and click OK. Note that a new variable is created in the Dimensions pane, because bins are categorical, not numerical.
  4. Right-click on MLR and do the same thing.
  5. Drag Logistic (bins) to the Columns shelf. Drag MLR (bins) to the Rows shelf.
  6. Drag ID to the Text shelf. Click on the down-arrow of the ID pill you’ve just created, and select Measure –> Count. This will create a count of all IDs that fall into each cell. It turns green to indicate it’s now a measure instead of a dimension. (Because each ID appears in our data only once, it doesn’t matter whether we use Count or Count Distinct.)
  7. Change the Marks type from Automatic to Square (right above the Text shelf). Notice that the Text shelf suddenly turns into a Label shelf — each square of the heat map will be labeled with the number of IDs.
  8. Drag ID from the Dimensions pane again, and this time drop it onto the Color shelf.
  9. Click on the down-arrow of the ID pill you’ve just created, and select Measure –> Count. This will base the color or shading of the cell on the number of IDs that fall into that cell.
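
If you do not have Tableau handy, the matrix behind this heat map is just a binned cross-tab, which you could prototype in pandas. A sketch, with made-up column names:

    import pandas as pd

    # One row per scored individual, with two percentile scores from 0 to 100.
    scores = pd.read_csv("model_scores.csv")          # id, mlr, logistic

    # Five-point bins, with 100 in a bin of its own, mirroring what Tableau does (noted further down).
    edges = list(range(0, 101, 5)) + [101]            # [0,5), [5,10), ..., [95,100), [100,101)
    scores["mlr_bin"] = pd.cut(scores["mlr"], bins=edges, right=False)
    scores["logistic_bin"] = pd.cut(scores["logistic"], bins=edges, right=False)

    # Count of individuals in each cell: the numbers behind the heat map.
    heat = pd.crosstab(scores["mlr_bin"], scores["logistic_bin"])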

The top left corner of your screen will look like this:

Now we’re ready to allow the view to automatically update. The result won’t look much like a heat map: Probably just a bunch of little squares with numbers beside them. We need to enlarge the squares. Under the Size shelf is a slider: Move this to the centre of the size range. Then drag one of the rows in the view to make it taller — hover over the axis for MLR (on the far left) until the pointer turns into an up-and-down arrow, then click and drag. When you let go, the squares will resize and the alleys of white space should start to close up. Keep messing with it until the squares touch on all sides. With a little formatting of labels for readability, the final product will look something like this. (Click on thumbnail image for full size.)

A heat map can convey a lot of information at a glance. You can immediately see where a lot of individuals are concentrated: They’re in the darkest squares. The numbers are hard to read, but up in the top left of the map, we see that the number of people who fall into the 0-4 bin in both the MLR and Logistic models is 572. In the lower right area of the map, we see that 563 people fell into the 95 to 99 bin in both models. Notice that Tableau didn’t bin evenly: Every single bin has 5 score levels in it except for the bin labeled 100, which contains only individuals with a score of 100. In the map, we see that 147 people scored exactly 100 in both models. This can be corrected (using a calculated field instead of automatic binning), but I have decided to leave it the way it is. Due to the nature of this modeling exercise, I am mainly interested in the top few percentile scores anyway, and the 100 group is of particular interest. Having them mapped separately from the rest is not a problem.

The names of the bins don’t reflect what they include. For example, “90” really means “90 to 94”. You can rename them using aliases. Right-click on Logistic in the Dimensions pane, select Field Properties –> Aliases…, and change the displayed values in the Values column. Do the same for MLR.

We haven’t looked at the recent-gift data yet, but before we move on, what can we learn from this view? It appears the models agree on the individuals with extremely high or extremely low scores. In the middle range, there is still a lot of agreement but also many more cases of divergence, in which an individual scores high in one model but low in the other. This is clear, at-a-glance evidence that our models are similar but different. Depending on the application, choosing one model over the other could have a big effect on the result, for better or worse. In this particular application, where I am interested mainly in very high-scoring alumni only, it may not make that much difference at all … but let’s not jump to that conclusion just yet.

If your data set included some key grouping information such as age or sex, it might be interesting to create a filter to examine whether the models differ on those factors. Here’s an example with ‘Age’:

  1. Drag Age from the Measures pane into the Filters shelf.
  2. When Tableau asks you how you want to filter on Age, select “All Values” and click Next.
  3. On the next box, select Range of Values, and click OK.
  4. Hover over the green Age pill on the filters shelf, click the down-arrow on the right end of the pill, and select Show Quick Filter.

Now you can set the upper and lower age bounds of the individuals you want to be counted in the heat map. As you slide the scale, it will display Age with numbers after the decimal, even though your values are all whole numbers. If this bothers you, right-click on Age in the Measures pane, select Field Properties –> Number Format…, and click on Number (Custom). Adjust the number of decimal places to zero. Here’s what the quick filter looks like:

The next two images show the heat map for different age ranges. The first one is ages 20 to 50, the second is 51 to 80. Again, click on the thumbnails for full-size images — although the beauty of a heat map is that you can see the pattern from a distance.

Right off the bat, it’s evident that it’s harder for younger individuals to get a high score, but they fare better in the MLR model than they do in the Logistic model. Imagine a 45-degree line sloping from the top left corner to the bottom right corner — the presence of more dark-shaded squares under that line indicates individuals with higher MLR scores than Logistic scores. The logistic model, on the other hand, slightly favours older alumni. This alone might explain why the Logistic model outperformed the MLR model in terms of the validation set. The difference might be due to how age-related variables were introduced to each model as predictors; they may have been more influential in one than the other. It’s hard to say without going back to the models themselves for a close look.

One can spend a lot of time playing and learning with these filters. Let’s fast-forward and (finally) introduce recent-gift data — the giving that all scored individuals have engaged in since September 1, the day after the models were supposedly created. This data appears in the Measures pane as a variable I’ll call ‘Sum of Giving’. I’m specifically interested in who has given at least $1,000 (cumulatively), so I will need to create a calculated field to flag these people.

  1. Right-click on Sum of Giving and select Create Calculated Field…
  2. Give the field a name. I called it “Leadership donor”.
  3. The field Sum of Giving is already in the expression window. Now you just need to add some text around it to complete the expression:
  4. Click OK. This creates a field (variable) with the value 1 for any donor who has given at the Leadership level, and nothing if otherwise. Note that you can enter any amount in place of 999. If you want to count donors vs. non-donors, enter “>0”.
  5. The field appears in the Measures pane, because Tableau recognizes it as numeric. We’re using it as a categorical variable, so let’s convert it into a Dimension instead. Right-click on the field name and select “Convert to Dimension”, or simply drag the field into the Dimensions pane — both actions accomplish the same thing.

Now we have a flag we can use to zero in on our higher-end donors. Let’s create a new view for that. At the lower left of your screen, right-click on the tab for the existing view and select “Duplicate Sheet”. This will allow us to continue exploring the heat map without changing our original version. We could, of course, do all our work in a single view and use filters to dynamically alter the view — that’s one of the strengths of Tableau — but for now let’s keep our views separate.

  1. If you still have filters applied for Age or other variables, click on the quick filter menu and select “Clear Filter”. You can reapply it later if you want — we’re just getting it out of the way so we can see the full picture.
  2. Drag ‘Leadership donor’ to the Filters shelf.
  3. In the box that pops up, click “Select from list” on the General tab (it should already be selected), and then check the little box for ‘1’.
  4. Click OK.

The result looks like this. (Click for full size.)

Our big donors are clustered nicely down in the lower right corner, where both the MLR and the Logistic model scores are very high. Some of the lower-score bins contain zero Leadership donors, and Tableau has automatically hidden those rows and columns from view. Take a couple of minutes to study the map. Follow the three darkest squares (labeled 48, 74, and 23) as they form a 45-degree line up the centre of the map. If you compare the values in the squares that are directly opposite each other over this line, you’ll notice that there are slightly more Leadership donors on the upper side of the line. Those are donors who have higher Logistic scores than MLR scores. As well, notice that the scattered cloud of donors above the line is more extensive than that below the line. These observations should lead us to believe that the Logistic model performs slightly better than the MLR model.

That conclusion is a bit hasty, though. There might be more Leadership donors on the high-Logistic/low-MLR side simply because more alumni ended up in those squares in the first place. We need to calculate the PERCENTAGE of the population of each square that went on to become a Leadership donor. That’s right, we’re going to create a third view, and calculate percentages to plug into each square.

  1. Right-click on the tab for Sheet 2 and select Duplicate Sheet. (By the way, you can name these sheets whatever you want, just as in Excel.)
  2. Remove the filter for Leadership donor.
  3. Under Analysis in the top menu bar, select Create Calculated Field…
  4. Name the new field ‘Leadership percentage’.
  5. Enter this expression, which divides the number of Leadership donors by the total number of individuals.
  6. Click OK. The new field appears in the Measures pane, which is fine.
  7. Drag ‘Leadership percentage’ from Measures onto the Label shelf, replacing the count of ID.
  8. Drag ‘Leadership percentage’ from Measures again, this time onto the Color shelf.
  9. Right-click on any square in the map, and select Format…, which opens a formatting pane at the far left.
  10. On the Pane tab, in the Default section, click on the down-arrow to the right of “Numbers”, and select Percentage.
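
For what it is worth, the same calculation in pandas is just the mean of the 0/1 flag within each cell (column names assumed):

    import pandas as pd

    # Binned scores plus the 0/1 leadership_donor flag built earlier.
    df = pd.read_csv("scores_with_flags.csv")

    # Share of each heat-map cell that went on to give at the Leadership level.
    pct = (df.groupby(["mlr_bin", "logistic_bin"])["leadership_donor"]
             .mean()                                  # mean of a 0/1 flag = proportion of 1s
             .unstack("logistic_bin")
             .mul(100)
             .round(0))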

The result is below. (Click for full size.) You can select any precision for your percentages — I’ve rounded to whole numbers to avoid clutter.

The darkest square is a single donor with a very high MLR score but a very low Logistic score, who just happened to give at the Leadership level. That square is of course labeled 100%, which causes the rest of the display to be toned down to a degree that makes it hard to see the patterns. This single donor might be a person to look at more carefully, but for now, let’s exclude that person from the map. Hover your pointer over the square, and select Exclude from the tooltip box. (This creates a specific filter for this individual, which you can remove anytime.) All the squares are recoloured accordingly:

Now some of the darkest squares are also based on very sparse data. You can exclude any that you wish, but I’m fine with this display for now. For one thing, we can clearly see that having a Logistic score of 95 or higher is darn significant, regardless of what a donor’s MLR score is. For example, there are four Leadership donors who scored only 65-69 in the MLR model but have Logistic scores of 95-99, which is what we want to see. (Those donors are in the square labeled 14%.)

Being able to demonstrate that one model is superior is pretty nifty. But I am especially intrigued at how easy it is to see how the models might work together to improve accuracy.

Have a look at the square containing individuals who scored 100 in both models. There were 147 such individuals, and 48 of them gave $1,000 or greater — a whopping 32.6%. Here are a couple of facts to think about:

  • Of all the individuals who scored 100 in the Logistic model, 26.7% went on to give at the Leadership level.
  • Of all the individuals who scored 100 in the MLR model, 23.1% went on to give at the Leadership level.

Do you see what I’m getting at? When we combine both scores and zero in on people in the top percentile for both models, our yield of Leadership donors increases by nearly six percentage points over the best-performing model, to 32.6%.

The same boost is evident for other high-scoring cells in the heat map: The logistic model identifies some big donors that the MLR model misses, but the MLR model can enhance the accuracy of the logistic model. This is potentially useful for prospect identification in Major Giving, when we really want to be as focused as possible.

So far I’ve shown you only donor numbers. What about revenue? Our data set includes gift amounts, so let’s create a new view to visualize actual aggregate dollar totals.

  1. Duplicate the last sheet you created, and remove any filters that had been applied.
  2. Drag ‘Sum of Giving’ to the Label and Color shelves.
  3. Format the values as currency.
  4. For fun, change the color from green to red by clicking on Edit Colors in the context menu for the Sum of Giving card.

The result is pretty dramatic.

This is for all donors, not just Leadership donors, but if you want to narrow it down to Leadership donors only, re-apply your filter.

Just as with raw donor counts, the view above is a little misleading, simply because more prospects equals more donors, equals more dollars. So let’s create a calculated field to give us AVERAGE dollars per donor for every cell in the heat map.
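
In pandas terms this is one more group-by: average dollars per scored individual in each cell, counting non-donors as zero (column names assumed):

    import pandas as pd

    # Binned scores plus sum of giving since September 1.
    df = pd.read_csv("scores_with_giving.csv")

    avg = (df.assign(giving=df["sum_of_giving"].fillna(0))
             .groupby(["mlr_bin", "logistic_bin"])["giving"]
             .mean()
             .unstack("logistic_bin")
             .round(0))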

The individuals with scores of 100 in both models gave nearly $5,000 on average — no other cell comes close. But guess what’s even better:

  • The individuals who scored 100 in the Logistic model gave an average of $2,927.
  • The individuals who scored 100 in the MLR model gave an average of $2,971.

The models are strongest where they intersect!

I’ve spent a lot of time and more than 4,000 words explaining how to do this in Tableau. This is very unusual for me — why a specific product such as Tableau, when one can create heat maps even in Excel? *

  • It’s just so easy to do it in Tableau, and the result looks attractive without requiring the user to fuss with formatting.
  • The data can be refreshed whenever necessary. If you’re connecting to an Excel file, simply paste new data into the file and refresh the data extract. It’s that simple. (Remember to refresh the extract rather than replace the data source entirely, if you want to retain your aliases as you’ve defined them.)
  • That goes for refreshing the giving data, AND for loading a whole different set of individuals and scores. You don’t need to rebuild these views from scratch (although it’s pretty easy to do so).
  • Tableau allows you to dynamically filter the data any which way you want. It’s a great way to explore the data. In my example, it would have been really interesting to filter on donors who UPGRADED to the $1,000+ level. Which model did a better job predicting upgrading? I don’t know, but I’m going to find out.
  • You can drill down to the underlying data. If you want to see a list of the people who scored 100 in both models, just hover the pointer over that square and click on the data icon, then the ‘Underlying’ tab. Imagine having wealth/capacity scores on one axis, and propensity scores on the other …
  • I’ve shared my heat maps here as static images, but you can share your analyses as fully-functioning views, even with people who don’t have the software on their computers. Save it as a Packaged Workbook, and they’ll be able to open it in Tableau Reader (which they can download for free). They can use the filters you’ve set up to play with the data themselves.

This may be the longest CoolData post ever, but as usual I feel I am barely scratching the surface.

* P.S.: Heat maps are easily created in a combination of Data Desk and Excel. Without going into too much detail: In Data Desk use contingency tables (a.k.a. cross tabs) to create the basic matrix of numbers, with one score set as x and the other as y, and use derived variable expressions to limit the counts as desired. Copy and paste the table text into Excel, and use conditional formatting to create the desired shading. Unfortunately this requires some fussing and the result is static.
