CoolData blog

13 June 2016

Nifty SQL regression to calculate donors’ giving trends

Filed under: Coolness, Predictor variables, regression, SQL — Tags: , , , — kevinmacdonell @ 8:28 pm

 

Here’s a nifty bit of SQL that calculates a best-fit line through a donor’s years of cash-in giving by fiscal year (ignoring years with no giving), and classifies that donor in terms of how steeply they are “rising” or “falling”.

 

I’ll show you the sample code, which you will obviously have to modify for your own database, and then talk a little bit about how I tested it. (I know this works in Oracle version 11g. Not sure about earlier versions, or other database systems.)

 

with sums AS (
 select t1.id, t1.fiscal_year, log(10, sum(t1.amount)) AS yr_sum
 from gifts t1
 group by t1.id, t1.fiscal_year),

slopes AS (
 select distinct
 sums.id,
 regr_slope(sums.yr_sum,sums.fiscal_year) OVER (partition by sums.id) AS slope

from sums
 )

select
 slopes.id,
 slopes.slope,
 CASE
 when slopes.slope is null then 'Null'
 when slopes.slope >=0.1 then 'Steeply Rising'
 when slopes.slope >=0.05 then 'Moderately Rising'
 when slopes.slope >=0.01 then 'Slightly Rising'
 when slopes.slope >-0.01 then 'Flat'
 when slopes.slope >-0.05 then 'Slightly Falling'
 when slopes.slope >-0.1 then 'Moderately Falling'
 else 'Steeply Falling' end AS description

from slopes
That’s it. Not a lot of SQL, and it runs very quickly (for me). But does it actually tell us anything?

 

I devised a simple test. Adapting this query, I calculated the “slope of giving” for all donors over a five-year period in the past: FY 2007 to FY 2011. I wanted to see if this slope could predict whether, and by how much, a donor’s giving would rise or fall in the next five-year period: FY 2012 to FY 2016. (Note that the sum of a donor’s giving in each year is log-transformed, in order to better handle outlier donors with very large giving totals.)

 

I assembled a data file with each donor’s sum of cash giving for the first five-year period, the slope of their giving in that period, and the sum of their cash giving for the five-year period after that.

 

The first test was to see how the categories of slope, from Steeply Rising to Steeply Falling, translated into subsequent rises and falls. In Data Desk, I compared the two five-year periods. If the second period’s giving was greater than the first, I called that a “rise.” If it was less, I called it a “fall.” And if it was exactly the same, I called it “Same.”

 

The table below summarizes the results. Note that these numbers are all percentages, summed horizontally. (I will explain the colour highlighting later on.)

 

contingency1

 

For Steeply Rising, 60.6% of donors actually FELL from the first period to the next. Only 37.8 percent rose, and just 1.6% stayed exactly the same. Not terribly impressive. Look at Steeply Falling, though: More than three-quarters actually did fall. That’s a better result, but then again, “Falling” dominates for every category; in the whole file, close to 70% of all donors reduced their giving in the next period. If a donor has no giving in the second period of five years, that’s zero dollars given, and this is called a “Fall” — more on that aspect in just a sec.

 

(I’ve left out donors with a FY2007-11 slope of Null — they’re the ones who gave in only one year and therefore don’t have a “slope”.)

 

Let’s not give up just yet, however. The colour highlighting indicates how high each percentage value is in relation to those above and below it. For example, the highest percentages in the Falling column are found in the Slightly, Moderately, and especially Steeply Falling slope categories. The highest percentages in the Rising column are in the Slightly, Moderately, and Steeply Rising slope categories. And in the Same column, the Flat slope wins hands-down — as we would hope.

 

So a rising slope “sort of” predicts increased giving, a falling slope “sort of” predicts decreased giving. Unfortunately, many donors are not retained into the second five-year period, so there’s not a lot to be confident about.

 

But what if a donor IS retained? What if we exclude the lapsed donors entirely? Let’s do that:

 

contingency2

 

Excluding non-donors seems to lead to an improvement … The slope does a better job sorting between the risers and fallers when a donor is actually retained. Again, the colour highlighting is referencing columns, not rows. But notice now that, across the rows, Rising has a slight majority for the Rising slope categories, and Falling has a slight majority for the Falling slope categories. (The bar is set too high for Flat, however, given that a donor’s giving in the first five years has to be exactly equal to her giving in the second five years to be called Same.)

 

Admittedly, these majorities are not generous. If I calculated a donor’s slope of giving as Steeply Rising and that donor was retained, I have only a 56.4% chance of actually being right. And of course there’s no guarantee that donor won’t lapse.

 

(Note that these are donors of all types — alumni, non-alumni individuals, and entities such as corporations and foundations. Non-alumni donors tend not to have patterns in their giving that are repeated, not to the extent that alumni do. However, when I limit the data file to alumni donors only, the improvement in this method is only very slight.)

 

Pressing on … I did a regression analysis using total giving in the second five-year period as the dependent variable, then entered total giving in the prior five-year period as an independent variable. (Naturally, R-squared was very high.) This allowed me to see if Slope provides any explanatory power when it is added as the second independent variable — the effect of giving in the first five-year period already being accounted for.

 

And the answer is, yes, it does. But only under specific conditions: Both five-year giving totals were log-transformed and, most significantly, donors who did not give in the second period were excluded from the regression.

 

There are other way to assess the usefulness of “slope” which might lead to an application, and I encourage you to give this a try with your own data. From past experience I know that donors who make big upgrades in giving don’t have any neat universal pattern such as an upward slope in their giving history. (The concept of volatility is explored here and here.) “Slope” is probably too simple a characteristic to employ on its own.

 

But as I’ve said before, if it were easy, obvious, or intuitive, it wouldn’t be data analysis.

 

Advertisements

13 September 2012

Odd but true findings? Upgrading annual donors are “erratic” and “volatile”

Filed under: Annual Giving, Prospect identification, RFM — Tags: , , , , — kevinmacdonell @ 8:26 am

In Annual Fund, Leadership giving typically starts at gifts of $1,000 (at least in Canada it does). For most schools, these donors make up a minority of all donors, but a majority of annual revenue. They are important in their own right, and for delivering prospects to Major Giving. Not surprising, then, that elevating donors from entry-level giving to the upper tiers of the Annual Fund is a common preoccupation.

It has certainly been mine. I’ve spent considerable time studying where Leadership donors come from, in terms of how past behaviours potentially signal a readiness to enter a new level of support. Some of what I’ve learned seems like common sense. Other findings strike me as a little weird, yet plausible. I’d like to share some of the weird insights with you today. Although they’re based on data from a single school, I think they’re interesting enough to merit your trying a similar study of donor behaviour.

First, some things I learned which you probably won’t find startling:

  • New Leadership donors tend not to come out of nowhere. They have giving histories.
  • Their previous giving is usually recent, and consists of more than one or two years of giving.
  • Usually those gifts are of a certain size. Many donors giving at the $1,000 level for the first time gave at least $500 the previous year. Some gave less than that, but $500 seems to be an important threshold.

In short, it’s all about the upgrade: Find the donors who are ready to move up, and you’re good to go. But who are those donors? How do you identify them?

It would be reasonable to suggest that you should focus on your most loyal donors, and that RFM scoring might be the way to go. I certainly thought so. Everyone wants high retention rates and loyal donors. Just like high-end donors, people who give every year are probably your program’s bread and butter. They have high lifetime value, they probably give at the same time of year (often December), and they are in tune with your consistent yearly routine of mailings and phone calls. Just the sort of donor who will have a high RFM score. So what’s the problem?

The problem was described at a Blackbaud annual fund benchmarking session I attended this past spring: Take a hard look at your donor data, they said, and you’ll probably discover that the longer a donor has given at a certain level, the less likely she is to move up. She may be loyal, but if she plateaued years ago at $100 or $500 per year, she’s not going to respond to your invitation to join the President’s Circle, or whatever you call it.

Working with this idea that donor  loyalty can equate to donor inertia, I looked for evidence of an opposite trait I started calling “momentum.” I defined it as an upward trajectory in giving year over year, hopefully aimed at the Leadership level. I pulled a whole lot of data: The giving totals for each of the past seven years for every Annual Fund donor. I tried various methods for characterizing the pattern of each donor’s contributions over time. I wanted to calculate a single number that represented the slope and direction of each donor’s path: Trending sharply up, or somewhat up, staying level, trending somewhat down, or sharply down.

I worked with that concept for a while. A long while. I think people got sick of me talking about “momentum.”

After many attempts, I had to give up. The formulas I used just didn’t seem to give me anything useful to sum up the variety of patterns out there. So I tried studying some giving scenarios, based on whether or not a donor gave in a given year. As you might imagine, the number of possible likely scenarios quickly approached the level of absurdity. I actually wrote this sentence: “What % of donors with no giving Y1-Y4, but gave in Y5 and did not give in Y6 upgraded from Y5 to Y7?” It was at that point that my brain seized up. I cracked a beer and said to hell with that.

I tried something new. For each donor, I converted their yearly giving totals into a flag that indicated whether they had giving in a particular year or not: Y for yes, N for no. Imagine an Excel file with seven columns full of Ys and Ns, going on for thousands of rows, one row per donor. Then I concatenated the first six columns of Y/Ns. A donor who gave every year ended up with the string “YYYYYY”. A donor who gave every second year looked like “YNYNYN” — and so on.

I called these strings “donor signatures” — sort of a fingerprint of their giving patterns over six years. Unlike a fingerprint, though, these signatures were not unique to the individual. The 15,000 donors in my data file fit into just 64 signatures.

A-ha, now I was getting somewhere. I had set aside the final year of giving data — year seven — which I could use to determine whether a donor had upgraded, downgraded or stayed the same. All I had to do was take those 64 categories of donors and rank them by the percentage of donors who had upgraded in the final year. Then I could just eyeball the sorted signatures and see if I could detect any patterns in the signatures that most often led to the upgrading behaviours I was looking for. (This is much easier done in stats software than in Excel, by the way.)

All of the following observations are based on the giving patterns of donors who gave in the final two years, which allowed me to compare whether they upgraded or not. This cut out many possible scenarios (eg., donors who didn’t give in one of those two years), but it was a good starting point.

I confirmed that the more years a donor has given, the more likely they are to be retained. BUT:

  • The more previous years a donor has given consecutively, the LESS likely they are to upgrade if they give again.
  • A donor is markedly more likely to upgrade from the prior year if they have lapsed at least one year prior to giving again.
  • Specifically, they are most likely to upgrade if they have one, two or three years with giving in the previous five. More than that, and they are becoming more loyal, and therefore less likely to upgrade.
  • Donors who give every other year, or who have skipped up to two years at a time, are most likely to upgrade from last year to the current year.

I told you it was counter-intuitive. If it was just all obvious common sense, we wouldn’t need data analysis. Here’s more odd stuff:

  • In general, the same qualities that make a donor more likely to upgrade also make a donor upgrade by a higher amount.
  • By far, the highest-value upgrader is a last-year donor who lapsed the previous year but had three years of giving in the previous five.
  • The next-highest donor signatures all show combinations of repeated giving and lapsing.
  • As a general rule, the highest-value upgraders have about an equal number of years as a donor and as a non-donor.

The conclusion? Upgrade potential can be a strangely elusive quality. From this analysis it appears that being a frequent donor (three or four years out of the past six) is a positive, but only if those years are broken up by the odd non-giving year. In other words, the upgrading donor is also something of an erratic donor.

I thought that was a pretty nifty phenomenon to bring to light. I decided to augment it by trying another, similar approach. Instead of flagging the simple fact of having given or not given in a particular year, this time I flagged whether a donor had upgraded from one year to the next.

Again I worked with seven fiscal years of giving data. I was interested in the final year – year seven – setting that as the “result” of the previous six years of giving behaviour. I was interested only in people who gave that year, AND who had some previous giving in years 1 to 6. The result set consisted of “Gave same or less” or “Upgrade”, and if upgrade, the average dollar upgrade.

The flags were a little more complicated than Y/N. I used ‘U’ to denote an upgrade from the year previous, ‘S’ to denote giving at the same level as the year previous, ‘D’ for a downgrade, and ‘O’ (for “Other”) if no comparison was possible (i.e., one or both years had no giving). Each signature had five characters instead of six, since it’s not possible to assign a code to the first year (no previous year of giving in the data to compare with).

This time there were 521 signatures, which made interpretation much more difficult. Many signatures had fewer than five donors, and only a dozen or so contained more than 100 donors. But when I counted the number of upgrades, downgrades and “sames” that a donor had in the previous five years, and then looked at how they behaved in the final year, some clear patterns did emerge:

  • Donors who upgraded two or more times in the past were most likely to upgrade again in the current year, and the size of their upgrade was larger, than donors who upgraded fewer times, or never upgraded. Upgrade likelihood was highest if the donor had upgraded at least four times in the previous five years.
  • Donors who gave the same amount every year were the least likely to upgrade — this is the phenomenon people were talking about at the benchmarking meeting I mentioned earlier. Donors who never gave the same amount from one year to the next, or did so only once, had higher median upgrade amounts.
  • And finally, the number of downgrades … this paints a strongly counter-intuitive picture. The more previous downgrades a donor had, the more likely they were to upgrade in the current year!

In other words, along with being erratic, donors who upgrade also have the characteristic that I started to call volatility.

I wondered what the optimum mix of upgrades and downgrades might be, so I created a variable called “Upgrades minus Downgrades”, which calculated the difference only for donors who had at least one upgrade or downgrade. The variable ranged from -4 (lots of downgrades) to plus 5 (lots of upgrades). What I discovered is that it’s not a balance that is important, but that a donor be at one extreme or the other. The more extreme the imbalance, the more likely an upgrade will occur, and the larger it will be, on average.

ERRATIC and VOLATILE … two qualities you’ve probably never ascribed to your most generous donors. But there it is: Your best prospects for an ambitious ask (perhaps a face-to-face one) might be the ones who are inconsistent about the amounts they give, and who don’t care to give every year.

By all means continue to use RFM to identify the core of your top supporters, but be aware that this approach will not isolate the kind of rogue donors I’m talking about. You can use donor signatures, as I have, to explore the extent to which this phenomenon prevails in your own donor database. From there, you might wish to capture these behaviours as input variables for a proper predictive model.

At worst, you’ll be soliciting donors who will never become loyal, and who may not have lifetime values that are as attractive as our less flashy, but more dependable, loyal donors. On the other hand, if you put a bigger ask in front of them and they go for it, they may eventually enter the realm of major giving. And then it will all be worth it.

Create a free website or blog at WordPress.com.