CoolData blog

7 June 2014

A fresh look at RFM scoring

Filed under: Annual Giving, John Sammis, Peter Wylie, RFM — Tags: , — kevinmacdonell @ 7:08 pm

Guest post by Peter B. Wylie and John Sammis

Back in February and March, Kevin MacDonell published a couple of posts about RFM for this blog (Automate RFM scoring of your donors with this Python script and An all-SQL way to automate RFM scoring). If you’ve read these, you know Kevin was talking about a quick way to amass the data you need to compute measures of RECENCY, FREQUENCY, and MONETARY AMOUNT for a particular set of donors over the last five fiscal years.

But how useful, really, is RFM? This short paper highlights some key issues with RFM scoring, but ends on a positive note. Rather than chucking it out the window, we suggest a new twist that goes beyond RFM to something potentially much more useful.

Download the PDF here: Why We Are Not in Love With RFM

25 March 2014

An all-SQL way to automate RFM scoring

Filed under: RFM, SQL — Tags: , , , — kevinmacdonell @ 8:44 pm

In my last post I described how to automate RFM scoring using the scripting language Python and pandas. Automating RFM scoring is a great way to learn a bit of Python — but on its own it’s a lousy reason for learning Python. It was pointed out to me that you don’t really need to call on the power of Python if you’re already comfortable with (or learning to use) SQL, and I agree.

Shortly after publishing the Python post, I got an email from Bruce Higgins, who works in the Alumni Affairs & Development office at Cornell University. His all-SQL solution is shown below. I leave it to readers to figure out how to make it work for them. In the wake of my post there was some great discussion and code-sharing on the Prospect-DMM discussion list, and I encourage you to look up that thread.

You’ll notice that Bruce has hard-coded the dollar values used for scoring “Monetary value” instead of dynamically dividing the file into quintiles. As he points out, if you’re scoring the same database over and over, it’s not likely these dollar thresholds are going to change much over time. You will of course need to make adjustments for your own donor data.

WITH Five_years AS
(SELECT id,
 SUM((gift_amount)) AS cash,
 MAX(decode(.gift_fiscalyear, 2014, 5, 2013, 4, 2012, 3, 2011, 2, 2010, 1)) AS recency,
 COUNT(DISTINCT(gift_fiscalyear)) AS frequency
 FROM
(SQL specifying the last five years cash transactions from our warehouse)
GROUP BY id
SELECT Five_years.id,
CASE
WHEN Five_years.cash >= 10000 THEN 5
WHEN Five_years.cash >= 2500 THEN 4
WHEN Five_years.cash >= 1000 THEN 3
WHEN Five_years.cash >= 500 THEN 2
ELSE 1
END + Five_years.recency + Five_years.frequency AS rfm_score
FROM Five_years
ORDER BY five_years.name

26 February 2014

Automate RFM scoring of your donors with this Python script

Filed under: Python, RFM — Tags: , , , — kevinmacdonell @ 9:20 pm

The next time you’re asked to refresh RFM scores for your donor base, tell them it’ll take you the better part of the day. Then fire up this script, output the scores in 30 seconds, and use the rest of your morning to work on something more interesting.

Unlike predictive modeling, which ideally requires an analyst’s good judgement in order to do well, RFM scoring lends itself to being completely automated. Based on a donor’s most recent five years of giving history, an RFM score is composed of three simple elements, which can each be scored 1 to 5. The highest score combination is 5-5-5.

  1. Recency: “5” if the donor’s most recent gift was last year, “4” if it was the year before that, and so on.
  2. Frequency: “5” if the donor gave in five out of five past years, “4” if the donor gave in four out of five, and so on.
  3. Monetary value: “5” if the donor’s total giving in the five years is in the top 20% of the donor file, “4” if total giving is in the next 20%, and so on.

This post follows naturally on my previous post, in which I showed how the PIVOT operator in an SQL statement can be used on donor data to arrange yearly giving totals into columns instead of rows, right at the level of the database query. (See Really swell SQL: Why you must know PIVOT and WITH.)

This Python script includes some sample SQL to pull giving data and pivot it — you will need to modify this SQL to match your schema and table names. If your database does not support PIVOT, then just pull the raw data and research how to pivot the data in Python. In fact, pivots and other data manipulations are much easier to do in Python than in SQL if you use a Python code library called pandas, designed specifically for working with data for analysis. Pandas has structures called DataFrames that are like database tables and are very intuitive to work with.

If you can’t connect directly to your database with this script, then you can just as easily read a .csv file into a DataFrame and work with that instead. The source file has to have one column for IDs, and five columns of yearly giving totals, with nulls for zero totals.

I am not going into a lot of detail about how each pandas function works. You can find as much information as you want via internet searches. (See another previous CoolData post, Getting bitten by Python.) I don’t promise that my code is elegant. Play with it, improve on it, and extend it to fit your needs.

In addition to comments in the body of the script below (anything after a “#”), here’s a limited explanation of what each section of the RFM calculation does:

RECENCY: The script creates five new columns. For each row of data, the field evaluates to True if the field is not null (i.e., if the donor has giving in the year being referenced). In Python, ‘True’ is the same as 1, and ‘False’ is the same as zero. If the donor had giving in the most recent year, the first new column will evaluate to True, and when I multiply ‘True’ by 5, the answer is 5. If no giving, the result is zero, and multiplying by 5 equals zero. The same goes for the next column: If the donor had giving in the year before, the result is True, and when I multiply by 4, the answer is 4. And so on, down to five years ago (‘1′). Then I create a sixth new column, which is populated by the maximum value found in the previous five columns — this ends up being the Recency portion of the score. Finally, I delete (or “drop”) the previous five columns, as they are no longer needed.

FREQUENCY: This part is way simpler … just a count of the number of non-null values in the five columns of yearly giving totals.

MONETARY VALUE: First, we sum on giving for all five years. Then these totals are chopped up into quintiles, ignoring any null totals. (Quintiles means five equal-sized groups, each making up 20% of the total number of records.) Each quintile is given a label (1 to 5), with the top quintile being a “5”.

That’s it. The results are saved to a .csv file, which looks like this:

output

 

 

# import required code modules.
import pandas as pd
import pandas.io.sql as psql
import cx_Oracle   # for querying an Oracle database
import os  # for reading and writing files to your hard drive

# Prompt user for database username and password.
username = raw_input('Enter username: ')
password = raw_input('Enter password: ')

# Pass text of SQL statement to variable called SQL, enclosed in triple quotes.
# (Note the pivot ... five years of giving data.)

SQL = '''
SELECT *

FROM (
 SELECT TABLE_A.ID,
 SUM ( TABLE_B.GIFT_AMT ) AS GIVING,
 TABLE_B.GIFT_FISC_CODE AS FY

 FROM
 TABLE_A,
 TABLE_B

 WHERE
 (TABLE_A.ID = TABLE_B.ID
  AND TABLE_B.GIFT_FISC_CODE IN (2010, 2011, 2012, 2013, 2014))

 GROUP BY
 TABLE_A.ID,
 TABLE_B.GIFT_FISC_CODE
 )

PIVOT (
  SUM ( GIVING ) AS total FOR ( FY )
  IN (2010 as FY2010, 2011 as FY2011, 2012 as FY2012, 2013 as FY2013, 2014 as FY2014)
  )

'''

# Connect, execute the SQL, and put data into a pandas DataFrame (df)
# "@dsn" (data source name) is the TNS entry (from the Oracle names server or tnsnames.ora file)
# "0000" is the port number

connectstr = username + '/' + password + '@dsn:0000'
connection = cx_Oracle.connect(connectstr)
df = psql.frame_query(SQL, con=connection) 

## RFM score

# RECENCY
# Create five new columns.
df['Recency5'] = df.FY2014_TOTAL.notnull() * 5
df['Recency4'] = df.FY2013_TOTAL.notnull() * 4
df['Recency3'] = df.FY2012_TOTAL.notnull() * 3
df['Recency2'] = df.FY2011_TOTAL.notnull() * 2
df['Recency1'] = df.FY2010_TOTAL.notnull() * 1

# Create a sixth column, setting it to the max value of the previous five columns.
df['Recency'] = df[['Recency5', 'Recency4', 'Recency3', 'Recency2', 'Recency1']].max(axis=1)

# Delete the five original columns - no longer needed.
df = df.drop(['Recency5', 'Recency4', 'Recency3', 'Recency2', 'Recency1'], axis=1)

# FREQUENCY

# Create new column, and set it to the number of non-null values in the giving totals columns.
df['Frequency'] = df[['FY2010_TOTAL', 'FY2011_TOTAL', 'FY2012_TOTAL', 'FY2013_TOTAL', 'FY2014_TOTAL']].count(axis=1)

# MONETARY VALUE

# Create new column, the sum of all giving years.
df['Giving_Total'] = df[['FY2010_TOTAL', 'FY2011_TOTAL', 'FY2012_TOTAL', 'FY2013_TOTAL', 'FY2014_TOTAL']].sum(axis=1, skipna=True)

# Break all giving totals into five quintiles (ignoring any null values), and store the label (1 to 5) in another new column.
quintile_bins = [1, 2, 3, 4, 5]
df['Monetary'] = pd.qcut(df['Giving_Total'], 5, labels = quintile_bins)

# Write results to a .csv file, and display number of saved records to user.
df.to_csv('RFM_results.csv')
print(str(len(df)) + " records output")

13 September 2012

Odd but true findings? Upgrading annual donors are “erratic” and “volatile”

Filed under: Annual Giving, Prospect identification, RFM — Tags: , , , , — kevinmacdonell @ 8:26 am

In Annual Fund, Leadership giving typically starts at gifts of $1,000 (at least in Canada it does). For most schools, these donors make up a minority of all donors, but a majority of annual revenue. They are important in their own right, and for delivering prospects to Major Giving. Not surprising, then, that elevating donors from entry-level giving to the upper tiers of the Annual Fund is a common preoccupation.

It has certainly been mine. I’ve spent considerable time studying where Leadership donors come from, in terms of how past behaviours potentially signal a readiness to enter a new level of support. Some of what I’ve learned seems like common sense. Other findings strike me as a little weird, yet plausible. I’d like to share some of the weird insights with you today. Although they’re based on data from a single school, I think they’re interesting enough to merit your trying a similar study of donor behaviour.

First, some things I learned which you probably won’t find startling:

  • New Leadership donors tend not to come out of nowhere. They have giving histories.
  • Their previous giving is usually recent, and consists of more than one or two years of giving.
  • Usually those gifts are of a certain size. Many donors giving at the $1,000 level for the first time gave at least $500 the previous year. Some gave less than that, but $500 seems to be an important threshold.

In short, it’s all about the upgrade: Find the donors who are ready to move up, and you’re good to go. But who are those donors? How do you identify them?

It would be reasonable to suggest that you should focus on your most loyal donors, and that RFM scoring might be the way to go. I certainly thought so. Everyone wants high retention rates and loyal donors. Just like high-end donors, people who give every year are probably your program’s bread and butter. They have high lifetime value, they probably give at the same time of year (often December), and they are in tune with your consistent yearly routine of mailings and phone calls. Just the sort of donor who will have a high RFM score. So what’s the problem?

The problem was described at a Blackbaud annual fund benchmarking session I attended this past spring: Take a hard look at your donor data, they said, and you’ll probably discover that the longer a donor has given at a certain level, the less likely she is to move up. She may be loyal, but if she plateaued years ago at $100 or $500 per year, she’s not going to respond to your invitation to join the President’s Circle, or whatever you call it.

Working with this idea that donor  loyalty can equate to donor inertia, I looked for evidence of an opposite trait I started calling “momentum.” I defined it as an upward trajectory in giving year over year, hopefully aimed at the Leadership level. I pulled a whole lot of data: The giving totals for each of the past seven years for every Annual Fund donor. I tried various methods for characterizing the pattern of each donor’s contributions over time. I wanted to calculate a single number that represented the slope and direction of each donor’s path: Trending sharply up, or somewhat up, staying level, trending somewhat down, or sharply down.

I worked with that concept for a while. A long while. I think people got sick of me talking about “momentum.”

After many attempts, I had to give up. The formulas I used just didn’t seem to give me anything useful to sum up the variety of patterns out there. So I tried studying some giving scenarios, based on whether or not a donor gave in a given year. As you might imagine, the number of possible likely scenarios quickly approached the level of absurdity. I actually wrote this sentence: “What % of donors with no giving Y1-Y4, but gave in Y5 and did not give in Y6 upgraded from Y5 to Y7?” It was at that point that my brain seized up. I cracked a beer and said to hell with that.

I tried something new. For each donor, I converted their yearly giving totals into a flag that indicated whether they had giving in a particular year or not: Y for yes, N for no. Imagine an Excel file with seven columns full of Ys and Ns, going on for thousands of rows, one row per donor. Then I concatenated the first six columns of Y/Ns. A donor who gave every year ended up with the string “YYYYYY”. A donor who gave every second year looked like “YNYNYN” — and so on.

I called these strings “donor signatures” — sort of a fingerprint of their giving patterns over six years. Unlike a fingerprint, though, these signatures were not unique to the individual. The 15,000 donors in my data file fit into just 64 signatures.

A-ha, now I was getting somewhere. I had set aside the final year of giving data — year seven — which I could use to determine whether a donor had upgraded, downgraded or stayed the same. All I had to do was take those 64 categories of donors and rank them by the percentage of donors who had upgraded in the final year. Then I could just eyeball the sorted signatures and see if I could detect any patterns in the signatures that most often led to the upgrading behaviours I was looking for. (This is much easier done in stats software than in Excel, by the way.)

All of the following observations are based on the giving patterns of donors who gave in the final two years, which allowed me to compare whether they upgraded or not. This cut out many possible scenarios (eg., donors who didn’t give in one of those two years), but it was a good starting point.

I confirmed that the more years a donor has given, the more likely they are to be retained. BUT:

  • The more previous years a donor has given consecutively, the LESS likely they are to upgrade if they give again.
  • A donor is markedly more likely to upgrade from the prior year if they have lapsed at least one year prior to giving again.
  • Specifically, they are most likely to upgrade if they have one, two or three years with giving in the previous five. More than that, and they are becoming more loyal, and therefore less likely to upgrade.
  • Donors who give every other year, or who have skipped up to two years at a time, are most likely to upgrade from last year to the current year.

I told you it was counter-intuitive. If it was just all obvious common sense, we wouldn’t need data analysis. Here’s more odd stuff:

  • In general, the same qualities that make a donor more likely to upgrade also make a donor upgrade by a higher amount.
  • By far, the highest-value upgrader is a last-year donor who lapsed the previous year but had three years of giving in the previous five.
  • The next-highest donor signatures all show combinations of repeated giving and lapsing.
  • As a general rule, the highest-value upgraders have about an equal number of years as a donor and as a non-donor.

The conclusion? Upgrade potential can be a strangely elusive quality. From this analysis it appears that being a frequent donor (three or four years out of the past six) is a positive, but only if those years are broken up by the odd non-giving year. In other words, the upgrading donor is also something of an erratic donor.

I thought that was a pretty nifty phenomenon to bring to light. I decided to augment it by trying another, similar approach. Instead of flagging the simple fact of having given or not given in a particular year, this time I flagged whether a donor had upgraded from one year to the next.

Again I worked with seven fiscal years of giving data. I was interested in the final year – year seven – setting that as the “result” of the previous six years of giving behaviour. I was interested only in people who gave that year, AND who had some previous giving in years 1 to 6. The result set consisted of “Gave same or less” or “Upgrade”, and if upgrade, the average dollar upgrade.

The flags were a little more complicated than Y/N. I used ‘U’ to denote an upgrade from the year previous, ‘S’ to denote giving at the same level as the year previous, ‘D’ for a downgrade, and ‘O’ (for “Other”) if no comparison was possible (i.e., one or both years had no giving). Each signature had five characters instead of six, since it’s not possible to assign a code to the first year (no previous year of giving in the data to compare with).

This time there were 521 signatures, which made interpretation much more difficult. Many signatures had fewer than five donors, and only a dozen or so contained more than 100 donors. But when I counted the number of upgrades, downgrades and “sames” that a donor had in the previous five years, and then looked at how they behaved in the final year, some clear patterns did emerge:

  • Donors who upgraded two or more times in the past were most likely to upgrade again in the current year, and the size of their upgrade was larger, than donors who upgraded fewer times, or never upgraded. Upgrade likelihood was highest if the donor had upgraded at least four times in the previous five years.
  • Donors who gave the same amount every year were the least likely to upgrade — this is the phenomenon people were talking about at the benchmarking meeting I mentioned earlier. Donors who never gave the same amount from one year to the next, or did so only once, had higher median upgrade amounts.
  • And finally, the number of downgrades … this paints a strongly counter-intuitive picture. The more previous downgrades a donor had, the more likely they were to upgrade in the current year!

In other words, along with being erratic, donors who upgrade also have the characteristic that I started to call volatility.

I wondered what the optimum mix of upgrades and downgrades might be, so I created a variable called “Upgrades minus Downgrades”, which calculated the difference only for donors who had at least one upgrade or downgrade. The variable ranged from -4 (lots of downgrades) to plus 5 (lots of upgrades). What I discovered is that it’s not a balance that is important, but that a donor be at one extreme or the other. The more extreme the imbalance, the more likely an upgrade will occur, and the larger it will be, on average.

ERRATIC and VOLATILE … two qualities you’ve probably never ascribed to your most generous donors. But there it is: Your best prospects for an ambitious ask (perhaps a face-to-face one) might be the ones who are inconsistent about the amounts they give, and who don’t care to give every year.

By all means continue to use RFM to identify the core of your top supporters, but be aware that this approach will not isolate the kind of rogue donors I’m talking about. You can use donor signatures, as I have, to explore the extent to which this phenomenon prevails in your own donor database. From there, you might wish to capture these behaviours as input variables for a proper predictive model.

At worst, you’ll be soliciting donors who will never become loyal, and who may not have lifetime values that are as attractive as our less flashy, but more dependable, loyal donors. On the other hand, if you put a bigger ask in front of them and they go for it, they may eventually enter the realm of major giving. And then it will all be worth it.

27 January 2011

RFM: Not a substitute for predictive modeling

Filed under: RFM, Why predictive modeling? — Tags: , — kevinmacdonell @ 9:06 am

Recency, Frequency, Monetary value. The calculation of scores based on these three transactional variables has a lot of sway over the minds of fundraisers, and I just don’t understand why.

It’s one of those concepts that never seems to go away. Everyone wants a simple way to whip up an RFM score. Yet anyone who can do a good job of RFM is probably capable of doing real predictive modeling. The persistence of RFM seems to rest on some misconceptions, which I want to address today.

First, people are under the impression that RFM is cutting-edge. It isn’t. In his book, “Fundraising Analytics: Using Data to Guide Strategy,” Joshua Birkholz points out that RFM is “one of the most common measures of customer value in corporate America.” It’s been around a long time. That alone doesn’t mean it’s invalid — it just isn’t anything new, even in fundraising.

Second, it’s often misconstrued as a predictive tool, and therefore the best way to segment a prospect pool. It’s not. As Josh makes clear in his book, RFM has always been a measure of customer value. It does not take personal affinities into account, nor any non-purchasing activities, he writes.

Note the language. RFM is borrowed from the for-profit world: retail and sales. Again, this doesn’t discredit it, but it does make it inappropriate as the sole or primary tool for prediction. Because it’s purely transactional in nature, all RFM can tell you is that donors will become donors. It CAN’T tell you which non-donors are most likely to be acquired as new donors.  The RFM score for a non-donor is always ZERO.

It also can’t tell you which lapsed donors are most likely to be reactivated, or which donors are most likely to be upgraded to higher levels of giving. In the eyes of RFM, one person who gave $50 last year is exactly the same as any other person who gave $50 last year. They’re NOT.

Third, we’re often told that RFM is easy to do. RFM is easy to explain and understand. It’s not necessarily easy to do. Recency and Monetary Value are straightforward, but Frequency requires a number of steps and calculations, and you’re probably not going to do it in Excel. Josh himself says it’s easy to do, but the demonstration in his book requires SPSS. If you’re using a statistics software package such as SPSS and you’ve mastered Frequency, then true predictive modeling is easily within your grasp. Almost all the variables I use in my models are simpler to derive than Frequency.

Is RFM useless? No, but we need to learn not to pick up a hammer when what we really need is a saw. RFM is for ranking existing donors according to their value to your organization, based on their past history of giving. Predictive modeling is for predicting (who knew?), and answering the three hard questions I listed above (acquisition, reactivation, upgrade potential.)

You could, in fact, use both. Your predictive model might identify the top 10% of your donor constituency who are most likely to renew, while your RFM score set will inform you who in that top 10% have the highest potential value to your organization. A matrix with affinity (predictive model) on one axis and value on the other (RFM) would make a powerful tool for, say, segmenting an Annual Giving donor pool for Leadership giving potential. Just focus on the quadrant of donors who have high scores for both affinity and value.

If you want to use RFM in that way (that is, properly), then fill your boots. I recommend Josh Birkholz’s book, because he lays it out very clearly.

The real danger in RFM is that it can become an excuse for not collecting more and better data.

For any institution of higher education, the idea that RFM is the bee’s knees is patently untrue. Institutions with alumni have a trove of variables that are informative about ALL of their constituency, not just those who happen to be donors. Expand that to arts-based nonprofits, and you’ll find member-based constituencies and the very same opportunities to model in a donor/non-donor environment. Neither of these types of institutions should be encouraged to rely exclusively on RFM.

For the rest, who don’t have the data for their constituency but could, the idea that pure donor transaction data is all you need cuts off the chance of doing the work now to get more sophisticated about collecting and organizing data that will pay off in the years ahead.

The Silver is the New Black Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,086 other followers