CoolData blog

7 June 2014

A fresh look at RFM scoring

Filed under: Annual Giving, John Sammis, Peter Wylie, RFM — Tags: , — kevinmacdonell @ 7:08 pm

Guest post by Peter B. Wylie and John Sammis

Back in February and March, Kevin MacDonell published a couple of posts about RFM for this blog (Automate RFM scoring of your donors with this Python script and An all-SQL way to automate RFM scoring). If you’ve read these, you know Kevin was talking about a quick way to amass the data you need to compute measures of RECENCY, FREQUENCY, and MONETARY AMOUNT for a particular set of donors over the last five fiscal years.

But how useful, really, is RFM? This short paper highlights some key issues with RFM scoring, but ends on a positive note. Rather than chucking it out the window, we suggest a new twist that goes beyond RFM to something potentially much more useful.

Download the PDF here: Why We Are Not in Love With RFM

25 March 2014

An all-SQL way to automate RFM scoring

Filed under: RFM, SQL — Tags: , , , — kevinmacdonell @ 8:44 pm

In my last post I described how to automate RFM scoring using the scripting language Python and pandas. Automating RFM scoring is a great way to learn a bit of Python — but on its own it’s a lousy reason for learning Python. It was pointed out to me that you don’t really need to call on the power of Python if you’re already comfortable with (or learning to use) SQL, and I agree.

Shortly after publishing the Python post, I got an email from Bruce Higgins, who works in the Alumni Affairs & Development office at Cornell University. His all-SQL solution is shown below. I leave it to readers to figure out how to make it work for them. In the wake of my post there was some great discussion and code-sharing on the Prospect-DMM discussion list, and I encourage you to look up that thread.

You’ll notice that Bruce has hard-coded the dollar values used for scoring “Monetary value” instead of dynamically dividing the file into quintiles. As he points out, if you’re scoring the same database over and over, it’s not likely these dollar thresholds are going to change much over time. You will of course need to make adjustments for your own donor data.

WITH Five_years AS
(SELECT id,
 SUM((gift_amount)) AS cash,
 MAX(decode(.gift_fiscalyear, 2014, 5, 2013, 4, 2012, 3, 2011, 2, 2010, 1)) AS recency,
 COUNT(DISTINCT(gift_fiscalyear)) AS frequency
 FROM
(SQL specifying the last five years cash transactions from our warehouse)
GROUP BY id
SELECT Five_years.id,
CASE
WHEN Five_years.cash >= 10000 THEN 5
WHEN Five_years.cash >= 2500 THEN 4
WHEN Five_years.cash >= 1000 THEN 3
WHEN Five_years.cash >= 500 THEN 2
ELSE 1
END + Five_years.recency + Five_years.frequency AS rfm_score
FROM Five_years
ORDER BY five_years.name

26 February 2014

Automate RFM scoring of your donors with this Python script

Filed under: Python, RFM — Tags: , , , — kevinmacdonell @ 9:20 pm

The next time you’re asked to refresh RFM scores for your donor base, tell them it’ll take you the better part of the day. Then fire up this script, output the scores in 30 seconds, and use the rest of your morning to work on something more interesting.

Unlike predictive modeling, which ideally requires an analyst’s good judgement in order to do well, RFM scoring lends itself to being completely automated. Based on a donor’s most recent five years of giving history, an RFM score is composed of three simple elements, which can each be scored 1 to 5. The highest score combination is 5-5-5.

  1. Recency: “5” if the donor’s most recent gift was last year, “4” if it was the year before that, and so on.
  2. Frequency: “5” if the donor gave in five out of five past years, “4” if the donor gave in four out of five, and so on.
  3. Monetary value: “5” if the donor’s total giving in the five years is in the top 20% of the donor file, “4” if total giving is in the next 20%, and so on.

This post follows naturally on my previous post, in which I showed how the PIVOT operator in an SQL statement can be used on donor data to arrange yearly giving totals into columns instead of rows, right at the level of the database query. (See Really swell SQL: Why you must know PIVOT and WITH.)

This Python script includes some sample SQL to pull giving data and pivot it — you will need to modify this SQL to match your schema and table names. If your database does not support PIVOT, then just pull the raw data and research how to pivot the data in Python. In fact, pivots and other data manipulations are much easier to do in Python than in SQL if you use a Python code library called pandas, designed specifically for working with data for analysis. Pandas has structures called DataFrames that are like database tables and are very intuitive to work with.

If you can’t connect directly to your database with this script, then you can just as easily read a .csv file into a DataFrame and work with that instead. The source file has to have one column for IDs, and five columns of yearly giving totals, with nulls for zero totals.

I am not going into a lot of detail about how each pandas function works. You can find as much information as you want via internet searches. (See another previous CoolData post, Getting bitten by Python.) I don’t promise that my code is elegant. Play with it, improve on it, and extend it to fit your needs.

In addition to comments in the body of the script below (anything after a “#”), here’s a limited explanation of what each section of the RFM calculation does:

RECENCY: The script creates five new columns. For each row of data, the field evaluates to True if the field is not null (i.e., if the donor has giving in the year being referenced). In Python, ‘True’ is the same as 1, and ‘False’ is the same as zero. If the donor had giving in the most recent year, the first new column will evaluate to True, and when I multiply ‘True’ by 5, the answer is 5. If no giving, the result is zero, and multiplying by 5 equals zero. The same goes for the next column: If the donor had giving in the year before, the result is True, and when I multiply by 4, the answer is 4. And so on, down to five years ago (‘1’). Then I create a sixth new column, which is populated by the maximum value found in the previous five columns — this ends up being the Recency portion of the score. Finally, I delete (or “drop”) the previous five columns, as they are no longer needed.

FREQUENCY: This part is way simpler … just a count of the number of non-null values in the five columns of yearly giving totals.

MONETARY VALUE: First, we sum on giving for all five years. Then these totals are chopped up into quintiles, ignoring any null totals. (Quintiles means five equal-sized groups, each making up 20% of the total number of records.) Each quintile is given a label (1 to 5), with the top quintile being a “5”.

That’s it. The results are saved to a .csv file, which looks like this:

output

 

 

# import required code modules.
import pandas as pd
import pandas.io.sql as psql
import cx_Oracle   # for querying an Oracle database
import os  # for reading and writing files to your hard drive

# Prompt user for database username and password.
username = raw_input('Enter username: ')
password = raw_input('Enter password: ')

# Pass text of SQL statement to variable called SQL, enclosed in triple quotes.
# (Note the pivot ... five years of giving data.)

SQL = '''
SELECT *

FROM (
 SELECT TABLE_A.ID,
 SUM ( TABLE_B.GIFT_AMT ) AS GIVING,
 TABLE_B.GIFT_FISC_CODE AS FY

 FROM
 TABLE_A,
 TABLE_B

 WHERE
 (TABLE_A.ID = TABLE_B.ID
  AND TABLE_B.GIFT_FISC_CODE IN (2010, 2011, 2012, 2013, 2014))

 GROUP BY
 TABLE_A.ID,
 TABLE_B.GIFT_FISC_CODE
 )

PIVOT (
  SUM ( GIVING ) AS total FOR ( FY )
  IN (2010 as FY2010, 2011 as FY2011, 2012 as FY2012, 2013 as FY2013, 2014 as FY2014)
  )

'''

# Connect, execute the SQL, and put data into a pandas DataFrame (df)
# "@dsn" (data source name) is the TNS entry (from the Oracle names server or tnsnames.ora file)
# "0000" is the port number

connectstr = username + '/' + password + '@dsn:0000'
connection = cx_Oracle.connect(connectstr)
df = psql.frame_query(SQL, con=connection) 

## RFM score

# RECENCY
# Create five new columns.
df['Recency5'] = df.FY2014_TOTAL.notnull() * 5
df['Recency4'] = df.FY2013_TOTAL.notnull() * 4
df['Recency3'] = df.FY2012_TOTAL.notnull() * 3
df['Recency2'] = df.FY2011_TOTAL.notnull() * 2
df['Recency1'] = df.FY2010_TOTAL.notnull() * 1

# Create a sixth column, setting it to the max value of the previous five columns.
df['Recency'] = df[['Recency5', 'Recency4', 'Recency3', 'Recency2', 'Recency1']].max(axis=1)

# Delete the five original columns - no longer needed.
df = df.drop(['Recency5', 'Recency4', 'Recency3', 'Recency2', 'Recency1'], axis=1)

# FREQUENCY

# Create new column, and set it to the number of non-null values in the giving totals columns.
df['Frequency'] = df[['FY2010_TOTAL', 'FY2011_TOTAL', 'FY2012_TOTAL', 'FY2013_TOTAL', 'FY2014_TOTAL']].count(axis=1)

# MONETARY VALUE

# Create new column, the sum of all giving years.
df['Giving_Total'] = df[['FY2010_TOTAL', 'FY2011_TOTAL', 'FY2012_TOTAL', 'FY2013_TOTAL', 'FY2014_TOTAL']].sum(axis=1, skipna=True)

# Break all giving totals into five quintiles (ignoring any null values), and store the label (1 to 5) in another new column.
quintile_bins = [1, 2, 3, 4, 5]
df['Monetary'] = pd.qcut(df['Giving_Total'], 5, labels = quintile_bins)

# Write results to a .csv file, and display number of saved records to user.
df.to_csv('RFM_results.csv')
print(str(len(df)) + " records output")

27 January 2011

RFM: Not a substitute for predictive modeling

Filed under: RFM, Why predictive modeling? — Tags: , — kevinmacdonell @ 9:06 am

Recency, Frequency, Monetary value. The calculation of scores based on these three transactional variables has a lot of sway over the minds of fundraisers, and I just don’t understand why.

It’s one of those concepts that never seems to go away. Everyone wants a simple way to whip up an RFM score. Yet anyone who can do a good job of RFM is probably capable of doing real predictive modeling. The persistence of RFM seems to rest on some misconceptions, which I want to address today.

First, people are under the impression that RFM is cutting-edge. It isn’t. In his book, “Fundraising Analytics: Using Data to Guide Strategy,” Joshua Birkholz points out that RFM is “one of the most common measures of customer value in corporate America.” It’s been around a long time. That alone doesn’t mean it’s invalid — it just isn’t anything new, even in fundraising.

Second, it’s often misconstrued as a predictive tool, and therefore the best way to segment a prospect pool. It’s not. As Josh makes clear in his book, RFM has always been a measure of customer value. It does not take personal affinities into account, nor any non-purchasing activities, he writes.

Note the language. RFM is borrowed from the for-profit world: retail and sales. Again, this doesn’t discredit it, but it does make it inappropriate as the sole or primary tool for prediction. Because it’s purely transactional in nature, all RFM can tell you is that donors will become donors. It CAN’T tell you which non-donors are most likely to be acquired as new donors.  The RFM score for a non-donor is always ZERO.

It also can’t tell you which lapsed donors are most likely to be reactivated, or which donors are most likely to be upgraded to higher levels of giving. In the eyes of RFM, one person who gave $50 last year is exactly the same as any other person who gave $50 last year. They’re NOT.

Third, we’re often told that RFM is easy to do. RFM is easy to explain and understand. It’s not necessarily easy to do. Recency and Monetary Value are straightforward, but Frequency requires a number of steps and calculations, and you’re probably not going to do it in Excel. Josh himself says it’s easy to do, but the demonstration in his book requires SPSS. If you’re using a statistics software package such as SPSS and you’ve mastered Frequency, then true predictive modeling is easily within your grasp. Almost all the variables I use in my models are simpler to derive than Frequency.

Is RFM useless? No, but we need to learn not to pick up a hammer when what we really need is a saw. RFM is for ranking existing donors according to their value to your organization, based on their past history of giving. Predictive modeling is for predicting (who knew?), and answering the three hard questions I listed above (acquisition, reactivation, upgrade potential.)

You could, in fact, use both. Your predictive model might identify the top 10% of your donor constituency who are most likely to renew, while your RFM score set will inform you who in that top 10% have the highest potential value to your organization. A matrix with affinity (predictive model) on one axis and value on the other (RFM) would make a powerful tool for, say, segmenting an Annual Giving donor pool for Leadership giving potential. Just focus on the quadrant of donors who have high scores for both affinity and value.

If you want to use RFM in that way (that is, properly), then fill your boots. I recommend Josh Birkholz’s book, because he lays it out very clearly.

The real danger in RFM is that it can become an excuse for not collecting more and better data.

For any institution of higher education, the idea that RFM is the bee’s knees is patently untrue. Institutions with alumni have a trove of variables that are informative about ALL of their constituency, not just those who happen to be donors. Expand that to arts-based nonprofits, and you’ll find member-based constituencies and the very same opportunities to model in a donor/non-donor environment. Neither of these types of institutions should be encouraged to rely exclusively on RFM.

For the rest, who don’t have the data for their constituency but could, the idea that pure donor transaction data is all you need cuts off the chance of doing the work now to get more sophisticated about collecting and organizing data that will pay off in the years ahead.

Create a free website or blog at WordPress.com.