CoolData blog

5 July 2016

A simple score you can probably build in Excel

Filed under: Excel, Peter Wylie, Predictive scores | kevinmacdonell @ 4:22 pm

Guest post by Peter B. Wylie


In the evolving world of analysis for higher ed and non-profits, it’s apparent that a gap is widening: Many well-resourced shops are acquiring analytics talent comfortable with statistics and programming, but many others are unable to make investments in specialized talent.


Today’s guest post is a paper by Peter Wylie that addresses the latter group, the ones at risk of being left behind. Download his paper here: Simple_Score_in_Excel_Wylie


In this piece he uses data from two schools to show you something you can try with your own data, building a very simple predictive score using nothing but Excel.


Some level of data analysis ought to be accessible to every organization, regardless of technical proficiency or tools. And in fact, shops that move too quickly to automate predictive scoring with black-box-like methods risk passing over the insights available to the exploratory analyst using more manual, time-consuming methods.


We hope you enjoy, and above all, that you try this with your own data. The download link again: Simple_Score_in_Excel_Wylie


13 June 2016

Nifty SQL regression to calculate donors’ giving trends

Filed under: Coolness, Predictor variables, regression, SQL | kevinmacdonell @ 8:28 pm


Here’s a nifty bit of SQL that calculates a best-fit line through a donor’s years of cash-in giving by fiscal year (ignoring years with no giving), and classifies that donor in terms of how steeply they are “rising” or “falling”.


I’ll show you the sample code, which you will obviously have to modify for your own database, and then talk a little bit about how I tested it. (I know this works in Oracle version 11g. Not sure about earlier versions, or other database systems.)


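-- NOTE: donor_id is a placeholder. The donor identifier column was lost from
-- this listing; substitute the ID column from your own gifts table.
with sums AS (
 -- One row per donor per fiscal year: log10 of that year's total giving
 select t1.donor_id, t1.fiscal_year, log(10, sum(t1.amount)) AS yr_sum
 from gifts t1
 group by t1.donor_id, t1.fiscal_year),

slopes AS (
 -- Slope of the best-fit line through each donor's yearly log totals
 select distinct sums.donor_id,
 regr_slope(sums.yr_sum, sums.fiscal_year) OVER (partition by sums.donor_id) AS slope
 from sums)

select slopes.donor_id, slopes.slope,
 case
 when slopes.slope is null then 'Null'
 when slopes.slope >=0.1 then 'Steeply Rising'
 when slopes.slope >=0.05 then 'Moderately Rising'
 when slopes.slope >=0.01 then 'Slightly Rising'
 when slopes.slope >-0.01 then 'Flat'
 when slopes.slope >-0.05 then 'Slightly Falling'
 when slopes.slope >-0.1 then 'Moderately Falling'
 else 'Steeply Falling' end AS description
from slopes

That’s it. Not a lot of SQL, and it runs very quickly (for me). But does it actually tell us anything?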


I devised a simple test. Adapting this query, I calculated the “slope of giving” for all donors over a five-year period in the past: FY 2007 to FY 2011. I wanted to see if this slope could predict whether, and by how much, a donor’s giving would rise or fall in the next five-year period: FY 2012 to FY 2016. (Note that the sum of a donor’s giving in each year is log-transformed, in order to better handle outlier donors with very large giving totals.)
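If you don't have Oracle handy, the same idea can be sketched in plain Python. This is only an illustration, not the code I ran: the function names and the (donor_id, fiscal_year, amount) tuple layout are assumptions made for the example, while the thresholds are copied from the SQL above. Like regr_slope, it returns no slope for a donor who gave in only one year.

```python
import math
from collections import defaultdict

def giving_slopes(gifts):
    """gifts: iterable of (donor_id, fiscal_year, amount) tuples (hypothetical layout).
    Returns {donor_id: slope of log10(yearly total) against fiscal year, or None}."""
    # Sum giving by donor and fiscal year; years with no giving never appear.
    totals = defaultdict(float)
    for donor_id, fiscal_year, amount in gifts:
        totals[(donor_id, fiscal_year)] += amount

    # Collect (year, log10 of yearly total) points for each donor.
    # Non-positive yearly totals are skipped because they have no logarithm.
    points = defaultdict(list)
    for (donor_id, fiscal_year), total in totals.items():
        if total > 0:
            points[donor_id].append((fiscal_year, math.log10(total)))

    slopes = {}
    for donor_id, pts in points.items():
        n = len(pts)
        mean_x = sum(x for x, _ in pts) / n
        mean_y = sum(y for _, y in pts) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pts)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
        # A single giving year has no spread in x, so there is no slope.
        slopes[donor_id] = sxy / sxx if sxx > 0 else None
    return slopes

def describe_slope(slope):
    """Apply the same cut-offs as the CASE expression in the SQL."""
    if slope is None:
        return "Null"
    if slope >= 0.1:
        return "Steeply Rising"
    if slope >= 0.05:
        return "Moderately Rising"
    if slope >= 0.01:
        return "Slightly Rising"
    if slope > -0.01:
        return "Flat"
    if slope > -0.05:
        return "Slightly Falling"
    if slope > -0.1:
        return "Moderately Falling"
    return "Steeply Falling"
```

Because the yearly totals are on a log10 scale, a slope of 0.1 corresponds to giving that grows by a factor of about 1.26 per year along the fitted line.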


I assembled a data file with each donor’s sum of cash giving for the first five-year period, the slope of their giving in that period, and the sum of their cash giving for the five-year period after that.


The first test was to see how the categories of slope, from Steeply Rising to Steeply Falling, translated into subsequent rises and falls. In Data Desk, I compared the two five-year periods. If the second period’s giving was greater than the first, I called that a “rise.” If it was less, I called it a “fall.” And if it was exactly the same, I called it “Same.”
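I did this step in Data Desk, but the logic is simple enough to sketch in a few lines of Python. The names below and the (slope_category, first_total, second_total) layout are assumptions for illustration only. The function labels each donor Rise, Fall, or Same and then converts the counts in each slope category to percentages that sum across the row.

```python
from collections import Counter, defaultdict

def outcome(first_total, second_total):
    """Compare a donor's two five-year giving totals."""
    if second_total > first_total:
        return "Rise"
    if second_total < first_total:
        return "Fall"
    return "Same"

def row_percentages(donors):
    """donors: iterable of (slope_category, first_total, second_total).
    Returns {slope_category: {"Rise": pct, "Fall": pct, "Same": pct}}."""
    counts = defaultdict(Counter)
    for category, first_total, second_total in donors:
        counts[category][outcome(first_total, second_total)] += 1

    table = {}
    for category, row in counts.items():
        n = sum(row.values())
        # Percentages are computed within each slope category (row).
        table[category] = {k: round(100 * row[k] / n, 1)
                           for k in ("Rise", "Fall", "Same")}
    return table
```

Note that a donor with no giving at all in the second period has a second total of zero and so counts as a Fall.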


The table below summarizes the results. Note that these numbers are all percentages, summed horizontally. (I will explain the colour highlighting later on.)




For Steeply Rising, 60.6% of donors actually FELL from the first period to the next. Only 37.8% rose, and just 1.6% stayed exactly the same. Not terribly impressive. Look at Steeply Falling, though: More than three-quarters actually did fall. That’s a better result, but then again, “Falling” dominates for every category; in the whole file, close to 70% of all donors reduced their giving in the next period. If a donor has no giving in the second period of five years, that’s zero dollars given, and this is called a “Fall” (more on that aspect in just a sec).


(I’ve left out donors with a FY2007-11 slope of Null — they’re the ones who gave in only one year and therefore don’t have a “slope”.)


Let’s not give up just yet, however. The colour highlighting indicates how high each percentage value is in relation to those above and below it. For example, the highest percentages in the Falling column are found in the Slightly, Moderately, and especially Steeply Falling slope categories. The highest percentages in the Rising column are in the Slightly, Moderately, and Steeply Rising slope categories. And in the Same column, the Flat slope wins hands-down — as we would hope.


So a rising slope “sort of” predicts increased giving, a falling slope “sort of” predicts decreased giving. Unfortunately, many donors are not retained into the second five-year period, so there’s not a lot to be confident about.


But what if a donor IS retained? What if we exclude the lapsed donors entirely? Let’s do that:




Excluding non-donors seems to lead to an improvement … The slope does a better job sorting between the risers and fallers when a donor is actually retained. Again, the colour highlighting is referencing columns, not rows. But notice now that, across the rows, Rising has a slight majority for the Rising slope categories, and Falling has a slight majority for the Falling slope categories. (The bar is set too high for Flat, however, given that a donor’s giving in the first five years has to be exactly equal to her giving in the second five years to be called Same.)


Admittedly, these majorities are not generous. If I calculated a donor’s slope of giving as Steeply Rising and that donor was retained, I have only a 56.4% chance of actually being right. And of course there’s no guarantee that donor won’t lapse.


(Note that these are donors of all types — alumni, non-alumni individuals, and entities such as corporations and foundations. Non-alumni donors tend not to have patterns in their giving that are repeated, not to the extent that alumni do. However, when I limit the data file to alumni donors only, the improvement in this method is only very slight.)


Pressing on … I did a regression analysis using total giving in the second five-year period as the dependent variable, then entered total giving in the prior five-year period as an independent variable. (Naturally, R-squared was very high.) This allowed me to see if Slope provides any explanatory power when it is added as the second independent variable — the effect of giving in the first five-year period already being accounted for.


And the answer is, yes, it does. But only under specific conditions: Both five-year giving totals were log-transformed and, most significantly, donors who did not give in the second period were excluded from the regression.
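For anyone who wants to try the same comparison without a stats package, here is a rough Python sketch. It is not what I ran: I'm assuming here that "explanatory power" is judged by the gain in R-squared when slope is added, and the function names and the (first_total, slope, second_total) layout are invented for the example. It drops donors who did not give in the second period, log-transforms both totals, and fits the two models by ordinary least squares.

```python
import math

def ols_r_squared(rows, y):
    """R-squared of y regressed on the columns of rows, plus an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def slope_gain(donors):
    """donors: list of (first_total, slope, second_total).
    Returns (R-squared without slope, R-squared with slope)."""
    # Exclude donors who lapsed in the second period, and donors with no slope.
    kept = [d for d in donors if d[2] > 0 and d[1] is not None]
    y = [math.log10(second) for _, _, second in kept]
    base = [[math.log10(first)] for first, _, _ in kept]
    full = [[math.log10(first), slope] for first, slope, _ in kept]
    return ols_r_squared(base, y), ols_r_squared(full, y)
```

If the second number is not noticeably larger than the first, slope is adding little beyond what prior giving already explains.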


There are other ways to assess the usefulness of “slope” which might lead to an application, and I encourage you to give this a try with your own data. From past experience I know that donors who make big upgrades in giving don’t have any neat universal pattern such as an upward slope in their giving history. (The concept of volatility is explored here and here.) “Slope” is probably too simple a characteristic to employ on its own.


But as I’ve said before, if it were easy, obvious, or intuitive, it wouldn’t be data analysis.


30 May 2016

Donor volatility: testing years of non-giving as a predictor for the next big gift

Filed under: Annual Giving, Coolness | kevinmacdonell @ 5:02 am

Guest post by Jessica Kostuck, Data Analyst, Annual Giving, Queen’s University


During my first few weeks on the job, my AD set me up on several calls with colleagues in similar, data-driven roles, at universities across the country. One such call was with Kevin MacDonell, keeper of CoolData, with whom I had a delightfully geeked-out conversation about predictive modeling. We ran the gamut of weird and wonderful data points, ending on the concept of donor volatility.


When a lapsed high-end donor has no discernable annual giving pattern, is it possible to use his or her years of non-giving to predict and influence their next big gift?


Our goal for our Annual Giving program was to identify these “volatile” donors (lapsed high-end donors with an erratic giving history), and reactivate (ideally, upgrade) them, through a targeted solicitation with an aggressive ask string.


(For more on volatility, see Odd but true findings? Upgrading annual donors are “erratic” and “volatile”, which describes findings that suggest the best prospects for a big upgrade in giving are those who are “erratic”, i.e. have prior giving but are not loyal, every-year donors, and “volatile”, i.e. are inconsistent about the amounts they give.)


I did some stock market research (see footnote), decided on a minimum value for the entry-point into our volatility matrix ($500), and, together with Senior Programmer Analyst, Kim Wilkinson, got cracking on writing a program to identify volatile donors.


[Image: volatile SQL clip]



Our ideal volatile donors had given ≥ $500 at least twice in the last 10 years, without any consecutive (“stable”) periods. Year over year, our ideal volatile donor would act in one of three ways: increase their giving by at least 60%, decrease their giving by at least 60%, or not give at all. Given the capacity level displayed by these volatile donors, we replaced years of very low-end giving (<$99), or “throwaway gifts”, with null values.
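The year-over-year rule in the paragraph above can be roughed out in Python as follows. This is a simplified sketch, not the program we wrote: the names are hypothetical, each year is compared only with the year immediately before it, and it leaves out the v_value tracking and the removal and readmission rules we applied in the real volatility matrix.

```python
THROWAWAY_BELOW = 99   # yearly giving under this is treated as no gift
ENTRY_MINIMUM = 500    # a donor needs at least two years at or above this
SWING = 0.60           # minimum year-over-year change, up or down

def clean(history):
    """history: yearly giving totals, oldest first (None or 0 = no gift).
    Throwaway gifts are replaced with None."""
    return [g if g is not None and g >= THROWAWAY_BELOW else None
            for g in history]

def step_kind(previous, current):
    """Label one year-over-year step, using cleaned values."""
    if current is None:
        return "no gift"
    if previous is None:
        return "new gift"  # nothing in the prior year to compare against
    change = (current - previous) / previous
    if change >= SWING:
        return "increase"
    if change <= -SWING:
        return "decrease"
    return "stable"

def looks_volatile(history):
    """True if the donor has two or more years of $500+ and no stable steps."""
    gifts = clean(history)
    big_years = sum(1 for g in gifts if g is not None and g >= ENTRY_MINIMUM)
    steps = [step_kind(p, c) for p, c in zip(gifts, gifts[1:])]
    return big_years >= 2 and "stable" not in steps
```

For example, a history of $5,000, $1,000, nothing, $600, $50, nothing passes this check, while $5,000, $2,500, $2,400 does not, because a 50% drop counts as a stable step.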


We had strict conditions for what would remove a donor from our table. If a donor had two years of consecutive giving within a ±60% differential from their previous highest giving point (v_value), we considered this a natural (or, at least, for this test, not sufficiently irregular) fluctuation in giving, and they were removed from the table. If the donor had two consecutive years of low-end (but not null) giving ($99-$499), this was considered a deliberate decrease, and they, too, were removed. Conversely, if a donor had two consecutive years of greatly increased giving, this was considered a deliberate increase, and they were also removed.


At any point, a donor could be admitted, or readmitted into our volatility matrix, by establishing, or re-establishing, a v_value and subsequent valid volatility point.


The difference between a lapsed donor and a volatile donor


Below is a sample pool of donors we examined.


[Image: volatile donor history]


Donor 1 is volatile all the way through, with greatly varying levels of giving, culminating in two years of non-giving. Donor 1 is currently volatile, and thus enters our test group.


Donor 2 is volatile for two years, FY07-08 and FY08-09 (a v_value of $5,000 in FY07-08, followed by a valid volatile point in FY08-09 with a decrease of 80%), but then is removed from the table in FY09-10 with only a 50% decrease in giving. They do not establish a new v_value, even though their FY09-10 giving meets the minimum giving threshold for this test, because of their consecutive, only marginally decreased giving in FY10-11. This excludes Donor 2 from our test.


Donor 3 enters our volatility matrix in FY04-05, leaves in FY07-08, reenters in FY10-11, and maintains volatility to current day, and, thus, enters into our test solicitation.


While all three of these donors are lapsed, and are all SYBUNTs, only Donor 1 and Donor 3 are, by our definition, volatile.


Solicitation strategy and results


We now had a pool of constituents who were at least two years lapsed in giving, who all had a history of inconsistent, but not unsubstantial, contributions to the university. In an email solicitation, we presented constituents with both upgrade language and an aggressive ask matrix, beginning at a minimum of +60% of their highest ever v_value, regardless of where they were in the ebb and flow of their volatility cycle. Again, the goal of this test was to (1) identify donors with high capacity (2) whose giving to the university was erratic in frequency and loyalty and (3) encourage these donors to reactivate at greater than their previously-established high-end giving.


In our results analysis, we broadened our examination to include any gifts received from our testing pool within the subsequent four weeks, not just gifts linked to this particular solicitation code, to verify the legitimacy of tagging these donors as volatile – that is, having a higher-than-average probability to reactivate at a high-end giving level.


An important part of our analysis included comparing our testing pool to a control pool, pairing each of our volatile donors with a non-volatile twin who shared as many points of fiscal and biographic information as was possible.


Within the four-week time frame, our test group had about a 7% activity rate, whereas our control group had an activity rate of about 5% (average for the institution during this timeframe). Within our volatility test group, 50% of donors gave an amount that would plot a valid point on our volatility matrix.
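One way to gauge whether a gap like 7% versus 5% is more than noise is a two-proportion z-test. The sketch below is an illustration only: the pool sizes in the usage note are made up (this post does not report them), and the test treats the two groups as independent, which ignores the twin pairing described above.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two response rates.
    Returns (z, p_value). Assumes independent groups and large samples."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL pool sizes, for illustration only: 1,000 donors per group.
z, p_value = two_proportion_z(70, 1000, 50, 1000)
```

With smaller pools the same two rates would give a weaker result, so the real counts matter a great deal here.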


Conclusion and next steps


Through our experiment, we sought to identify volatile donors, and test if we could trigger a reactivation in giving, ideally at, or greater than, their highest level on record.


Since not all of the donors within our test group made their gifts to the coded solicitation with the volatile ask matrix, it is indiscernible whether being presented with language and ask amounts that reflected their elusive giving behavior prompted a gift – volatile or otherwise. However, we do feel confident that we’re onto something when it comes to identifying and predicting the behavior of a particular, valuable set of donors to our institution.


Our above-average response rate (versus both the control group and the institution as a whole) supports our “theory of volatility”, insofar as volatile donors are an existing pool with shared behaviors within our donor population. We plan to re-run this test at the same time next year, continuing our search to find a pattern within the instability.


Were we able to gather definitive results that will define and shape future annual giving strategy? Not exactly. But as far as data goes, this was definitely cool.


Jessica Kostuck is the Data Analyst, Annual Giving at Queen’s University in Kingston, Ontario. She can be reached at



1. Varadi, David. “Volatility Differentials: High/Low Volatility versus Close/Close Volatility (HVL-CCV).” CSS Analytics. 29 Mar. 2011. Web. Winter 2015.

1 February 2016

Regular-season passing yardage and the NFL playoffs

Filed under: Analytics, Fun, John Sammis, Off on a tangent, Peter Wylie | kevinmacdonell @ 7:37 pm

Guest post by Peter B. Wylie, with John Sammis


How much is regular-season passing yardage related to success in the NFL playoffs? (Click link to download .PDF: Passing yardage in the NFL.)


Peter was really interested in finding out how strong the relationship might be between an NFL team’s passing during the regular season and its performance in the playoffs. There’s been plenty of talk about this relationship, but he wanted to see for himself.


A bit of a departure for CoolData, but still all about data and analysis … hope you enjoy!


9 January 2016

“Score!” now available in e-book formats

Filed under: Score!, Uncategorized | kevinmacdonell @ 10:39 am



I’m pleased to note that “Score! Data-Driven Success for Your Advancement Team” is now available in e-book formats. “Score!”, by Peter B. Wylie and Kevin MacDonell, is published by CASE, the Council for Advancement and Support of Education. To order your copy, click here to enter the CASE book store, and select EPUB or Mobi/Kindle.


“Score!” has been out and selling well as a print publication for some time now. But print isn’t for everyone these days, and we’re glad our work has been chosen as one of a handful of publications to get the electronic treatment — a new initiative for CASE books.


If you’re not familiar with the book already, please click on the blue cover to the right for links to reviews!


3 January 2016

CoolData (the book) beta testers needed


UPDATE (Jan 5): 16 people have responded to my call for volunteers, so I am going to close this off now. I have been in touch with each person who has emailed me, and I will be making a final selection within a few days. Thank you to everyone who considered taking a crack at it.


Interested in being a guinea pig for my new handbook on predictive modelling? I’m looking for someone (two or three people, max) to read and work through the draft of “CoolData” (the book), to help me make it better.


What’s it about? This long subtitle says it all: “A how-to guide for predictive modelling for higher education advancement and nonprofits using multiple linear regression in Data Desk.”


The ideal beta tester is someone who:


  • has read or heard about predictive modelling and understands what it’s for, but has never done it and is keen to learn. (Statistical concepts are introduced only when and if they are needed – no prior stats knowledge is required. I’m looking for beginners, but beginners who aren’t afraid of a challenge.);
  • tends to learn independently, particularly using books and manuals to work through examples, either in addition to training or completely on one’s own;
  • does not have an IT background but has some IT support at his or her organization, and would not be afraid to learn a little SQL in order to query a database him- or herself, and
  • has a copy of Data Desk, or intends to purchase Data Desk. (Available for PC or Mac).


It’s not terribly important that you work in the higher ed or nonprofit world — any type of data will do — but the book is strictly about multiple linear regression and the stats software Data Desk. The methods outlined in the book can be extended to any software package (multiple linear regression is the same everywhere), but because the prescribed steps refer specifically to Data Desk, I need someone to actually go through the motions in that specific package.


Think of a cookbook full of recipes, and how each must be tested in real kitchens before the book can go to press. Are all the needed ingredients listed? Has the method been clearly described? Are there steps that don’t make sense? I want to know where a reader is likely to get lost so that I can fix those sections. In other words, this is about more than just zapping typos.


I might be asking a lot. You or your organization will be expected to invest some money (for the software, sales of which I do not benefit from, by the way) and your time (in working through some 200 pages).


As a return on your investment, however, you should expect to learn how to build a predictive model. You will receive a printed copy of the current draft (electronic versions are not available yet), along with a sample data file to work through the exercises. You will also receive a free copy of the final published version, with an acknowledgement of your work.


One unusual aspect of the book is that a large chunk of it is devoted to learning how to extract data from a database (using SQL), as well as cleaning it and preparing the data for analysis. This is in recognition of the fact that data preparation accounts for the majority of time spent on any analysis project. It is not mandatory that you learn to write queries in SQL yourself, but simply knowing which aspects of data preparation can be dealt with at the database query level can speed your work considerably. I’ve tried to keep the sections about data extraction as non-technical as possible, and augmented with clear examples.


For a sense of the flavour of the book, I suggest you read these excerpts carefully: Exploring associations between variables and Testing associations between two categorical variables.


Contact me at and tell me why you’re interested in taking part.



