# CoolData blog

## 5 August 2010

### Perception vs. reality on the Number 80 bus

Filed under: Planned Giving, Statistics — kevinmacdonell @ 11:26 am

(Photo used under Creative Commons license. Click image for source.)

Do you ride the bus back and forth to work? I do. Some days it’s a quick trip, and other days it just goes on forever. There’s this one stop where the driver will park the bus and just sit there, as the minutes tick by. How dare she. Doesn’t she know I’m in a hurry?

I have some flexibility in office hours, at least during the summer, so I set out to pick the best times to travel. I wanted to know: Which buses on the Number 80 route were most catchable (i.e., had a very predictable time of arrival at the stop closest to my house), were fastest and most reliable (i.e., exhibited the least variability in travel times) and were least full (so I wouldn’t have to stand the whole way).

I was sure that there was some optimal combination of these three, but I couldn’t figure it out just by riding the bus. There didn’t seem to be any discernible pattern to my experience. I did not believe it was random, so there was one conclusion: It’s a data problem.

So I’ve been collecting data on my bus rides, and I’ve just had a look at it. What I found out had less to do with the bus route than with the nature of perceived reality. What you think is going on isn’t necessarily what’s actually happening. (And yes, I’ll bring this back to fundraising.)

I record the time I sit down, and the time I land on the sidewalk at my destination. I note the day of the week (maybe Mondays are quicker rides than Fridays) and the month (maybe buses are less full during the summer months when people are on vacation). I also note how full the bus is (on a scale of 1 to 5), and whether I have to stand (0/1). And finally, I make note of outliers due to “disruptive events” (unusually long construction delays, mechanical failure, etc.)

No one but a geek would do this. But it takes only a few seconds — and if you’re interested in statistics, collecting your own data can be instructive in itself.

I haven’t collected enough data points on the Number 80 bus to reveal all its secrets, but I learned enough to know that I have no sense of elapsed time. Leaving out one extreme outlier, my average trip duration (in either direction) is 38 minutes. So how much do individual trips vary from 38 minutes? Well, 79% of all trips vary from the average by three minutes or less. Three whole minutes! Allow just one more minute of variance, and 90% of trips fit in that window.
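The arithmetic behind that observation is easy to reproduce. Here’s a minimal Python sketch on invented trip durations (my actual log isn’t shown here), computing the mean and the share of trips that land within a given window of it:

```python
# Hypothetical trip log: duration of each ride, in minutes. Invented data,
# not my real observations.
durations = [37, 39, 38, 40, 36, 38, 41, 37, 38, 35,
             39, 38, 42, 37, 38, 36, 40, 38, 39, 37]

mean_duration = sum(durations) / len(durations)

def share_within(durations, mean, window):
    """Fraction of trips whose duration falls within +/- window minutes of the mean."""
    close = [d for d in durations if abs(d - mean) <= window]
    return len(close) / len(durations)

print(round(mean_duration, 1))
print(share_within(durations, mean_duration, 3))
print(share_within(durations, mean_duration, 4))
```

On real data you’d also want to flag and set aside the “disruptive event” outliers before computing the mean, as I did with my one extreme trip.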

All other patterns related to duration are pretty subtle: Late-morning rush-hour buses and the 4:45 p.m. bus tend to have the largest variance from the mean, the first because it’s a quicker trip, the second because it’s longer. The trip home is longer than the morning commute by only about one minute, on average. Tuesdays tend to bring slightly longer trips than any other day of the week — Tuesdays also have the highest average “fullness factor”.

But really, I can hop on any Number 80 bus and expect to get to my destination in 38 minutes, give or take a couple of minutes. That’s a far cry from how I perceive my commuting time: Some quick rides, some unbearably long ones. In fact, they’re all about the same. The bus driver is not trying to drive me crazy by parking the bus in mid-trip; she’s ahead of schedule and needs to readjust so commuters farther down the line don’t miss their bus.

If we can get simple things wrong, think of all the other assumptions we make about complex stuff, assumptions that could be either confirmed or discarded via a little intelligent measuring and analysis. According to what people widely believe about Planned Giving, you can go into your database right now and skim off the top alumni by years of giving and frequency of giving, and call them your top Planned Giving prospects. Your consistent donors are your best prospects, right?

Not necessarily. In fact, in one school’s data, I determined that if all their current, known Planned Giving expectancies were hidden in the database like needles in a haystack, and one were only allowed to use these patterns of past giving to find them again, they would miss two-thirds of them!

We are not wrong to have beliefs about how stuff works, but we are wrong in clinging to beliefs when the answers are waiting there in the data. The point is not that past giving is or isn’t a determinant of Planned Giving potential for your institution — the point is that you can find that out.

## 1 April 2010

### Does “no children” really mean Planned Giving potential?

Filed under: Planned Giving, Predictor variables, Surveying — Tags: , , , — kevinmacdonell @ 11:29 am

I gave a presentation to fundraising professionals and other nonprofit types recently, and I spent a little time discussing my work with predicting Planned Giving potential. One of the attendees asked if I was aware of a recent study that found that the most significant predictor for Planned Giving was the absence of children.

I was, and in my (not very coherent) response I said something to the effect that although this was interesting, I had reservations about taking an observation based on other institutions’ populations and applying it to ours. I would prefer to test it, I said. (I believe that someone else’s valid observation about their own data is only an assumption when applied blindly to mine.) And then I said that we don’t have the data to begin with.

But as I was talking, a thought occurred to me: Yes, in fact we DO have child data! I had even used that data in my PG model, but it had never occurred to me to study it very closely.

Back in the spring of 2009, our school conducted an extensive online survey of alumni as part of a national benchmarking study of alumni engagement. One of the core questions (supplied by the study firm, Engagement Analysis Inc.) asked specifically about likelihood to consider a bequest. Another question, which we added ourselves, asked respondents how many children they had under the age of 18. (We had a purpose in asking about “under 18”, and it wasn’t Planned Giving. Had I specifically been seeking a PG predictor, I would not have qualified the statement. Presumably the positive “childless effect” is explained by the lack of need to divide an estate up among children, regardless of their age.)

Our response rate was very high, and quite representative of our alumni population. Standing there in the midst of my presentation, I realized I had enough information to test the ‘childless’ theory in the environment of our own data.

The chart below shows survey responses to the PG question on the horizontal axis. The question was actually a scale statement indicating that the responder was very likely to leave a bequest to our institution. Possible answers ranged from 1 to 6, with a one meaning “strongly disagree” and a six meaning “strongly agree”. If the respondent did not answer the question, I coded it as zero so it would show up on my chart.

In the chart, each group of respondents (i.e., each vertical bar) is segmented according to their answer on the “children” question. Notice the relative size of the blue segments, the responders who have no children under 18. For the proportion of this segment, there is a difference of approximately ten percentage points between the “strongly agree” group and the “strongly disagree” group.

In other words, childless alumni in our survey data set ARE more receptive to considering Planned Giving.
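If you’d like to run this kind of tabulation on your own survey file, the logic is simple. Here’s a hedged Python sketch on invented rows, where each row pairs the bequest-likelihood answer (0 meaning left blank, per the coding above) with a has-children flag:

```python
from collections import Counter

# Hypothetical survey rows: (bequest_answer 0-6, has_children_under_18).
# Invented for illustration; not our actual survey data.
rows = [
    (6, False), (6, False), (6, True), (5, False), (5, True),
    (1, True), (1, True), (1, False), (2, True), (0, False),
]

# For each answer level, count all respondents and the childless subset.
totals = Counter(answer for answer, _ in rows)
childless = Counter(answer for answer, kids in rows if not kids)

# Share of "no children under 18" respondents at each answer level --
# the relative size of the blue segments in the chart.
for answer in sorted(totals):
    share = childless[answer] / totals[answer]
    print(answer, f"{share:.0%}")
```

The same tabulation, restricted to class year 1990 and earlier, produces the second chart described below.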

I said earlier that the survey response was representative of our alumni population. Therefore, many of the responders are far too young to be considered prospects. So I made another chart, which shows only alumni in the older half of the population: Class year 1990 and earlier. The difference between these two charts will seem subtle because they’re busy-looking, so let me point it out to you: Now the gap between the “strongly disagree” and the “strongly agree” for people with no kids has widened to 15 percentage points. This is a vote of confidence in favour of using “number of children” as a predictor of PG receptivity.

But here’s a question: Can you use child data to segment your prospect pool, and thereby avoid having to engage in predictive modeling? My answer is “No.” In both of the charts above, a majority of respondents answered “no children”, regardless of their attitude to Planned Giving. Yes, there’s a difference among the groups, but although it is significant, it is not definitive.

Others may quibble, saying that the data is suspect because we only asked about children under 18. But I really think this predictor is a lot like certain other conventional predictors, the ones related to frequency and consistency of giving: Alone, they are not powerful enough to isolate your best PG prospects. Only when you combine them with the full universe of other proven predictors in your database (event attendance, marital status, etc.) will you end up with something truly useful.

## 18 March 2010

### My Planned Giving model growing pains

Filed under: Model building, Planned Giving, regression — Tags: , , — kevinmacdonell @ 8:22 am

People stumbling on CoolData might assume that I think I’ve gathered unto myself some great corpus of data mining knowledge and that now I presume to dispense it via this blog, nugget by nugget.

Uh, well – not quite.

The reality is that I spend a lot of my time at work and at home surrounded by my books, struggling to get my arms around the concepts, and doing a good deal of head-scratching. Progress is slow, as only about ten percent of my work hours are actually spent on data mining. Questions from CoolData readers are cause for anxiety more than anything else. (Questions are welcome, of course, but sometimes advice would be better.)

As a consequence, I proceed with caution when it comes to building models for my institution. I don’t have a great deal of time for testing and tweaking, and I steer clear of creating predictive score sets that cannot be deployed with a high level of confidence.

This caution has not prevented me from having some doubts about the model I created last year for our Planned Giving program, however.

This model sorted all of our alumni over a certain age into percentile ranks according to their propensity to engage with our institution in a planned giving agreement. Our Planned Giving Officer is currently focused on the individuals in the 97th percentile and up. Naturally, whenever a new commitment (verbal or written) comes across the transom (unsolicited, as I think PG gifts often are), the first thing I do is check the individual’s percentile score.

A majority of the new expectancies are in the 90s, which is good, and most of those are 97 and up, which is better. When I look at the Annual Giving model scores for these same individuals, however, I see that the AG scores do a better job of predicting the Planned Giving donors than the PG scores do. That strikes me as a bit odd.

Planned Giving being a slowly-evolving process, there aren’t enough examples of new commitments to properly evaluate the model, to my satisfaction at least. But when model-building time comes around again in July and August, I’ll be making some changes.

The central issue I faced was that current commitments numbered only a little over 100. That’s not a lot of historical data to model on. I asked around for advice. One key piece of advice was to cut down on the size of the prospect pool by excluding all alumni younger than our youngest current commitment. Done.

My primary interest, though, was to somehow legitimately boost the number of examples of PG donors, in order to beef up the dependent variable in a regression analysis.

Some institutions, I learned, tried to do this by digging into data on deceased planned giving donors, going back five or ten years. (I hope I do not strain decorum with the verb I’ve selected.) Normally we model only on living individuals, but having access to more examples of this type of donor has proven helpful for some. Unfortunately, on investigation I found that the technical issues involved made it prohibitively time-consuming: For various reasons, I would have had to perform many separate queries of the database in order to get at this data and merge it with that of the living population.

As luck would have it, though, around this time we received all the data from a huge, wide-ranging survey of alumni engagement we had conducted that March. One of the scale statements was specifically focused on attitudes towards leaving a bequest to our institution. The survey was non-anonymous, and a lot of positive responders to this statement were in our target age range. Bingo – I had a whole new group of “PG-oriented” individuals to add to my dependent variable. The PG model would be trained not only on current commitments, but on alumni who claimed to be receptive to the idea of planned giving.

In addition, I had the identities of a number of alumni who had attended information sessions on estate planning organized by our Planned Giving Officer.

I think all was well up to that point. What I did after that may have led to trouble.

I thought to myself, these PG-oriented people are not all of the same “value”. Surely a written gift commitment is “worth more” than a mere online survey response clicked on in haste. So I structured my dependent variable to look like this, using completely subjective ideas of what “value” ought to be assigned to each type of person:

• Answered “agree” to the PG statement in survey: 1 point
• Answered “strongly agree” to the PG statement in survey: 2 points
• Attended an estate planning session: 3 points
• Has made a verbal PG commitment: 6 points
• Has a written commitment in place: 8 points

Everyone else in the database was assigned a zero. And then I used multiple regression to create the model.
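For the record, that scoring scheme can be expressed in a few lines. This is only an illustrative sketch: the flag names are invented, and I’m assuming the highest applicable value wins when a record carries more than one earmark:

```python
# Subjective "value" assigned to each type of PG-oriented person.
# Flag names are hypothetical stand-ins for fields in the database.
PG_POINTS = {
    "survey_agree": 1,
    "survey_strongly_agree": 2,
    "estate_session": 3,
    "verbal_commitment": 6,
    "written_commitment": 8,
}

def dv_score(flags):
    """Highest applicable point value; anyone with no earmarks gets 0."""
    return max((PG_POINTS[f] for f in flags if f in PG_POINTS), default=0)

print(dv_score({"written_commitment"}))              # 8
print(dv_score({"survey_agree", "verbal_commitment"}))  # 6
print(dv_score(set()))                               # 0
```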

This summer, I think I will tone down the cleverness with my DV.

First of all, everyone with a pro-PG orientation (if I can put it that way) will be coded “1”. Everyone else will be coded “0”, and I will try using logistic regression instead of multiple regression, as more appropriate for a binary DV.
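To make the distinction concrete, here’s a small sketch: the binary recoding, plus the logistic (sigmoid) link that makes logistic regression the natural fit for a 0/1 outcome. The coefficients shown are made up for illustration, not fitted; in practice you’d estimate them with a statistics package:

```python
import math

# The binary DV described above: anyone with a pro-PG earmark is a 1,
# everyone else a 0. Flag names are hypothetical.
PG_EARMARKS = {"survey_agree", "survey_strongly_agree",
               "estate_session", "verbal_commitment", "written_commitment"}

def binary_dv(flags):
    return 1 if flags & PG_EARMARKS else 0

# Logistic regression predicts the probability of the "1" outcome through
# the logistic (sigmoid) link, which keeps every prediction between 0 and 1 --
# the reason it suits a binary DV better than ordinary multiple regression.
def predicted_probability(intercept, coefs, xs):
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1 / (1 + math.exp(-z))

print(binary_dv({"verbal_commitment"}))                 # 1
print(binary_dv(set()))                                 # 0
print(predicted_probability(-2.0, [0.8, 0.5], [1, 1]))  # strictly between 0 and 1
```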

Going back to the original model, it occurs to me that my method was based on a general misconception of what I was up to. In creating these “levels of desirability,” I ignored the role of the Planned Giving Officer. My job, as I see it now, is to deliver up the segment of alumni that has the highest probability of receptivity to planned giving. It’s the PGO’s task to engage with the merely interested and elevate them to verbal, then written, agreements. In that sense, the survey-responder and the final written commitment could very well be equivalent in “value”.

The point is, it’s not in my power to make that evaluation. Therefore, this year, everyone with the earmarks of planned giving about them will get the same value: 1. I hope that results in a more statistically defensible method.

(I should add here that although I recognize my model could be improved, I remain convinced that even a flawed predictive model is superior to any assumption-based segmentation strategy. I’ve flogged that dead horse elsewhere.)

Comparing the effectiveness of the PG score with that of the Annual Giving score, it would seem that the AG score does a better job of picking the Planned Giving donors than the PG score does! Even the old “general” model from 2008 does a (slightly) better job.

That’s a bit odd. The first thing I would say is that 11 new expectancies is a very small sample, and it’s hard to generalize from that.

## 17 February 2010

### Is ‘overfitting’ really a problem?

Filed under: Model building, Pitfalls, Planned Giving, Predictor variables — Tags: , , — kevinmacdonell @ 8:06 am

(Used via Creative Commons license. Click image for source.)

Overfitting describes a condition where your data fits a model “too well”. Your model describes your sample nearly perfectly, but is too rigid to fit any other sample. It isn’t loose enough to serve your predictive needs.

Is this something you ought to worry about? My response is a qualified ‘no’.

First, if your sample is very large, in the many thousands of records, and you’re modeling for a behaviour which is not historically rare (giving to the Annual Fund, for example), then overfit just isn’t an issue. Overfit is something to watch for when you’ve got small sample sizes or your data is limited in some way: building a Planned Giving or Major Giving model based on only a handful of existing cases of the desired behaviour, for example.

Overfit has always sounded like a theoretical problem to me, something that bothers analysts working at some rarefied higher level of modeling refinement. My goal has always been to improve on existing segmenting practices; if the bar is set at “throwing darts at the board,” one is going to be happy with the results of a predictive model, even if it’s wearing a too-restrictive corset.

And yet … doubts crept in.

While creating a model for Planned Giving potential I discovered a characteristic prevalent among our existing expectancies which gave me pause. Many of our existing commitments are from clergy, a number of whom live in retirement on campus. This results from our institution’s history and its traditional association with the Roman Catholic Church. Not surprisingly, a name prefix identifying clergy turned out to be a highly predictive variable. Using the variable in the model would have boosted the fit – but at what cost?

Here’s the problem. Elderly clergy members may be the model for past and current expectancies, but I was not confident that the Planned Giving donors of the future would resemble them. A growing distance between church and university, driven by broader societal change, was one of the reasons leading me to think that using this variable would be a mistake – this model needed more leeway than that. It took a while for me to make the connection between this gut feeling and the rather abstract concept of ‘overfit’.

This, then, is my advice: Forget about the theory and use common sense – are any of your predictor variables likely to do a much better job describing the reality of the past than that of the future? Don’t overthink it: If your gut’s mostly okay with it, then don’t worry about it. Otherwise, consider sacrificing a little R-squared to get a better model.
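Here’s a deliberately exaggerated toy version of the clergy situation, with invented records: a flag that is perfectly predictive in the historical data but misleading about the future.

```python
# Toy illustration of overfit. Each record is (is_clergy, became_pg_donor).
# All data is invented and exaggerated to make the point.
train = [  # past and current expectancies: clergy dominate
    (1, 1), (1, 1), (1, 1), (0, 0), (0, 0), (0, 0),
]
future = [  # the donors of the future look different
    (0, 1), (0, 1), (1, 0), (0, 0), (0, 0), (0, 1),
]

def accuracy(rows):
    """Accuracy of the rule 'predict PG donor if and only if clergy'."""
    return sum(1 for clergy, donor in rows if clergy == donor) / len(rows)

print(accuracy(train))   # perfect on the past
print(accuracy(future))  # much worse going forward
```

The clergy flag buys a perfect fit on history and a worse-than-coin-flip fit on the future – exactly the trade the gut feeling above was warning about.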

## 5 February 2010

### Rare-event modeling: Terrorists and planned giving

Filed under: Model building, Planned Giving — Tags: , , — kevinmacdonell @ 3:11 pm

In January, the White House released a review of the incident in which a would-be bomber nearly destroyed a passenger jet in flight on Christmas Day. Why did anti-terrorism officials fail to identify and counter this threat? According to the report, part of the problem was in the databases, and in the data-mining software: “Information technology within the CT [counterterrorism] community did not sufficiently enable the correlation of data that would have enabled analysts to highlight the relevant threat information.”

I’ve just finished reading Stephen Baker’s book, The Numerati, published in 2008. In a chapter called simply “Terrorist”, he observes that it’s nearly impossible to build a predictive model of “rare or unprecedented events,” citing the few cataclysmic examples that we all know about. “This is because math-based predictions rely on patterns of past behaviour,” he writes.

Known and suspected terrorists are presumably the needle in a huge haystack that includes you, me, and everyone else in the world. Terrorists are practically invisible in such a sea of identities, they work hard at avoiding detection, and they trigger events that may never have happened before.

Not to trivialize the subject, but while reading this it struck me that some of the models we build in the more prosaic world of fundraising are in the related business of modeling for rare events. I’m thinking primarily of Major Gifts and Planned Giving. As tricky as this sort of prediction is, we can be thankful for three things: The events we are trying to predict are rare but not unprecedented, the data set has precise limits, and the stakes are not nearly as high.

Here is a basic tip for improving the power of a Planned Giving model. My first attempt at a PG model included the full data set of alumni, from the oldest alum right up to the Class of 2009. We had a limited number of people in the database identified as existing PG commitments, and they were swimming in that ocean of data. I took a number of steps to improve the model, but the most obvious was to exclude all the younger alumni. They would not normally be considered PG prospects, and eliminating them boosted the ratio of PG commitments to the general population.

Look at your existing commitments, identify who the youngest is (by class year, probably), and exclude all the alumni who are younger than that. (Use a selector variable in Data Desk right in your regression table, if that’s what you’re doing.) If your youngest is an outlier, then pick the second-youngest as your cutoff – but don’t eliminate the outlier individual, because you need all the historical data you can get!
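As a sketch, the cutoff step looks like this in Python (the records and field names are hypothetical):

```python
# Hypothetical alumni records; class years and flags are invented.
alumni = [
    {"id": 1, "class_year": 1958, "pg_commitment": True},
    {"id": 2, "class_year": 1971, "pg_commitment": False},
    {"id": 3, "class_year": 1985, "pg_commitment": True},   # youngest known commitment
    {"id": 4, "class_year": 2004, "pg_commitment": False},
    {"id": 5, "class_year": 2009, "pg_commitment": False},
]

# The youngest existing commitment (highest class year) sets the cutoff ...
cutoff = max(a["class_year"] for a in alumni if a["pg_commitment"])

# ... and everyone younger is excluded from the modeling pool.
pool = [a for a in alumni if a["class_year"] <= cutoff]

print(cutoff)                    # 1985
print([a["id"] for a in pool])   # [1, 2, 3]
```

If the youngest commitment is an outlier, you’d take the second-highest class year among commitments as the cutoff instead – but keep the outlier record itself in the pool.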

## 11 January 2010

### The 15 top predictors for Planned Giving – Part 3

Okay, time to deliver on my promise to divulge the top 15 predictor variables for propensity to enter a Planned Giving commitment.

Recall the caveat about predictors that I gave for Annual Giving: These variables are specific to the model I created for our institution. Your most powerful predictors will differ. Try to extract these variables from your database for testing, by all means, but don’t limit yourself to what you see here.

In Part 2, I talked about a couple of variables based on patterns of giving. The field of potential variables available in giving history is rich. Keep in mind, however, that these variables will be strongly correlated with each other. If you’re using a simple-score method (adding 1 to an individual’s score for each positively-correlated predictor variable), be careful about using too many of them and exaggerating the importance of past giving. On the other hand, if you use a multiple regression analysis, these related variables will interact with each other – this is fine, but be aware that some of your hard-won variables may be reduced to complete insignificance.
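To see the correlation problem concretely, here’s a toy Python example with two invented giving-history variables. A simple-score method that adds a point for each would largely be counting the same signal twice:

```python
# Two made-up giving-history variables for eight hypothetical alumni.
# Invented to be nearly collinear, as such variables often are in practice.
years_of_giving   = [0, 1, 2, 3, 5, 8, 10, 12]
gifts_in_lifetime = [0, 1, 3, 4, 6, 10, 13, 15]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(round(pearson(years_of_giving, gifts_in_lifetime), 3))  # close to 1
```

In a multiple regression, near-collinear predictors like these fight over the same variance, which is why one of them can end up looking insignificant.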

Just another reason to look beyond giving history!

For this year’s Planned Giving propensity model, the predicted value (‘Y’) was a 0/1 binary value: “1” for our existing commitments, “0” for everyone else. (Actually, it was more complicated than that, but I will explain why some other time.)

The population was composed of all living alumni Class of 1990 and older.

## The list

The most predictive variables (roughly in order of influence) are listed below. Variables that have a negative correlation are noted with an N. Note that very few of these variables can be considered continuous (e.g. Class Year) or ordinal (survey scale responses). Most are binary (0/1). But ALL are numeric, as required for regression.

2. Number of Homecomings attended
3. Response to alumni survey scale question, regarding event attendance
4. Number of President’s Receptions attended
5. Class Year (N)
6. Recency: Gave in the past 3 years
7. Holds another degree from another university (from survey)
8. Marital status ‘married’
9. Prefix is Religious (Rev., etc.) or Justice
10. Alumni Survey Engagement score
12. Number of children under 18 (from survey) (N)

Like my list of Annual Giving predictors, this isn’t a full list (and it isn’t 15 either!). These are the most significant predictors which don’t require a lot of explanation.

Note how few of these variables are based on giving – ‘Years of giving’ and ‘Frequency of giving’ don’t even rate. (‘Lifetime giving’ seems to take care of most of the correlation between giving and Planned Giving commitment.) And note how many variables don’t even come from our database: They come from our participation in a national survey for benchmarking of alumni engagement (conducted in March 2009).