CoolData blog

14 December 2010

Start with questions, not data

Filed under: Best practices, Pitfalls, skeptics — kevinmacdonell @ 11:29 am

Waiting for data to be made complete and pure before analysis will not lead to actionable insights in this lifetime. (Image used via Creative Commons license. Click image for source.)

What’s more important: Focusing on gathering a full suite of perfect, clean data and then exploring it to see what it tells you, or framing a difficult question and then going out to find the data that answers that question? I think the right choice is the latter, at least for fundraising shops new to analytics.

There is nothing wrong with pulling a bunch of donor data to audit for errors or play with in an unstructured way, without having a definite research question in mind. Exploration can help you get familiar with the data, which is never a bad thing. There is also nothing wrong with addressing imperfections and errors.

But if you’re looking to make a difference and advance the development of analytics in your organization, you should zero in on the biggest question or questions, and DO NOT wait until your data is perfect to do it.

That’s the substance of one of the recommendations in a research report published recently by MIT Sloan Management Review in collaboration with the IBM Institute for Business Value. (PDF: Analytics: The New Path to Value — How the Smartest Organizations Are Embedding Analytics to Transform Insights into Action.) The report is rather general, buzzword-laden, and focused on the private sector, but the observations are valid.

The first option — data-gathering and pure exploration for its own sake — is a valuable activity for shops where analytics is part of normal operations, i.e. the minority of shops. Such exploration can lead to additional insights for which you know there will be a receptive audience and, more importantly, can lead to framing the important questions of tomorrow.

For today, though, focus on your organization’s thorniest problems. Tackling the big unknowns might seem the risky way to go if you’re looking for a quick win — but imagine the response to making some headway on conundrums such as these:

  • Engagement of young alumni is a deep concern. How can we identify young alumni who graduated in the last five years who are most likely to become volunteers and ambassadors for their class?
  • Donor retention beyond a year or two isn’t what it ought to be. Which lapsed donors are most likely to be reactivated if we increase our efforts in their direction?
  • Fulfillment on pledges is abysmal. Which pledges are most at risk of defaulting and need early attention?
  • (Insert your institution’s burning issue here.)

The second part of this message is to go ahead and use the data even if it isn’t perfect (within reason). Some people in our business cannot stomach basing decisions on partial or imperfect data. The result of their cautious fastidiousness is not perfection, it’s stasis!

I’m reminded of one of the most common objections to using contact information variables such as ‘business phone present’ as predictors: What if some of those numbers are the result of research or data appends, not a result of alumni/donor engagement? Well, sure, if you have additional information about where the data came from, then use it to split the variable to allow for separate correlation testing. But if you don’t, why should you assume that the entire project has somehow been invalidated? What a shame if analytics came to a halt based on someone’s notions about data purity.
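To make the splitting idea concrete, here is a minimal sketch in Python. All the column names and values are hypothetical, invented purely for illustration; the point is simply that when the source of a phone number is known, one flag becomes two, and each can be tested against giving on its own.

```python
import pandas as pd

# Toy constituent file; every column name here is hypothetical.
df = pd.DataFrame({
    "business_phone_present": [1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    "phone_from_append":      [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],  # 1 = vendor append
    "donor":                  [1, 1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Split the single flag into two variables: phone numbers the constituent
# gave us (a sign of engagement) vs. numbers acquired via a data append.
df["phone_engagement"] = ((df["business_phone_present"] == 1) &
                          (df["phone_from_append"] == 0)).astype(int)
df["phone_append"] = ((df["business_phone_present"] == 1) &
                      (df["phone_from_append"] == 1)).astype(int)

# Test each against giving separately.
corr_engagement = df["phone_engagement"].corr(df["donor"])
corr_append = df["phone_append"].corr(df["donor"])
print(round(corr_engagement, 2), round(corr_append, 2))  # 0.65 0.0
```

In this toy data the self-supplied numbers correlate with giving and the appended ones don’t; with real data, either result still leaves the rest of the model standing.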

Success in analytics is not an all-at-once deal; it’s iterative. It goes like this: “Let’s get some kind of answer or focus this year, and through that we’ll discover what the valuable data is that we need to augment, improve, or clean up for next time. Then we’ll make another, better model next year.”

25 November 2010

Turning people into numbers?

Filed under: Front-line fundraisers, skeptics — kevinmacdonell @ 1:18 pm

(Image used via Creative Commons license. Click image for source.)

I tend to hear the same objections from presentation audiences, my own and others’. They’re not objections so much as questions, and very good questions, and always welcome. But no one yet has voiced a reservation that I know some must be thinking: This predictive modeling stuff, it’s all so … impersonal.

We already work in a profession that refers to human beings as “prospects” and “suspects”. Doesn’t sticking scores and labels on people perpetuate a certain clinical coolness underlying how fundraising is carried out today? Predictive modeling sounds like bar-coding, profiling, and commodifying people as if they were cattle destined for the table. Maybe we can be so busy studying our numbers and charts that we lose our connection with the donor, and with our mission.

Apologies in advance for setting up a straw man argument. But sometimes I imagine I see the thought forming behind someone’s furrowed brow, and wish it would be brought into the open so we can discuss it. So here we go.

(First of all, how many fundraising offices do you know that carry out their work with “clinical coolness”? We should be so lucky!)

More seriously: Data mining and predictive modeling will never interfere with the human-to-human relationship of asking for a gift, whether it’s a student Phonathon caller seeking an annual gift, or a Planned Giving Officer discussing someone’s ultimate wishes for the fruit of a lifetime of work. It’s a data-free zone.

What predictive modeling does is help bring fundraiser and would-be donor together, by increasing the odds (sometimes dramatically increasing the odds) that the meeting of minds will successfully converge on a gift.

Here’s how I frame it when I talk about predictive modeling to an audience that knows nothing about it. If all we know about a constituent is their giving history (or lack of it), we’re treating everyone the same. Is one non-donor just as likely as another to be convinced to make an annual gift? Is one $50-a-year donor just as likely as another to respond to an appeal to double their pledge this year, or be receptive to having a conversation with a Planned Giving Officer?

The answers are No, No, and NO!

What I say is, “Everyone is an individual.” If they played sports as a student, if they lived on campus, if they attended an event — we can know these things and act accordingly, based on what they tell us about their engagement with our institution. We just have to tune in and listen.

“Everyone is an individual.” Catchy, eh? Well, it’s trite, but it’s true — and it’s no less true for data miners than it is for anyone else.

18 November 2010

Survey says … beware, beware!

Filed under: Alumni, skeptics, Surveying — kevinmacdonell @ 4:45 pm

I love survey data. But sometimes we get confused about what it’s really telling us. I don’t claim to be an expert on surveying, but today I want to talk about one of the main ways I think we’re led astray. In brief: Surveys would seem to give us facts, or “the truth”. They don’t. Surveys reveal attitudes.

In higher education, surveying is of prime importance in benchmarking constituent engagement: it identifies programmatic areas that are underperforming, as well as areas that are doing well and where making changes therefore entails risk. Making intelligent, data-driven decisions in these areas can strengthen programming, enhance engagement, and ultimately increase giving. And there’s no doubt that the act of responding to a survey, the engagement score that might result, and the responses to individual questions or groups of questions, are all predictive of giving. I have seen this myself in my own predictive modeling at two universities.

But let’s not get carried away. Survey data can be a valuable source of predictor variables, but it’s a huge leap from making that admission to saying that survey data trumps everything.

I know of at least one vendor working in the survey world who does make that leap. This vendor believes surveying is THE singular best way to predict giving, and that survey responses have it all over the regular practice of predictive modeling using variables mined from a database. Such “archival” data provides “mere correlates” of engagement. Survey data provides the real goods.

I see the allure. Why would we put any stock in some weak correlation between the presence of an email address and giving, when we can just ask them how they feel about giving to XYZ University?


I have incorporated survey data in my own models, data that came from two wide-ranging, professionally-designed, Likert-type surveys of alumni engagement. Survey data is great because it’s fresh, independent of giving, and revealing of attitudes. It is also extremely biased in favour of highly-engaged alumni, and is completely disconnected from reality when it comes to gathering facts as opposed to attitudinal data.

Let me demonstrate the unreliability of survey data with regard to facts. Here are a few examples of statements and responses (one non-Likert), gathered from surveys of two institutions:

  • “I try to donate every year” — 946 individuals answered “agree” or “strongly agree” — but 12.3% of those 946 had no lifetime giving.
  • “I support XYZ University regularly” — 1,001 individuals answered “agree” or “strongly agree” — but 18.7% of them had no lifetime giving.
  • “Have you ever made a charitable gift to XYZ University (Y/N)?” — 1,690 individuals said “Yes” — but 8.1% of them had no lifetime giving.
  • “I support XYZ University to the best of my capacity” — 1,498 individuals answered “agree” or “strongly agree” — but 39.6% of them had no lifetime giving!

And, even stranger:

  • “I try to donate every year” — 1,371 answered “disagree” or “strongly disagree” — but 27.7% of those respondents were in fact donors!

Frankly, if I asked survey-takers how many children they have, I wouldn’t trust the answers.
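A check like the ones above is easy to run against any survey file that can be matched back to the database. Here is a sketch in pandas with a handful of made-up respondent rows (the column names and figures are purely illustrative, not the actual survey data quoted above):

```python
import pandas as pd

# Hypothetical matched file: stated behaviour vs. actual giving record.
survey = pd.DataFrame({
    "says_donates_every_year": ["agree", "agree", "agree",
                                "disagree", "disagree", "agree"],
    "lifetime_giving":         [250.0, 0.0, 120.0, 75.0, 0.0, 0.0],
})

# Flag actual donors from the database, then cross-tabulate against the claim.
survey["is_donor"] = survey["lifetime_giving"] > 0
xtab = pd.crosstab(survey["says_donates_every_year"], survey["is_donor"])
print(xtab)

# Share of "agree" respondents with no lifetime giving at all.
agree = survey[survey["says_donates_every_year"] == "agree"]
pct_agree_nongiver = 100 * (1 - agree["is_donor"].mean())
print(round(pct_agree_nongiver, 1))  # 50.0
```

In these invented rows, half the people who say they donate every year have never given a cent. The real percentages will differ, but the cross-tab is the same two lines of work.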

This disconnect from reality actually works in my favour when I am creating predictive models, because I have some assurance that the responses to these questions are not just a proxy for ‘giving’, but rather something far more complicated that has to do with attitude, not facts. But in no model I’ve created has survey data (even carefully-selected survey data strongly correlated with giving) EVER been more predictive than the types of data most commonly used in predictive models — notably age/class year, the presence/absence of certain contact information, marital status, employment information, and so on.

For the purposes of identifying weaknesses or strengths in constituent engagement, survey data is king. For predicting giving in its various forms, survey data and engagement scores are just more variables to test and work into the model — nothing more, nothing less — and certainly not something magical or superior to the data that institutions already have in their databases waiting to be mined. I respect the work that people are doing to investigate causation in connection with giving. But when they criticize the work of data miners as “merely” dealing in correlation, well, that I have a problem with.

6 July 2010

Pragmatism and validity: Don’t get your knickers in a knot

Filed under: skeptics — kevinmacdonell @ 12:05 pm

Worried about failing to reject the null hypothesis? Don't come crying to me. (Creative Commons license. Click image for source.)

Data mining is not a lab experiment — it is pragmatic. Do data miners play fast and loose with proper statistical technique? I don’t think so. We work in the real world, and we use what works. Use the tools of your trade properly, by all means, but let’s not get bogged down in nit-picking about the validity of our methods.

A number of things set the practice of data mining apart from the statistics you read about in textbooks.

Data mining is exploratory, not experimental. Elements of statistics used in experiment design (including the null hypothesis) are not a big concern to me.

Data mining is concerned with correlation and prediction, not with correlation and causation. Everyone working with stats is concerned with correlation, but I’m not interested in the direction of dependence — whether Egg X brings about Chicken Y, or vice-versa. I know that having a single initial in the database for first or middle name is sometimes predictive of giving (even when other variables such as age are taken into account). I don’t need to know why; knowing the correlation exists is enough for me.

My bottom line is: observe the correlations, give them proper influence in the prediction via regression, and don’t worry about causation.
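That bottom line can be demonstrated with simulated data. The sketch below is not the author’s model; it invents a ‘single initial on file’ flag and an effect size out of thin air, purely to show that such a flag can keep a positive weight in a regression even with age controlled for — no causal story required.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Invented data: age and a 'single initial on file' flag.
age = rng.integers(25, 75, n)
single_initial = rng.binomial(1, 0.3, n)

# Simulate giving probability driven by both age and the flag
# (the 0.2 effect is made up for the demonstration).
p = 0.1 + 0.005 * (age - 25) + 0.2 * single_initial
donor = rng.binomial(1, np.clip(p, 0, 1))

# Linear probability model: donor ~ intercept + age + single_initial.
X = np.column_stack([np.ones(n), age, single_initial])
coefs, *_ = np.linalg.lstsq(X, donor, rcond=None)
print(coefs[2] > 0)  # the flag keeps a positive weight with age in the model
```

Why the flag predicts giving is left unexplained, and for scoring purposes it doesn’t need explaining: the coefficient does its job either way.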

I also do not wring my hands over the sequence of events in time; for example, whether a prospect’s business phone number or whatever was acquired AFTER a gift was received. The worry would be understandable — how can that datum be “predictive” when it occurs after the fact? The concern is a direct result of a conventional understanding of the term “prediction,” which implies a certain order in time. The ‘predictor’ must precede the ‘predicted’, n’est-ce pas?

Not in my world, necessarily.

The conventional view has it that non-donors and donors are on opposite sides of a division in time. One day, a non-donor approaches the divide, and passes through it, magically transforming into a donor. The caterpillar changes into a butterfly. One might think therefore that only caterpillar-attributes are appropriate predictors for us to use. Butterfly-attributes, like our business phone that came to us after the gift (or because of it) would be inadmissible.

That’s not the way I see it. There is a divide, but it is not in time. The divide is between non-donors and donors, but no one crosses it. Why? Because there are already a lot of non-donors on the donor side: They are donors who haven’t given yet! To the unaided eye, they look like caterpillars, but their nature is pure butterfly.

Their butterfly-nature was created while they were a student, and nurtured during their time as an alum. It’s everything they think and feel about alma mater, their level of engagement, everything measurable and unmeasurable. Being a donor is only an outward expression of it. And as important as that expression is to us, it is not essential to the butterfly; we have to ask for it.

Donors share all sorts of characteristics, some of which we know about: reunion attendance, a tendency to provide contact information, and a dozen other things, some quite non-intuitive. When we find these same tendencies to a high degree in certain non-donors, we recognize them for what they are: Butterflies, and donors-to-be.

You won’t find me looking for a date-stamp on when we put that email address or that address update in the database, to see whether it preceded or resulted from giving. It just doesn’t matter to me.

Sure, there are exceptions. Major Giving, Planned Giving — those big events, probably unique in a donor’s lifetime, require some attention paid to the sequence of events in time, when we attempt to predict from whom the next gift is going to come.

Giving to the annual fund, however, is not so much an event as it is a state of being. I’m not saying the state doesn’t change, but it does persist.

Not everyone will see it that way. Some smart people will look askance at my equal reliance on data that is 10 years old or one day old (it’s all the same to me!), and at the fact that the null hypothesis makes me yawn with disinterest. My variable distributions are non-normal. I violate assumptions left and right.

But … it works! And for me that’s where the discussion about validity ends.

28 March 2010

Get data-focused, or else?

Filed under: skeptics, Why predictive modeling? — kevinmacdonell @ 9:34 pm

I can see a day when data mining will no longer be optional. It will be something all nonprofits have to do – standard practice, part of our responsibility to donors and to the causes they support.

Promoting the smart use of data in fundraising faces some barriers in skills and priorities and culture, but sooner or later all nonprofits will have to work harder at leveraging the power in their databases. They might do it in-house or find the expertise elsewhere. But it will be the norm.

Later this morning I’ll be speaking to a room of fundraising professionals about data mining. A few in the room will have had some experience with data mining. Some won’t. And some more will have a database that is in such rough shape that they’re not ready for it.

I’ll be keeping the tone light, and I’ll focus on what’s happening (or not happening) in my own workplace. I won’t presume to tell any of the hard-working people in the room what they ought to be doing. I won’t say that organizations that fail to make quality data collection and analysis a priority are guilty of negligence.

But I might think it.

If you don’t have a process in place to determine that a gift received this year came from someone who was also a donor last year (that is, you allow duplicate donor records to proliferate), you’re disconnected from who your real supporters are, and you’re wasting money. If you conduct surveys but do it anonymously, you’re throwing away the possibility of insight, and wasting money. If you host events but fail to track attendance in your database, you’re choosing to remain in the dark about where tomorrow’s gifts will come from, and you’re wasting money. If you segment prospect pools based solely on past giving, you exhaust existing best donors without breaking any new ground, and your unfocused approach wastes money.

Whose money? Donors’ money. Wasting donor dollars is no longer acceptable. I think donors will only get better at figuring out which charities are allowing fundraising costs to get out of control, which ones are diverting too much cash from their stated goals.

Nope, I won’t say it, but I might think it: Nonprofits that do not learn to use data will have data used against them.

22 January 2010

Four mistakes I have made

Filed under: Pitfalls, skeptics — kevinmacdonell @ 1:28 pm

Is your predictive model an Edsel? Build it right, then sell it right! (Photo by dok1 on Flickr, used via Creative Commons licence.)

There are technical errors, and then there are conceptual errors. I can’t identify all the technical issues you may encounter while data mining. But it’s useful to identify a few conceptual errors. These are mistakes that may prove damaging to your efforts to win acceptance for your models and have them applied constructively in your organization. In this blog I always write about my own experience, so the examples of stupidity you’ll read about in today’s post are all mine.

Mistake No. 1: Using score sets to predict things they weren’t designed for.

When I began creating predictive scores, I frequently referred to them as “affinity” scores. That’s how I described them to colleagues, both to make the idea accessible and because I really believed that a high score indicated a high level of affinity with our institution. Then one day I tried to use the scores to predict which class years would be most likely to attend their Homecoming milestone reunion, and thereby predict whether attendance for the upcoming reunion year would go up or down. Based on the central tendency of the scores of each class, I predicted a drop in attendance. I circulated a paper explaining my prediction and felt rather brilliant. Fortunately, I was proven wrong. That year we set a new attendance record.

My dependent variable in these early models was Lifetime Giving; therefore, the model predicted propensity to give – nothing more, nothing less. If you want to predict event attendance, build an event-attendance model. If you want to gauge alumni affinity, build a survey, or participate in an alumni engagement benchmarking study. (In Canada, check out Engagement Analysis Inc.) Be cautious, too, about making bold predictions; why give skeptics more ammunition? If you want to feel brilliant, keep it to yourself!

Lesson: Don’t be too clever.

Mistake No. 2: Using a general model to predict a specific behaviour.

This is closely related to the first mistake. By ‘general model’ I mean one in which the dependent variable is simply Lifetime Giving. I call these models ‘general’ because they make no distinction among the various types of giving (annual, major, planned) nor among preferred channel (by mail, by phone, and for some, online). Building a general model is itself not a mistake: It will work quite well for segmenting your alumni for the Annual Fund, for example, and if this is your first model it might be best not to get too exotic with how you define your dependent variable (thereby introducing new forms of error).

Just be prepared to make refinements. Two years running, our calling program used a score set from a general model, which actually worked fairly well, except for one thing: A lot of top-scoring alumni were hanging up on our student callers. This phenomenon was very noticeable, and it was enough for some to say that the model was worthless. An analysis of hang-ups confirmed that the problem existed (yes, we track hang-ups in the database). But the analysis also showed that a lot of these hanger-uppers were good donors. The top scorers were very likely to give, but a lot of them didn’t care to receive a call from a student. (And for some reason had not already requested to be solicited by mail only.)

The fix was a new predictive model aimed specifically at the calling program, with a dependent variable composed solely of dollars received via telephone appeals. Fewer hang-ups, happier callers, happier donors.
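The fix described above amounts to redefining the dependent variable. A minimal sketch of that idea, with a hypothetical gift table (the ids, amounts, and channel labels are invented for illustration):

```python
import pandas as pd

# Hypothetical gift history: one row per gift, with the channel recorded.
gifts = pd.DataFrame({
    "id":      [1, 1, 2, 3, 3, 3],
    "amount":  [50.0, 25.0, 100.0, 20.0, 20.0, 500.0],
    "channel": ["phone", "mail", "mail", "phone", "phone", "event"],
})

# General model: dependent variable is total lifetime giving, all channels.
y_general = gifts.groupby("id")["amount"].sum()

# Phone-specific model: only dollars received via telephone appeals,
# with zeros for constituents who never gave by phone.
y_phone = (gifts[gifts["channel"] == "phone"]
           .groupby("id")["amount"].sum()
           .reindex(y_general.index, fill_value=0.0))

print(y_general.to_dict())  # {1: 75.0, 2: 100.0, 3: 540.0}
print(y_phone.to_dict())    # {1: 50.0, 2: 0.0, 3: 40.0}
```

Same predictors, same alumni file — only the thing being predicted changes, which is exactly what “know what you’re predicting” means in practice.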

Lesson: Know what you’re predicting.

Mistake No. 3: Assuming that people will ‘get it’.

If you were able to show your fundraising colleagues that high-scoring segments of the alumni population give a lot more than the others, and that low-scoring segments give little or nothing, you’d think your work was done. Alas, no. Don’t assume that you’ll simply be able to hand off the data, because if data mining is not yet part of your institution’s culture, it’s more than likely your findings will be under-used. You’ve got to sell it.

Ensure that your end-users know what to do with their scores. Be prepared to make suggestions for applications. (Is the goal cost-cutting through reducing the solicitation effort, or is it growth in number of donors, or is it pushing existing donors to higher levels of giving?) In fact, before you even begin you should have some sense of what would really be in demand at your institution, and then try to satisfy that demand. The Annual Fund is a good place to start, but you might find that there’s a more pressing need for prospect identification in Planned Giving.

At the other end, you’ll need to understand how your colleagues implemented their scores in order to do any follow-up analysis of the effectiveness of your model. For example, if you plan to analyze the results of the Annual Fund telephone campaign, you’ll need to know exactly who was called and who wasn’t, before you can compare scores against giving.

Lesson: Communicate.

Mistake No. 4: Showing people the mental sausage.

A few years ago I used to follow a great website created by Merlin Mann. His boss and friend said to him one day, “Y’know, Merlin, we’re really satisfied with the actual work you do, but is there any way you could do it without showing so much … I don’t know … mental sausage?”

Data mining and predictive modeling and cool data stuff are all exercises in discovery. When we discover something new, our natural urge is to share. In the past, I tended to share the wrong way: I would carefully reveal my discovery as if the process were unfolding in real time. These expositions (usually in the form of a Word document emailed around) would usually be rather long. The central message would often be buried in detail which someone not inhabiting my head would regard as extraneous.

Don’t expect people to follow your plot: They’re too busy. They need the back of a cereal box, and you’re sending them Proust. You need to make your point, back it up with the minimum amount of verbiage acceptable, incorporate visuals judiciously, and get the hell out.

Learn to use the charting options available in Excel or some other software to get your point across as effectively as possible. Offer to explain it face-to-face. Offer to present on it.

Lesson: Learn how to sell.
