CoolData blog

6 May 2011

Wanted: More ways to learn predictive modeling

Filed under: Peter Wylie, Training / Professional Development, by kevinmacdonell @ 5:11 am

I remember the first time I opened up the statistics software package I now use to build predictive models. I had read Peter Wylie’s book, Data Mining for Fundraisers, so I had the basic idea in my head, plus a dose of Peter’s larger-than-life enthusiasm. The next step was to download a trial version of Data Desk to see if I could apply what I’d read to some of my own data. But I was a long way off from knowing how to build my first model.

Here’s what I saw:

It was a tabula rasa. Much like my brain. Exciting things may come from these blank-slate moments, but not this time — I had no idea what to do first. I clicked on some of the menus, like the one below, which didn’t help. Even after loading my data, a simple paste operation from Excel, I was missing the “now do this” element.

So I did what many others have done with a stats package they’ve looked at for the first time — I closed and uninstalled it. (I’ve done the same with SPSS, Minitab and other programs.) I could have tinkered with it and made some progress on my own, but I had pressing work to do. Data mining was a personal interest, not a priority. It wasn’t the latest crisis du jour and therefore it wasn’t “work”.

I don’t blame the software. Help files and manuals can be quite good. But most good software can do a lot more than the one task a user seeks to carry out, so the manual will be more general and comprehensive than required. Translating Peter’s straightforward method to precise steps in Data Desk required me to isolate those functions in the software, and I had no luck with that. As well, the manual was full of stats terms I was not familiar with.

Fortunately the story didn’t end there. Peter himself, aware of my interest, worked with me to show how I could get smart about using our data. Thus armed, I was able to convince my manager that we needed to invest in one-on-one training.

What did training accomplish that working on my own could not?

  • One, the training was couched in the language of fundraising, not statistics. Terms from statistics were introduced as needed, and selectively. A comprehensive understanding of stats was not the goal.
  • Two, it was specific to the software that I was actually using. This allowed every step to be as concrete as, “Next, click on the Manip menu and select …”. I was shown how to use the small set of software features that I really needed, and we ignored the rest.
  • Three, it was specific to my own data. I learned through the process of building a model for our own institution, with data pulled from our own database. It was the first time I had seen our alumni and donation data presented this way. If we had never proceeded to full-on data mining, I still would have learned a lot about our constituency.

Analytics is a popular topic of discussion at fundraising conferences, where everyone says the right things about predictive modeling and data-driven decision making. And yet, how many development offices are doing the work? Not as many as could be.

The bad news is, there is a skills shortage. The good news is, filling the shortage does not mean hiring analysts with advanced degrees in statistics (although, three cheers for you if you do). You or others in your office can do the work — but only if the barriers are removed.

What are the barriers? They are the flipside of the three strengths of one-on-one training:

  • One, many of the relevant books and online resources are couched in the language of statistics. Which elements of statistics are necessary to understand and which are optional is not made explicit. As well, there are numerous approaches to modeling, and that variety confuses anyone trying to focus on the approach that works best for their application.
  • Two, the mechanics of modeling differ from software package to software package. A development office staff person looking for the exact set of steps to accomplish one specific task is not likely to find what they’re looking for.
  • Three, the would-be analyst needs to work with data from their own database and learn how to look at it in a whole new way. It helps if the teaching resource you’re using talks about data from an alumni or fundraising perspective, but even within that world, everyone’s data is different.

Any one of the three barriers may be surmountable on its own; it’s the fact that all three occur together that stops people in their tracks. That’s what happened to me in my tabula rasa moment. It’s like someone who’s never been in a kitchen before needing to cook a specific meal for which there is no recipe — because in the analytics kitchen, a recipe is not only specific to the desired dish (the outcome), but to the oven (software) and to the ingredients on hand (data). Any specific recipe would have to be adapted, which is too much to ask of the beginner cook. Conversely, any overall method that attempts to explain more than one dish, more than one brand of oven, and an endless variety of ingredients is too general to be called a recipe.

For these reasons, when people ask me how to get started in predictive modeling, I always steer them toward one-on-one training. Nothing else really works. Conference sessions can inspire, or lead to a new idea or two, but it stops there. Books are great, but there isn’t a single book that contains a step-by-step guide that covers more than a fraction of fundraising modeling situations. The Internet can be a wonderful resource, but much of what you’ll find is highly technical, doesn’t apply directly to our purposes, and is completely lacking a road map for the uninitiated.

Sadly, this blog has to be counted among the resources that don’t make the grade. I think CoolData does some things well: Addressing a gap, I have always used examples drawn from alumni, nonprofit and donor data; I’ve tried to string my ideas together in some kind of order (Guide to CoolData), and I’ve tried to stay focused on one outcome (behaviour prediction for segmentation, essentially) and one modeling technique (regression), instead of straying too often into other areas.

But I have not provided anything like a step-by-step guide that works for a majority of people who are interested in data mining but don’t know how to go about it. Not that I think it’s impossible. One-on-one training is superior to “book learning,” but I believe there ought to be options for other learning styles. A chef must learn the art in the presence of a master, but the rest of us have recipe books. While no one can deny the superiority of the former, the majority of us get by in the kitchen using the latter — and some dine very well thereby.

It would be an interesting challenge to come up with a way of teaching predictive modeling to beginners that balances the specificity of the recipe book with the endless variety of our real-world data kitchens. Such a product (whatever form it takes) might not be a substitute for training, but it could either augment training or at least get someone started. Unlike this blog, it would probably not be free.

Well, it’s something to think about.



  1. Thank you for your insightful blog on predictive modeling. I hope to jump in and learn how to do it this year!

    Comment by Marilyn L. Jones — 6 May 2011 @ 8:50 am

  2. Thank you so much for perspective on analytics training. Who does one-on-one training on analytics for non-profits? Could you share a few resources?

    Comment by Cristina MacMahon — 9 May 2011 @ 10:47 am

  3. Hi Cristina: Well, Peter Wylie and John Sammis are the trainers I’m most familiar with, and I highly recommend them. You can email me if you’d like to know more. I’m sure there are other excellent vendors out there, however; I just don’t know their work first-hand. Since you’ve asked the question, I welcome any and all providers of training to comment here.

    Comment by kevinmacdonell — 9 May 2011 @ 7:42 pm

  4. I agree wholeheartedly, which is why I conduct a four-part session on getting started, and SPSS agrees, too, which is why they offer the software. I’m developing some new products, too, to bridge that gap between the very good Excel users and the SPSS or SAS starters, or to help the newbies with all of it. I’ve worked quite a while on dealing with what you talk about here. We even had someone asking over and over how to do fundraising analytics using Modeler (is it IBM or PASW Modeler now?), and I agree that such a book needs to be written. Working on all this one at a time. –Marianne

    Comment by Marianne Pelletier — 9 May 2011 @ 10:34 pm

  5. Kevin and Marianne,

    Thank you so much for the recommendations.

    Comment by Cristina macmahon — 10 May 2011 @ 9:03 am

  6. Tell me if I am off base, but it seems to me that this blog could add (if you were so inclined to add such a discussion segment) some discussions at least on the very basic starting points of analysis. How does one make a relevant segmentation, as you have often done in your data discussions? For example, we have say 30,000 records in our database: alums, corporations, foundations, friends. I could sort all of these according to gifts made in the last 10 years and divide them into 10 groups. In my limited knowledge, that seems similar to some of what you have done in your posts (though you usually did a fair bit of other analysis first, such as who has an email or phone number in the record). Is my segmentation even useful for further analysis? Many records have no gifts – I could exclude them, but what does that do to the sample? Still useful, or more useful? Why? What are the important issues for a basic researcher in this case? Clearly, there’s much to know – I don’t even know (as may be obvious here) whether I pose a good hypothetical for a basic researcher. Here’s a question – for a person who knows only perhaps how to spell “statistics” and can add, subtract, divide, and multiply, what are the first five things to know/learn/understand about doing this kind of work on their donor database? Thanks for your insights! Much appreciated.

    Fred Mischler

    Comment by Fred Mischler — 19 May 2011 @ 11:50 am

    • Fred: There is a need for what you suggest, i.e. an explanation of the very basic starting points of analysis. This blog hasn’t been it, and I don’t think it ever will be. CoolData goes wherever I go. When I start working with some concept, I write about it at the same time. That means a different topic almost every week, which is what keeps it interesting for me. On the other hand, as I hinted towards the end of this post, I think developing a text or package that takes a beginner from zero to developing a predictive model would be an interesting challenge, so you might want to stay tuned for whatever comes of that.

      The challenge, as I say in my post, is that it is hard to generalize about analytics. Everyone’s business problems are different, everyone’s data tools are different, and everyone’s data is unique. But in the example you mention, I would probably never combine alumni, corporations, foundations and non-alumni prospects in a single analysis. So there’s a first principle of analysis: Ensure that your sample really consists of a single population. What do alumni and corporations have in common? Very little, aside from giving. Every sample I create for analysis is made up of a group of entities that are to be solicited AS a group. So, for example, the group I score for Phonathon solicitation consists of every living alum (not friends or corporations), while the group I score for Planned Giving potential is all living alumni over the age of 45 or 50.

      What I sense is missing from the analysis example you’ve provided is a formulation of the question you’re trying to answer, or the problem you’re trying to solve. Your analysis might be perfectly valid for a certain question or problem. The question or problem comes FIRST. So there’s another basic principle: Get clear on what you’re trying to do first, and the specific method you’ll use will follow from that. Your example sounds a bit like the beginnings of an RFM analysis (Recency, Frequency, and Monetary value), which is fine as far as it goes, depending on what you’re trying to do.

      If you can add, subtract, multiply and divide, you’re all set as far as I’m concerned. Here are some starting points for getting to work on your donor database:

      Seven building blocks for your data work

      Comment by kevinmacdonell — 22 May 2011 @ 9:05 pm

      • I guess that is what many blogs are: “here are my thoughts on this idea that popped into my head not too long ago . . .” Kind of an organized stream of consciousness.

        I think I can take a bit of solace in the fact that I knew my hypo included disparate groups that probably shouldn’t be together, but who knows that from the start, right?

        And from your last paragraph, perhaps another first principle: think about and formulate the *real* question you are trying to answer. Of course, discriminating a *real* question from all the wild thoughts bouncing around the skull of a stats newbie may take a bit of learning first, so back to the beginning . . .

        Eagerly awaiting your package of info to aid in going from zero to model.

        Thanks again!

        Comment by Fred Mischler — 23 May 2011 @ 11:25 am

  7. Fred: Where I work, Annual Giving is pressured to evaluate their approach very systematically, yet holistically. That is, we segment New Donors, Reactivated Donors, 2-4 Yr Returning Donors, 5+ Returning Donors, 1 Yr Lapsed Donors, 2-4 Yr Lapsed Donors, 5-9 Yr Lapsed Donors, Parent (New, Reactivated, 2-4 Yr Returning Donors, 5+ Returning Donors), Parent 1-3 Yr Lapsed Donors, Current Parents (Freshman, Sophomore, Junior, Senior, Post-Grad, Multiple Students), Non Donor Alum in last 10 yrs, 11-25, 26-36, etc… And then we have our specific ‘constituent’ segments for special programs around campus…

    Holistic in that we strive for successful TeleFund Programs, Direct Mail Programs, Email, etc., but the last thing we want is to pit the program managers against each other. We see a prospect who was called, declined, and then gave through the mail not as a loss for the TeleFund side but as a success, since who knows whether they responded to the mailing because of the great call they received earlier in the year, followed by an informative email?

    What analytics means for us is help in answering questions such as: Who, when solicited, is likely to give? How much? What are the expected revenue, average gift, participation rate, upgrades? Which of their characteristics factor more strongly than others into their propensity to give? Where do they give? All of these answers can tell us a lot about whom to solicit and which messages are working.

    The ultimate goal is to constantly be testing different approaches and seeing what effect they have via factorial testing. People change, what works changes, and how can you know how to adapt without analysis?

    Reverse engineering this, you have to be able to accurately predict what to expect if you changed nothing from year to year. To accurately predict, you have to actually KNOW what has happened from year to year, and I am always stunned by how little descriptive statistics are used in other schools’ programs due to sub-par technical and analytical support. Underneath this, there must be good data practices, yet another source of common problems. Once these are all in place, the segmentation itself needs some consistency over the years, too, in order to accurately compare modeling across multiple years, and I’ve heard horror stories of Annual programs redrafting their segmentation strategy each time they solicit…

    Comment by Brock — 22 September 2011 @ 10:56 pm
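Kevin’s reply to Fred above names two first principles (analyze a single population; settle the question before the method) and notes that sorting records by giving and dividing them into groups resembles the start of an RFM (Recency, Frequency, Monetary value) analysis. The Python sketch below illustrates both ideas under stated assumptions: the `Constituent` record layout and its `kind`, `living` and `gifts` fields are hypothetical, the population is restricted to living alumni (the Phonathon example), and each of R, F and M is scored into equal-count bins by rank. Rank-based binning into quintiles or deciles is one common RFM convention, not necessarily the method used on CoolData, and ties here are simply broken by record order.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Constituent:
    # Hypothetical record layout for illustration; real databases will differ.
    id: int
    kind: str                      # e.g. "alum", "corporation", "foundation", "friend"
    living: bool
    gifts: list[tuple[date, float]] = field(default_factory=list)  # (gift date, amount)


def quantile_bins(values: list[float], bins: int, higher_is_better: bool = True) -> list[int]:
    """Score each value 1..bins by rank (bins=5 gives quintiles); ties broken by input order."""
    n = len(values)
    # Sort so the worst values come first and therefore receive the lowest scores.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * bins // n + 1
    return scores


def rfm_scores(people: list[Constituent], as_of: date, bins: int = 5) -> dict[int, tuple[int, int, int]]:
    # First principle: one population per analysis (assumed here: living alumni only).
    pool = [p for p in people if p.kind == "alum" and p.living]
    if not pool:
        return {}
    # Recency: days since the most recent gift; non-donors get infinity (the worst value).
    recency = [
        float((as_of - max(d for d, _ in p.gifts)).days) if p.gifts else float("inf")
        for p in pool
    ]
    frequency = [float(len(p.gifts)) for p in pool]          # number of gifts
    monetary = [sum(amt for _, amt in p.gifts) for p in pool]  # total given
    r = quantile_bins(recency, bins, higher_is_better=False)  # fewer days = better
    f = quantile_bins(frequency, bins)
    m = quantile_bins(monetary, bins)
    return {p.id: (r[i], f[i], m[i]) for i, p in enumerate(pool)}


if __name__ == "__main__":
    people = [
        Constituent(1, "alum", True, [(date(2010, 11, 1), 100.0), (date(2009, 11, 1), 50.0)]),
        Constituent(2, "alum", True, [(date(2005, 3, 1), 25.0)]),
        Constituent(3, "alum", True, []),
        Constituent(4, "corporation", True, [(date(2011, 1, 5), 5000.0)]),  # excluded
    ]
    print(rfm_scores(people, as_of=date(2011, 5, 6), bins=3))
    # prints {1: (3, 3, 3), 2: (2, 2, 2), 3: (1, 1, 1)}
```

In this sketch non-donors stay in the pool and land in the lowest bins; whether to keep or exclude them, as Fred asks, depends on the question the scores are meant to answer.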
