I remember the first time I opened up the statistics software package I now use to build predictive models. I had read Peter Wylie’s book, Data Mining for Fund Raisers, so I had the basic idea in my head, plus a dose of Peter’s larger-than-life enthusiasm. The next step was to download a trial version of Data Desk to see if I could apply what I’d read to some of my own data. But I was a long way off from knowing how to build my first model.
Here’s what I saw:
It was a tabula rasa. Much like my brain. Exciting things may come from these blank-slate moments, but not this time — I had no idea what to do first. I clicked on some of the menus, like the one below, which didn’t help. Even after loading my data, a simple paste operation from Excel, I was missing the “now do this” element.
So I did what many others have done with a stats package they’ve looked at for the first time — I closed and uninstalled it. (I’ve done the same with SPSS, Minitab and other programs.) I could have tinkered with it and made some progress on my own, but I had pressing work to do. Data mining was a personal interest, not a priority. It wasn’t the latest crisis du jour and therefore it wasn’t “work”.
I don’t blame the software. Help files and manuals can be quite good. But most good software is capable of doing a lot more than just the one task one seeks to carry out; the manual will be more general and comprehensive than required. Translating Peter’s straightforward method into precise steps in Data Desk required me to isolate those functions in the software, and I had no luck with that. As well, the manual was full of stats terms I was not familiar with.
Fortunately the story didn’t end there. Peter himself, aware of my interest, worked with me to show how I could get smart about using our data. Thus armed, I was able to convince my manager that we needed to invest in one-on-one training.
What did training accomplish that working on my own could not?
- One, the training was couched in the language of fundraising, not statistics. Terms from statistics were introduced as needed, and selectively. A comprehensive understanding of stats was not the goal.
- Two, it was specific to the software that I was actually using. This allowed every step to be as concrete as, “Next, click on the Manip menu and select …”. I was shown how to use the small set of software features that I really needed, and we ignored the rest.
- Three, it was specific to my own data. I learned through the process of building a model for our own institution, with data pulled from our own database. It was the first time I had seen our alumni and donation data presented this way. If we had never proceeded to full-on data mining, I still would have learned a lot about our constituency.
Analytics is a popular topic of discussion at fundraising conferences, where everyone says the right things about predictive modeling and data-driven decision making. And yet, how many development offices are doing the work? Not as many as could be.
The bad news is, there is a skills shortage. The good news is, filling the shortage does not mean hiring analysts with advanced degrees in statistics (although, three cheers for you if you do). You or others in your office can do the work — but only if the barriers are removed.
What are the barriers? They are the flipside of the three strengths of one-on-one training:
- One, many of the relevant books and online resources are couched in the language of statistics. Which elements of statistics are necessary to understand and which are optional is not made explicit. As well, there are numerous approaches to modeling, which confuses anyone trying to focus on the approach that works best for their application.
- Two, the mechanics of modeling differ from software package to software package. A development office staff person looking for the exact set of steps to accomplish one specific task is not likely to find what they’re looking for.
- Three, the would-be analyst needs to work with data from their own database and learn how to look at it in a whole new way. It helps if the teaching resource you’re using talks about data from an alumni or fundraising perspective, but even within that world, everyone’s data is different.
Any one of the three barriers may be surmountable on its own; it’s the fact that all three occur together that stops people in their tracks. That’s what happened to me in my tabula rasa moment. It’s like someone who’s never been in a kitchen before needing to cook a specific meal for which there is no recipe — because in the analytics kitchen, a recipe is not only specific to the desired dish (the outcome), but to the oven (software) and to the ingredients on hand (data). Any specific recipe would have to be adapted, which is too much to ask of the beginner cook. Conversely, any overall method that attempts to explain more than one dish, more than one brand of oven, and an endless variety of ingredients is too general to be called a recipe.
For these reasons, when people ask me how to get started in predictive modeling, I always steer them toward one-on-one training. Nothing else really works. Conference sessions can inspire, or lead to a new idea or two, but that’s where it ends. Books are great, but there isn’t a single book that contains a step-by-step guide that covers more than a fraction of fundraising modeling situations. The Internet can be a wonderful resource, but much of what you’ll find is highly technical, doesn’t apply directly to our purposes, and is completely lacking a road map for the uninitiated.
Sadly, this blog has to be counted among the resources that don’t make the grade. I think CoolData does some things well: addressing a gap, I have always used examples drawn from alumni, nonprofit and donor data; I’ve tried to string my ideas together in some kind of order (Guide to CoolData); and I’ve tried to stay focused on one outcome (behaviour prediction for segmentation, essentially) and one modeling technique (regression), instead of straying too often into other areas.
But I have not provided anything like a step-by-step guide that works for a majority of people who are interested in data mining but don’t know how to go about it. Not that I think it’s impossible. One-on-one training is superior to “book learning,” but I believe there ought to be options for other learning styles. A chef must learn the art in the presence of a master, but the rest of us have recipe books. While no one can deny the superiority of the former, the majority of us get by in the kitchen using the latter — and some dine very well thereby.
It would be an interesting challenge to come up with a way to convey how to do predictive modeling to a beginner in a way that balances the specificity of the recipe book with the endless variety of our real-world data kitchens. Such a product (whatever form it takes) might not be a substitute for training, but it could either augment training or at least get one started. Unlike this blog, it would probably not be free.
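Just to give a flavour of what the tiniest such recipe might look like, here is a bare-bones sketch, in Python rather than Data Desk, and with invented field names and dollar figures. The idea is a simple additive score: give each record a point for every positive predictor, then check median giving at each score level.

```python
from collections import defaultdict
from statistics import median

# Invented sample records:
# (has_home_phone, has_email, attended_event, lifetime_giving)
records = [
    (1, 1, 1, 500.0),
    (1, 0, 1, 250.0),
    (1, 1, 0, 120.0),
    (0, 1, 0, 50.0),
    (0, 0, 1, 25.0),
    (0, 0, 0, 0.0),
]

def score(rec):
    # One point for each positive predictor (0 to 3).
    return rec[0] + rec[1] + rec[2]

# Group lifetime giving by score level.
by_score = defaultdict(list)
for rec in records:
    by_score[score(rec)].append(rec[3])

# If the score is any good, median giving should climb with it.
for s in sorted(by_score):
    print(f"score {s}: median giving {median(by_score[s])}")
```

Real data would of course come from your own database, and the predictors worth a point would be discovered rather than assumed, but the shape of the recipe really is that small.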
Well, it’s something to think about.