I use these terms interchangeably, but not because they mean exactly the same thing. When I refer to “data mining,” I’m usually just trying to use a term that sounds familiar to an audience. It’s a buzzword that’s been around a long time. But what I probably mean to say is predictive modeling.
What’s the difference? There are plenty of definitions available for both terms, but in my regular usage I think of data mining as any activity that involves exploring large data sets to find patterns or to answer specific questions (which may or may not have anything to do with predicting behaviour). For example, the work that annual giving managers do when they use certain criteria to allocate alumni to mail or phone channels, or create a myriad of calling groups for phonathon, qualifies as data mining, as far as I’m concerned. This work might be done right in the database, in a spreadsheet, or with statistical software.
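To make the channel-allocation example concrete, here is a minimal sketch in Python. The records, the field names, and the rule itself (call anyone with a phone number on file who gave last year, mail everyone else) are all invented for illustration; a real annual giving shop would have its own criteria.

```python
from collections import defaultdict

# Hypothetical alumni records; field names and values are made up for this sketch.
alumni = [
    {"id": "A001", "has_phone": True, "gave_last_year": True},
    {"id": "A002", "has_phone": True, "gave_last_year": False},
    {"id": "A003", "has_phone": False, "gave_last_year": True},
    {"id": "A004", "has_phone": True, "gave_last_year": True},
    {"id": "A005", "has_phone": False, "gave_last_year": False},
]

def assign_channel(record):
    """Assumed rule: phone if we have a number and they gave last year, otherwise mail."""
    if record["has_phone"] and record["gave_last_year"]:
        return "phone"
    return "mail"

# Group constituent IDs by the channel the rule assigns them to.
segments = defaultdict(list)
for record in alumni:
    segments[assign_channel(record)].append(record["id"])

for channel, ids in sorted(segments.items()):
    print(channel, ids)
```

Nothing here predicts anything. It simply sorts people into groups based on criteria someone chose, which is why I would call it data mining rather than predictive modeling.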
I like to be able to tell people who are new to predictive modeling that they probably already “do” data mining, if they plow through data as part of their regular work. They’re just a conceptual step or two away from understanding predictive modeling.
Data mining might also be the right term to describe the exploration of variables for correlation with giving, which naturally shades into the actual creation of predictive models for giving. Predictive modeling itself, though, is the creation of formulas that produce scores for each constituent in a database for the purpose of predicting that constituent’s probability of engaging in a certain behaviour (e.g., giving to the Annual Fund).
That’s a clunky definition, and it sounds really complicated. But keep in mind that the tools we use to accomplish this (a computer, statistical software, and statistical methods such as regression) do all the work, and we never need to see the actual formula or the underlying math. Our main tasks are to ensure the quality and relevance of the data, determine exactly what we’re trying to predict, choose our predictors using some common sense, and then finally export the predicted scores that result from the analysis (and then, preferably, load them into our database).
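For readers who want to see what "a formula that produces a score" can look like, here is a rough sketch in Python. It uses logistic regression, one common kind of regression for yes/no outcomes, fitted with plain gradient descent so that it runs without any extra software. The constituent records, the two predictors (years since graduation and whether an email address is on file), and the training settings are assumptions invented for illustration; in practice your statistical software does this step for you.

```python
import math

# Hypothetical records: (id, years since graduation, email on file, gave to the Annual Fund).
# All values are invented for illustration.
constituents = [
    ("A001", 2, 0, 0),
    ("A002", 3, 1, 0),
    ("A003", 5, 0, 0),
    ("A004", 8, 1, 1),
    ("A005", 10, 0, 0),
    ("A006", 12, 1, 1),
    ("A007", 15, 1, 1),
    ("A008", 20, 0, 1),
    ("A009", 25, 1, 1),
    ("A010", 4, 1, 0),
]

def sigmoid(z):
    """Squash any number into the range 0 to 1 so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Apply the fitted formula: weights[0] is the intercept, the rest pair with the predictors."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def fit_logistic(rows, targets, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent (assumed settings)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, target in zip(rows, targets):
            error = predict(weights, features) - target
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

# Predictors: years rescaled to decades, plus the email flag. Target: gave or not.
rows = [(c[1] / 10.0, c[2]) for c in constituents]
targets = [c[3] for c in constituents]

weights = fit_logistic(rows, targets)

# One score per constituent, ready to export and load back into the database.
scores = {c[0]: predict(weights, x) for c, x in zip(constituents, rows)}

for constituent_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(constituent_id, round(score, 3))
```

The fitted `weights` are the "formula" we never really need to look at. The part that matters for our work is the last step: a score between 0 and 1 for every constituent, sorted so the most likely donors come first.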
These thoughts about terminology were sparked by a piece written by Tonya Balan, manager of the analytics product management team for SAS. As I said, there are definitions for this stuff all over the web, but Balan does a nice job of drawing distinctions between all the terms we often hear thrown around: analytics, data mining, predictive modeling, predictive analytics, forecasting and so on.