CoolData blog

2 December 2013

How to learn data analysis: Focus on the business

Filed under: Training / Professional Development. Posted by kevinmacdonell @ 6:17 am

A few months ago I received an email from a prospect researcher working for a prominent theatre company. He wanted to learn how to do data mining and some basic predictive modeling, and asked me to suggest resources, courses, or people he could contact. 

I didn’t respond to his email for several days. I didn’t really have that much to tell him, since he had covered so many of the bases already. He’d read the book “Data Mining for Fund Raisers,” by Peter Wylie, as well as “Fundraising Analytics: Using Data to Guide Strategy,” by Joshua Birkholz. He follows this blog, and he keeps up with postings on the Prospect-DMM list. He had dug up and read articles on the topic in the newsletter published by his professional association (APRA). And he’d even taken two statistics courses. Those were a long time ago, but he had retained a basic understanding of the terms and concepts used in modeling.

He was already better prepared than I was when I started learning predictive modeling in earnest. But as it happened, I had a blog post in draft form (one of many — most never see the light of day) which was loosely about what elements a person needs to become a data analyst. I quoted a version of this paragraph in my response to him:

There are three required elements for pursuing data analysis. The first and most important is curiosity, and finding joy in discovery. The second is being shown how to do things, or having the initiative to find out how to do things. The third is a business need for the work.

My correspondent had the first element covered. As for the second element, I suggested to him that he was more than ready to obtain one-on-one training. All that was missing was defining the business need … that urgent question or problem that data analysis is suited for.

Any analysis project begins with formulating the right question. But that’s also an effective way to begin learning how to do data analysis in the first place. Knowing what your goal is brings relevance, urgency and focus to the activity of learning.

Reflect on your own learning experiences over the years: Your schooling, courses you’ve taken, books and manuals you’ve worked your way through. More than likely, this third element was mostly absent. When we were young, perhaps relevance was not the most important thing: We just had to absorb some foundational concepts, and that was that. Education can be tough, because there is no satisfying answer to the question, “What is the point of learning this?” The point might be real enough, but its reality belongs to a seemingly distant future.

Now that we’re older, learning is a completely different game, in good ways and bad. On the bad side, daily demands and mundane tasks squeeze out most opportunities for learning. Getting something done seems so much more concrete than developing our potential. 

On the good side, now we have all kinds of purposes! We know what the point is. The problems we need to solve are not the contrived and abstract examples we encountered in textbooks. They are real and up close: We need to engage alumni, we need to raise more money, we need, we need, we need.

The key, then, is to harness your learning to one or more of these business needs. Formulate an urgent question, and engage in the struggle to answer it using data. Observe what happens then … Suddenly professional development is no longer an open-ended activity that is easily pushed aside by other things. When you ask for help, your questions are now specific and concrete, which is the best way to generate responses on forums such as Prospect-DMM. When you turn to a book or an internet search, you’re looking for just one thing, not a general understanding.

You aren’t trying to learn it all. You’re just taking the next step toward answering your question. Acquiring skills and knowledge will be a natural byproduct of what should be a stimulating challenge. It’s the only way to learn.

 

26 April 2012

For agile data mining, start with the basics

Filed under: Analytics, Pitfalls, Training / Professional Development. Posted by kevinmacdonell @ 8:56 am

Lately I’ve been telling people that one of the big hurdles to implementing predictive analytics in higher education advancement is the “project mentality.” We too often think of each data mining initiative as a project, something with a beginning and end. We’d be far better off to think in terms of “process” — something iterative, always improving, and never-ending. We also need to think of it as a process with a fairly tight cycle: Deploy it, let it work for a bit, then quickly evaluate, and tweak, or scrap it completely and start over. The whole cycle works over the course of weeks, not months or years.

Here’s how it sometimes goes wrong, in five steps:

  1. Someone has the bright idea to launch a “major donor predictive modelling project.” Fantastic! A committee is struck. They put their heads together and agree on a list of variables that they believe are most likely to be predictive of major giving.
  2. They submit a request to their information management people, or whoever toils in extracting stuff from the database. Emails and phone calls fly back and forth over what EXACTLY THE HECK the data mining team is looking for.
  3. Finally, a massive Excel file is delivered, a thing the likes of which would never exist in nature — like the unstable, man-made elements on the nether fringes of the Periodic Table. More meetings are held to come to agreement about what to do about multiple duplicate rows in the data, and what to do about empty cells. The committee thinks maybe the IT people need to fix the file. Ummm — no!
  4. Half of the data mining team then spends considerable time in pursuit of a data file that gleams in its cleanliness and perfection. The other half is no longer sure what the goal of the project was.
  5. Somehow, a model is created and the records are scored by the one team member left standing. Unfortunately, a year has passed and the person for whom the model was built has left for a new job in California. Her replacement refers to the model as “astrology.”

Allow me a few observations that follow from these five stages:

  1. Successful models are rarely produced by committee, and variables cannot be pre-selected by popular agreement and intuition — although certainly experience is a valuable source of clues.
  2. Submitting requests to someone else for data, having to define exactly what it is you want, and then waiting for the request to be fulfilled — all of that is DEATH to creative data exploration.
  3. A massive, one-time, all-or-nothing data suction job is probably not the ideal starting point. Neither is handling an Excel file with 200,000 rows and a hundred columns.
  4. Perfect data is not a realistic goal, and is not a prerequisite for fruitful data mining.
  5. A year is too long. The cycle has to be much, much tighter than that.

And finally, here are some concrete steps, based on the observations, again point-for-point:

  1. If you’re interested in data mining, try going it alone. Ask for help when you need it, but you’ll make faster progress if you explore on your own or in a team of no more than two or three like-minded people. Don’t tell anyone you’re launching a “project,” and don’t promise deliverables unless you know what you’re doing.
  2. Learn how to build simple queries to pull data from your database. Get IT to set you up. Figure out how to pull a file of IDs along with the sum of all their hard-credit giving. Then, pull that AND something else, anything else: email address, class year, marital status, whatever. Practice until you are comfortable with how your data is stored and how to limit it to what you want.
  3. Look into stats software, and learn some of the most common stats terms. Read up on correlation in particular. Build larger files for analysis in the stats software rather than in Excel. Read, read, read. Play, play, play.
  4. Think in terms of pattern detection, and don’t get hung up on the validity of individual data points.
  5. If you’ve done steps 1 to 4, you have the foundations in place for being an agile data miner.
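To make steps 2 and 3 a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the table names (constituents, gifts), the column names, the credit_type flag and the handful of made-up rows are all hypothetical, and an in-memory SQLite database simply stands in for whatever system your institution actually uses. The query pulls one row per ID with total hard-credit giving plus "something else" (class year and whether an email address is on file), and the small pearson function computes a plain correlation coefficient between having an email and total giving.

```python
import sqlite3
from math import sqrt

# Hypothetical schema and invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE constituents (id INTEGER PRIMARY KEY, class_year INTEGER, email TEXT);
    CREATE TABLE gifts (id INTEGER, amount REAL, credit_type TEXT);
    INSERT INTO constituents VALUES
        (1, 1985, 'a@example.org'),
        (2, 1992, NULL),
        (3, 2001, 'c@example.org'),
        (4, 2008, NULL),
        (5, 1979, 'e@example.org');
    INSERT INTO gifts VALUES
        (1, 500, 'hard'), (1, 250, 'hard'), (2, 50, 'hard'),
        (3, 100, 'soft'), (3, 75, 'hard'), (5, 1000, 'hard');
""")

# One row per ID: the sum of hard-credit giving, plus something else.
# LEFT JOIN keeps people who have never given; COALESCE turns their
# missing total into 0 instead of dropping them.
query = """
    SELECT c.id,
           c.class_year,
           c.email IS NOT NULL AS has_email,
           COALESCE(SUM(g.amount), 0) AS total_giving
    FROM constituents AS c
    LEFT JOIN gifts AS g
           ON g.id = c.id AND g.credit_type = 'hard'
    GROUP BY c.id
    ORDER BY c.id
"""
rows = conn.execute(query).fetchall()


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


has_email = [row[2] for row in rows]
total_giving = [row[3] for row in rows]
r = pearson(has_email, total_giving)

for row in rows:
    print(row)
print("correlation between has_email and total_giving:", round(r, 3))
```

Note that the missing email addresses are not "fixed" here. They are turned into a 0/1 indicator and treated as a pattern worth looking at, which is the spirit of step 4. With five invented rows the coefficient means nothing, of course; the point is only the shape of the workflow: pull, add a variable, look for a relationship, repeat.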

Mind you, it could take considerable time (months, maybe even years) to get really comfortable with the basics, especially if data mining is a sideline to your “real” job. But success and agility do depend on being able to work independently, being able to snag data on a whim, being able to understand a bit of what is going on in your software, having the freedom to play and explore, and losing notions about data that come from the business analysis and reporting side. In other words, the basics.
