CoolData blog

1 April 2015

Mind the data science gap

Filed under: Training / Professional Development — Tags: , , — kevinmacdonell @ 8:10 pm

 

Being a forward-thinking lot, the data-obsessed among us are always pondering the best next step to take in professional development. There are more options every day, from a Data Science track on Coursera to new masters degree programs in predictive analytics. I hear a lot of talk about acquiring skills in R, machine learning, and advanced modelling techniques.

 

All to the good, in general. What university or large non-profit wouldn’t benefit from having a highly-trained, triple-threat chameleon with statistics, programming, and data analytics skills? I think it’s great that people are investing serious time and brain cells pursuing their passion for data analysis.

 

And yet, one has to wonder, are these advanced courses and tools helping drive bottom-line results across the sector? Are they helping people at nonprofits and university advancement offices do a better job of analyzing their data toward some useful end?

 

I have a few doubts. The institutions and causes that employ these enterprising learners may be fortunate to have them, but I would worry about retention. Wouldn’t these rock stars eventually feel constrained in the nonprofit or higher ed world? It’s a great place to apply one’s creativity, but aren’t the problems and applications one can address with data in our field relatively straightforward in comparison with other fields? (Tailoring medical treatment to an individual’s DNA, preventing terrorism or bank fraud, getting an American president elected?) And then there’s the pay.

 

Maybe I’m wrong to think so. Clearly there are talented people working in our sector who are here because they have found the perfect combination of passions. They want to be here.

 

Anyway — rock star retention is not my biggest concern.

 

I’m more concerned about the rest of us: people who want to make better use of data, but aren’t planning to learn way more than we need or are capable of. I’m concerned for a couple of reasons.

 

First, many of the professional development options available are pitched at a level too advanced to be practical for organizations who haven’t hired a full-time predictive analytics specialist. The majority of professionals working in the non-profit and higher-ed sectors are mainly interested in getting better at their jobs, whether that’s increasing dollars raised or boosting engagement among their communities. They don’t need to learn to code. They do need some basic, solid training options. I’m not sure these are easy to spot among all the competing offerings and (let’s be honest) the Big Data hype.

 

These people need support and appropriate training. There’s a place for scripting and machine learning, but let’s ensure we are already up to speed on means/medians, bar charts, basic scoring, correlation, and regression. Sexy? No. But useful, powerful, necessary. Relatively simple and manual techniques that are accessible to a range of advancement professionals — not just the highly technical — offer a high return on investment. It would be a shame if the majority were cowed into thinking that data analysis isn’t for them just because they don’t see what neural networks have to do with their day to day work.

 

My second concern is that some of the advanced tools of data science are deceptively easy to use. I read an article recently that stated that when it’s done really well, data science looks easy. That’s a problem. A machine-learning algorithm will spit out answers, but are they worth anything? (Maybe.) Does an analyst learn anything about their data by tweaking the knobs on a black box? (Probably not.) Is skipping over the inconvenience of manual data exploration detrimental to gaining valuable insights? (Yes!)

 

Don’t get me wrong — I think R, Python, and other tools are extremely useful for predictive modelling, although not for doing the modelling itself (not in my hands, at least). I use SQL and Python to automate the assembly of large data files to feed into Data Desk — it’s so nice to push a button and have the script merge together data from the database, from our phonathon database, from our broadcast email platform and other sources, as well as automatically create certain indicator variables, pivoting all kinds of categorical variables and handling missing data elegantly. Preparing this file using more manual methods would take days.

 

But this doesn’t automate exploration of the data, it doesn’t remove the need to be careful about preparing data to answer the business question, and it does absolutely nothing to help define that business question. Rather than let a script grind unsupervised through the data to spit out a result seconds later without any subject-matter expertise being applied, the real work of building a model is still done manually, in Data Desk, and right now I doubt there is a better way.

 

When it comes to professional development, then, all I can say is, “to each their own.” There is no one best route. The important thing is to ensure that motivated professionals are matched to training that is a good fit with their aptitudes and with the real needs of the organization.

 

Advertisements

Blog at WordPress.com.

%d bloggers like this: