CoolData blog

26 August 2015

Exploring associations between variables

Filed under: Book, CoolData, Predictor variables | kevinmacdonell @ 6:57 pm

 

CoolData has been quiet over the summer, mainly because I’ve been busy writing another book. (Fine weather has a bit to do with it, too.) The book will be for nonprofit and higher education advancement professionals interested in learning how to use multiple regression to build predictive models. Over the next few months, I will adapt various bits from the work-in-progress as individual posts here on CoolData.

 

I’ll have more to say about the book later, so if you’re interested, I suggest subscribing via email (see the box to the right) to have the inside track on this project. (And if you aren’t familiar with the previous book, co-written with Peter Wylie, then have a look here.)

 

A generous chunk of the book is about the specifics of getting your hands dirty with cleaning up your messy data, transforming it to make it suitable for regression analysis, and exploring it for interesting patterns that can strengthen a predictive model.

 

When you import a data set into Data Desk or another statistics package, you are looking at more than just a jumble of variables. All these variables are in a relation; they are linked by patterns. Some variables are strongly associated with each other, others have weaker associations, and some are hardly related to each other at all.

 

What is meant by “association”? A classic example is a data set of children’s weights, heights, and ages. Older children tend to weigh more and be taller than younger children. Heavier children tend to be older and taller than lighter children. We say that there is an association between age and weight, between height and weight, and between age and height.
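
For readers who like to see the arithmetic, one common way to put a number on an association is the Pearson correlation coefficient. The sketch below is only an illustration in Python: the six children and their measurements are made up, and the helper name pearson_r is my own, not something taken from Data Desk or any other package.

```python
from math import sqrt

# Made-up measurements for six children (illustrative only, not real data).
ages = [4, 6, 7, 9, 11, 12]                # years
heights = [102, 115, 121, 133, 144, 150]   # centimetres
weights = [16, 21, 23, 29, 36, 41]         # kilograms


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print("age vs. weight:   ", round(pearson_r(ages, weights), 3))
print("height vs. weight:", round(pearson_r(heights, weights), 3))
print("age vs. height:   ", round(pearson_r(ages, heights), 3))
```

A value near +1 means the two variables rise together, a value near 0 means little linear association, and a value near -1 means one falls as the other rises. With toy data this tidy, all three pairs come out strongly positive; real data will be messier.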

 

Another example: Alumni who are bigger donors tend to attend more reunion events than alumni who give more modestly or don’t give at all. Or put the other way, alumni who attend more events tend to give more than alumni who attend fewer or no events. There is an association between giving and attending events.

 

This sounds simple enough — even obvious. The powerful consequence of these truths is that if we know the value of one variable, we can make a guess at the value of another, as long as the association is valid. So if we know a child’s weight and height, we can make a good guess of his or her age. If we know a child’s height, we can guess weight. If we know how many reunions an alumna has attended, we can make a guess about her level of giving. If we know how much she has given, we can guess whether she’s attended more or fewer reunions than other alumni.
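
To make the idea of guessing one value from another concrete, here is a minimal sketch of a straight-line (least-squares) fit with a single predictor, which is a simpler stand-in for the multiple regression the book covers. Everything in it is an assumption for illustration: the heights and weights are invented, and fit_line and guess_weight are hypothetical helper names.

```python
# Made-up heights (cm) and weights (kg) for six children; not real data.
heights = [102, 115, 121, 133, 144, 150]
weights = [16, 21, 23, 29, 36, 41]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation in x.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights, weights)


def guess_weight(height_cm):
    """Guess a weight in kg from a height in cm using the fitted line."""
    return intercept + slope * height_cm


# An informed guess for a child who is 127.5 cm tall.
print(round(guess_weight(127.5), 1))
```

The guess is only as good as the association behind it. The line summarizes the trend in these six points; it says nothing certain about any one child.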

 

We are guessing an unknown value (say, giving) based on a known value (number of events attended). But note that “giving” is not really an unknown. We’ve got everyone’s giving recorded in the database. What is really unknown is an alum’s or a donor’s potential for future giving. With predictive modeling, we are making a guess at what the value of a variable will be in the (near) future, based on the current value of other variables, and the type and degree of association they have had historically.

 

These guesses will be far from perfect. We aren’t going to be bang-on in our guesses of children’s ages based on weight and height, and we certainly aren’t going to be very accurate with our estimates of giving based on event attendance. Even trickier, projecting into the future — estimating potential — is going to be very approximate.

 

Still, our guesses will be informed guesses, as long as the associations we detect are real and not due to random variation in our data. Can we predict exactly how much each donor is going to give over this coming year? No, that would be putting too much confidence in our powers. But we can expect to have plenty of success in ranking our constituents in order by how likely they are to engage in whatever behaviour we are interested in, and that knowledge will be of great value to the business.
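
Ranking is easier to picture with a toy example. The sketch below assumes a model has already produced a score for each constituent, where a higher score means the person is judged more likely to give. The IDs, the scores, and the top-third cut-off are all invented for illustration.

```python
# Hypothetical model scores; higher means judged more likely to give.
# All IDs and numbers are invented.
scores = {
    "A101": 0.12,
    "A102": 0.87,
    "A103": 0.45,
    "A104": 0.66,
    "A105": 0.05,
    "A106": 0.71,
}

# Sort constituent IDs from most likely to least likely.
ranked = sorted(scores, key=scores.get, reverse=True)

# Keep the top third as a priority list (the cut-off here is arbitrary).
top_n = len(ranked) // 3
priority = ranked[:top_n]

for position, constituent_id in enumerate(ranked, start=1):
    print(position, constituent_id, scores[constituent_id])

print("Priority list:", priority)
```

Nothing here requires the scores to be accurate dollar forecasts. The ordering is what matters, because it tells you where to spend limited time and budget first.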

 

Looking for potentially useful associations is part of data exploration, which is best done in full hands-on mode! In a future post I will talk about specific techniques for exploring different types of variables.

 


22 April 2014

Score! ships tomorrow

Filed under: Book, Score! | kevinmacdonell @ 7:29 pm

The printer delivered early, and a copy of Score! showed up at CASE headquarters in Washington, DC this afternoon.

(Doug Goldenberg-Hart, CASE’s Director, Editorial Projects, sent this photo to prove it.)

To everyone who put in an advance order, your copy will be available to ship tomorrow (Wednesday).

Peter Wylie and I sincerely hope you enjoy it.

Click here to order.

23 December 2013

New from CASE Books: Score!

Filed under: Book, CoolData, Peter Wylie | kevinmacdonell @ 9:39 am

As the year draws to a close, I’m pleased to announce that the book I’ve co-written with Peter Wylie will be available in January. ‘Score!’ joins a host of fine publications in CASE’s new catalog. I’m looking forward to having a look through this catalog for new books for the office. (‘Score!’ is featured on page 12.)

So what is this new book about? The full title is Score!: Data-Driven Success for Your Advancement Team, and as a recent issue of BriefCASE notes: “Kevin MacDonell and Peter Wylie walk readers through compelling arguments for why an organization should adopt data-driven decision-making as well as explanations of basic issues such as identifying and mining the pertinent data and what operations to perform once that data is in hand.”

You can read the rest of that article here: Ready to Score!?

20 December 2010

What I’m reading, and how

Filed under: Training / Professional Development | kevinmacdonell @ 6:30 am

Along with the blogs and online publications I read, I usually have two books on the go at once, sometimes three or more. That’s not as impressive as it sounds. I’m a slow reader (or as I tell people, a “careful” reader) with a short attention span. I like to read, just not more than a few pages at a stretch from any one book. At the moment I’m excited about several books I have piled around me, and I hope to make some headway on them over the holidays when I am not feeling too full, inebriated or stupid. Have a look:

Now You See It: Simple Visualization Techniques for Quantitative Analysis, by Stephen Few (2009, Analytics Press) is a big, beautiful chunk of a book. Its dimensions are those of a university textbook or reference, but it should be read from start to finish, as I am now doing. Understanding data, thinking about data, and communicating its messages require us to make pictures of the data. This book is a journey through successive layers of complexity in the art and science of visualization. It’s not a stats book, but more like what Strunk and White’s “The Elements of Style” is for writers — a foundation text for people who need to communicate data-based ideas visually, with the aid of (and sometimes in spite of) the software we have available.

Stats: Data and Models, by Richard De Veaux, Paul Velleman, and David Bock (Second Edition, 2008, Pearson Education, Inc.). Unlike Stephen Few’s book, which looks like a textbook but isn’t, this hefty book IS a textbook. It’s not the likeliest thing one would want to read from cover to cover, but that’s what I’m doing. Why? Because this is the intro stats course I never took in university. It’s highly readable, with an informal style, and it focuses on explaining concepts using real data from familiar sources, rather than pushing theories and equations at you. The emphasis is on using software to do the computational work, with examples included for some of the more common stats packages. (One of the authors, Paul Velleman, is the developer of Data Desk, the software I use.) This book is for the beginning student in statistics, but it wastes no time getting right into correlation, regression and statistical models. Again, a book with the potential for being foundational for people who need to use data in their work.

The Nonprofit Buyer: Strategies for Success from a Nonprofit Technology Sales Veteran, by Andrew Urban (2010, CreateSpace) was written by a guy with extensive experience selling technology to the nonprofit sector, working with vendors such as Convio, Kintera, and Serenic Software. His background also includes grassroots work as a donor, volunteer, board member, and staffer. I have to admit that the subject of this book is not riveting (unlike, say, statistics!), but it IS important. All of us deal with vendors and we all have to evaluate what solutions are best for our organizations, which is sometimes a painful process. Analytics products, tools and training are some of the more expensive investments nonprofits can make, and we don’t do it as well as we ought to. As Andrew Urban observes, vendors put more training into learning how to sell to a nonprofit than a nonprofit puts into learning how to buy. This book helps to level the playing field, which can only lead to more fruitful, long-term vendor-customer relationships that benefit both parties. NOTE: On January 26, Andrew Urban will be giving a free webinar based on a concept from his book, presented by nonprofitwebinars.com. Click here: Return on Mission.

I don’t get anything for recommending books; I will leave you to surf your own way to purchase any of these three.

Back in June I urged readers to buy their own books, as part of taking responsibility for their own professional development. You might also consider checking out your closest university library, or the student bookstore where you might find used copies of textbooks. There’s also Bookmooch, a fantastic book-exchange service that I’ve used heavily over the years to dispose of books I’m no longer interested in while receiving all kinds of books that I want. (Unfortunately, though, sought-after nonfiction doesn’t come available very often.)

Then there are free online resources such as blogs. I read a variety of blogs on fundraising, statistics, personal productivity, technology and other topics. I don’t actively follow any particular blog. Rather, links to specific posts come to me via Twitter, sent out by the people I follow. You won’t miss much that is truly of interest to you if you follow the right people, and I try to promote quality posts by re-tweeting anything that I think deserves to be read. You can follow me here, if you think I retweet stuff that interests you.

I read books at home on the couch with a glass of red wine, which may help explain my limited powers of concentration, but I read blogs and online media during my commute on the Number 80 bus. I’ve recently discovered Instapaper, which allows me to save plain-text versions of online content such as blog posts on my iPod Touch, for later reading when I don’t have an Internet connection. It works with other apps on the iPod (Twitterrific, Safari, and so on) to send content to a “read later” file, and you can also use a bookmark in your desktop computer’s browser to capture stuff that you come across at work and don’t have time to read.

Free or pay, paper or electronic — there are now so many sources of help and information that there is no excuse for not trying to get a little better at what you do.
