CoolData blog

17 February 2011

Warning: This data is different

Filed under: Best practices, Pitfalls — kevinmacdonell @ 2:09 pm

This post is named for a conference keynote I will give this spring for senior managers working in advancement services. These people are no strangers to data; in fact, their working lives revolve around it. But they don’t necessarily see data through the same lens as we do, and they don’t value the same things we do.

We’d better learn to understand their perspective, because “our” data is at their mercy. I’m talking about gift processing, alumni records, IT and computing services, database admins — people who can be our best friends, or bring data disasters down on our heads. Oh, and there are disasters!

To illustrate, I will draw a distinction between “everyday” data — processing gifts, updating constituent records, maintaining databases, and pulling reports — and predictive modeling data. The differences might seem a bit philosophical, but they’re real and have real consequences.

Everyday data is used for sense-making and explaining in the present, via reporting and descriptive statistics. (“What were December’s pledge totals, and how do they compare with this time last year?”) Modeling data is not reporting or explaining anything, so it’s hard for some people to put a value on it. Everyday data might be doing important things such as hunting for causes. (“Did pushing the income tax deadline email on Dec 31 boost giving?”) Not so modeling data, which only seeks to uncover associations between things without trying to determine causation. (“Is there a connection between giving in December and being a significant donor?”) In short, everyday data work pays off in the short term; modeling data work pays off over a much longer period of time.
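To make the "association, not causation" idea concrete, here is a minimal sketch in Python. The records and the names (`donors`, `rate`, `lift`) are invented for illustration; they are assumptions, not anything from a real database. It simply compares how often significant donors turn up among December givers versus everyone else.

```python
# Hypothetical records: (gave_in_december, is_significant_donor).
# These eight rows are made up purely for illustration.
donors = [
    (True, True), (True, True), (True, False), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

def rate(rows, gave_in_december):
    """Share of significant donors within one group (December givers or not)."""
    group = [significant for december, significant in rows
             if december == gave_in_december]
    return sum(group) / len(group)

december_rate = rate(donors, True)    # significant donors among December givers
other_rate = rate(donors, False)      # significant donors among everyone else

# A ratio well above 1 suggests an association worth testing as a predictor.
# It says nothing about whether December giving causes anything.
lift = december_rate / other_rate
print(december_rate, other_rate, lift)
```

A ratio like this is only a first look. In practice one would check it against far more records before treating the variable as a predictor.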

When everyday data is messy, it will probably be dismissed as invalid. When modeling data is messy, that’s considered normal, and there are techniques to address it. For everyday data, missing values are an issue; for modeling data, missing values can be useful (i.e., predictive). When missing data is troublesome rather than predictive, we are free to make up data to fill the gaps, using imputation. This is a foreign concept to people who deal exclusively with everyday data.
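Both ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`age`, `business_phone`) and the helper functions are hypothetical, and median imputation is just one simple choice among many imputation techniques.

```python
from statistics import median

# Hypothetical constituent records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "business_phone": "555-0101"},
    {"id": 2, "age": None, "business_phone": None},
    {"id": 3, "age": 58, "business_phone": "555-0188"},
    {"id": 4, "age": 41, "business_phone": None},
]

def add_missing_flag(rows, field):
    """Turn 'is this field missing?' into a 0/1 candidate predictor."""
    for row in rows:
        row[field + "_missing"] = 1 if row[field] is None else 0

def impute_median(rows, field):
    """Fill gaps in a numeric field with the median of the known values."""
    known = [row[field] for row in rows if row[field] is not None]
    fill = median(known)
    for row in rows:
        if row[field] is None:
            row[field] = fill
    return fill

# Flag missingness first, so that information survives the imputation step.
add_missing_flag(records, "business_phone")
add_missing_flag(records, "age")
age_fill = impute_median(records, "age")
print(age_fill, records[1])
```

The order matters: once the gaps are filled, the fact that a value was ever missing is gone unless it was flagged beforehand.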

In the everyday, we are picky: “Give me these records, but not those, and include this field, and this field, but not those fields.” For modeling, we say, “Give me everything — I want it all!” Everyday data seeks an answer, a single-point destination reached by one route. Modeling data has a destination too, but it gets there via a myriad of routes. Every potential predictor is a new route to explore. And we don’t know in advance what routes will get us there fastest; we have to drive them all.

And finally, one key difference in philosophy which can spell disaster for your institution: In the everyday, the most current data supersedes and replaces old data. Think of address information: Of what use is a mailing list to the Alumni Office if it’s full of addresses from the 1970s? Well, in modeling, that old data is just as valuable as fresh data. For example, I’ve found that the count of address updates an individual has is highly predictive of giving. The only way I can get that count is if I total up the number of deactivated records, and then add the current, active record. No historical records, no predictor.
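The address-count predictor described above might be derived along these lines. This is a sketch that assumes a hypothetical table with one row per address record and a `status` field; real schemas will differ, and the IDs below are invented.

```python
from collections import Counter

# Hypothetical address history: one row per address record, active or not.
address_rows = [
    {"constituent_id": "A1", "status": "inactive"},
    {"constituent_id": "A1", "status": "inactive"},
    {"constituent_id": "A1", "status": "active"},
    {"constituent_id": "B2", "status": "active"},
]

def address_record_counts(rows):
    """Count address records per constituent (deactivated plus current)."""
    return Counter(row["constituent_id"] for row in rows)

# With history intact, the count varies from person to person.
counts = address_record_counts(address_rows)

# If old records had been overwritten or deleted, only active rows would remain.
current_only = address_record_counts(
    row for row in address_rows if row["status"] == "active"
)

print(counts)        # A1 has 3 records, B2 has 1
print(current_only)  # everyone has exactly 1: no variation, no predictor
```

Once the deactivated rows are gone, every constituent has a count of one, and a variable that is the same for everyone cannot predict anything.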

Yes, some institutions routinely overwrite or just flat-out delete this stuff. But that is not the half of it. Because I wasn’t sure this sort of thing really happened, I started asking around. I received a raft of data disaster stories from all sorts of organizations, from non-profits to universities. I’ve collected so many tales of horror that I’m going to share them with you in a separate post next week.

(By the way, plug plug: the conference I’ll be speaking at is CASE’s Institute for Senior Advancement Services Professionals in Baltimore, April 27-29.)

