CoolData blog

17 November 2011

Add life to numbers with Excel colour scales

Filed under: Excel, Fun, Visualization — kevinmacdonell @ 7:21 am

For telling a story with data nothing beats a good visualization, which for most of us means a chart, usually created in Excel. Sometimes the situation seems to call for including the data in table format as well. You may include it after the chart, or throw it into an appendix; either way, you’re creating redundancy: The table and the chart are telling the same story.

The table offers accurate detail, but the person who’s trying to find the story in the data needs to study the table for a while to find it, even if there are only very few rows and columns. The chart is readily accessible (or should be), but probably trades away accurate detail to gain simplicity. Today I’ll show you that sometimes you can get the best of both in a single graphic.

Let’s start with a table that looks a bit complicated. This is copied from a study published on this blog by Peter Wylie and John Sammis (Is the online behaviour of your alums worth exploring?) For this example, it’s not essential that you understand what the chart is about. The important thing is to attempt to recognize patterns in the table. You can probably spot them, but only after studying the numbers and comparing values across cells, both up and down and from side to side. It’s not exactly “at a glance” comprehension.

The paper didn’t offer a chart for this table, but were we to express the data visually, we might have used something like the chart below. It’s a bit complex because of all the lines, but we can readily distinguish some key things about the Class Year deciles (that’s what “CY Decile” stands for). First of all, they’re arranged in order, from top to bottom, by age. The oldest decile starts high and stays up there, while the two youngest deciles languish at the bottom. All the rest are stacked in between. I haven’t told you what they are high IN, but let’s not worry about that for now.

Whether the chart is helpful or not, it’s still takes up extra room: One table, one chart, same data. What if we could COMBINE the two types of information — numerical and visual — into a single entity? Fortunately, newer versions of Excel offer the ability to include visual cues within the data table itself using conditional formatting. Here is the original table with colour added. Now what do you see?

That’s right, when cell are shaded according to the values they contain, the patterns are obvious. The lower the CY Decile value and the higher the Visits value, the greater is the number in the cell. Our eye is immediately drawn to the intersection of row ‘CY Decile 1’ and the column ‘8 or more visits’. And as it turns out, this is a very special cell.

How so? Well, this table was created using data from a university that tracks how many times its alumni have visited its website via links sent out in university emails — the columns break down alumni by number of web visits. As well, for this analysis, all alumni were divided up into ten equal groups (deciles) by Class Year, the oldest being CY Decile 1, and the youngest CY Decile 10.

The values in the cells are actually percentages — the percentage of alumni in each group who have Lifetime Giving of $10,000 or more. No one should be surprised that the oldest alumni have the highest percentages, but the truly awesome finding is that frequent web visits are also associated with high levels of lifetime giving. (Anyone who has this data at their institution and fails to make use of it for prospect identification is obviously not interested in raising scads of money. Consider firing them.)

As I said, our eye is immediately drawn to the darkest cell, the 42.9% value. It’s almost like that cell is the “hottest.” Which leads me to the next version of the chart: a “heat map”. Sometimes we want to highlight both extremes, the highest AND the lowest values. It’s very intuitive to think of high values as “hot” and low values as “cool,” and colour them appropriately, like so:

Adding shading is quick and easy. Start by clicking and dragging to select the cells of your table that hold the values you want to jazz up.¬†On the Home tab, click on the Conditional Formatting icon, and select Color Scales. Here you’ve got all sorts of options, one of which is the cold-to-hot scheme. Play with it to your heart’s content: Some shadings might be more appropriate for your data than others.

I can see this being useful for at-a-glance exploration of large tables in which patterns may be difficult to detect. For presenting your data to others, though, adding colour scales makes sense only when the physical layout of the cell values contains a visual pattern that you want to exploit to tell a story. Otherwise it’s just visual junk. The object is not just to make our charts pretty, but to make the data sing.

8 September 2011

Analytics goes to the movies

Filed under: Fun — Tags: , , — kevinmacdonell @ 5:25 am

I’m not a big baseball fan, but I am an analytics fan, so I’m curious to see how the concept is portrayed in the movie Moneyball, set to hit theatres on Sept. 23. The movie is based on the book of the same name by Michael Lewis. It tells the story of Billy Beane (played by Brad Pitt), the general manager of the Oakland A’s in the late 1990s, who needed to field a winning team on a meagre payroll.

You might already know about this analytics success story. Beane and his assistant general manager Paul DePodesta had an idea: There had to be champion players out there whom everyone else had overlooked; if no one was looking for them, they would be affordable. Their prospecting tool would be statistical analysis.

For example, their analysis showed that new stats such as on-base percentage and slugging percentage were better predictors of offensive success than the tried-and-true qualities valued by traditional buyers of talent. The analytical approach paid off, and it was soon copied by other major league baseball teams.

I don’t know if the movie will have much to do with analytics, but the trailer (click here to watch on YouTube), does seem to portray a geeky practitioner trying sell crusty, tradition-bound decision makers on the value of his craft. I like that story.

Better than that, though, is a line spoken by the fictional character Peter Brand (based on DePodesta and played by Jonah Hill): “Your goal shouldn’t be to buy players,” Brand says to Beane. “Your goal should be to buy wins. In order to buy wins, you need to buy runs.”

It’s a statement about success with statistical modeling: You have to start by properly framing the question. The rest is just technique.

The analytical approach doesn’t pay off for professional sports like it used to, because now everyone does it. (The Oakland A’s are not doing very well these days.) But analytics will always pay off in the nonprofit world because in the hunt for potential donors we still mostly compete against ourselves — we either identify our friends and supporters effectively, or we don’t.

17 December 2010

Chart frequency of words and terms from books with Google

Filed under: Coolness, Free stuff, Fun — Tags: , — kevinmacdonell @ 11:46 am

Here’s another interesting bauble from the nerds at Google. The Books Ngram Viewer allows you to plot the frequency of words and phrases that appeared in books published in the past few hundred years. Google estimates they’ve scanned and OCR’d more than 10 percent of all the books ever published, and this plotter is based on a sample of that data.

This “most excellent time-wasting tool” was blogged about by Alexis Madrigal, a senior editor for TheAtlantic.com, in his post, The Decline of Man (as a Word), in which he shows how the word “man” has fared against “woman”. (Not well.) As Madrigal observes, this may not serve a legitimate research purpose, but it sure is fun.

Here’s a sample. I’ve searched for the term “database”, and set the years to search as 1950 to 2008. The y-axis shows the percentage of all the terms contained in Google’s sample of books written in English that are “database” for those years. As you can see, the word didn’t emerge in published sources before the early 1970s. (Click image for full size.)

The tool also allows you to plot the progress of one term against another. If you plot “database” against “data base”, you’ll see that the two-word term enjoyed a short life before the single word took over. I’ve been interested in the use of the word “gift” instead of “donation,” but the plot of those two words isn’t very informative due, I guess, to the many connotations of the word “gift.” Instead I plotted “charitable gift” and “charitable donation” to put the words in context, and came up with this chart. The concept of giving seems to have had quite a heyday up until around 1835, and “donation” was firmly in the lead. By 1880, though, it was all about the gift. (Click image for full size.)

That got me thinking about how well “philanthropy” has done through the years. Mentions before 1750 are rare, so I plotted from then to the present, and once again the first half the 19th century seems to have been relatively more preoccupied with the idea than later on. (Although, of course, who knows what data this is really based on. As I said, it’s fun, but I wouldn’t want to base a thesis on it without knowing more about the underlying data.)

Hmm – this IS fun. What if we plot poverty vs. religion vs. education? This doesn’t tell us what people were giving to, but it does give a glimpse into what they were writing about. “Poverty” has stayed relatively constant since 1750, but look at how “religion” has declined as “education” has risen. One line crosses the other right at 1909. Also interesting is that the trend started reversing direction about 10 years ago.

And finally, this chart plots “data mining” and two variations of “fundraising“. Data mining takes off as a published term in the early 1990s, and the term “fund raising” has merged into the single word, “fundraising.”

All sorts of fun. Try some for yourself! I’d be interested in hearing about any cool combos you come up with that relate to analytics and/or fundraising.

8 October 2010

An Analytical Odyssey

Filed under: Fun — kevinmacdonell @ 11:44 am

Something fun for Friday! Robert Scott, Executive Director of Development at the Massachusetts Institute of Technology, delivered the opening address for the APRA International Symposium on Data Analytics in Anaheim this past summer. He called it An Analytics Odyssey, and it made all kinds of references to Homer and The Odyssey, with the general theme being how far analytics has come, and where he sees it potentially going — like Odysseus on his eventful, ten-year journey from Troy home to Ithaca, where the faithful Penelope has in the meantime gained a tenured position at Cornell. (OK, I’m making that last part up.)

It was a wide-ranging talk, with some amusing asides thrown in, including this slide with some imagined highlights of the long, strange trip taken by the analytical Odysseus. (Click for full size.) I asked Robert if I could post it here, and he graciously sent it along.

« Newer Posts

Blog at WordPress.com.