CoolData blog

5 July 2016

A simple score you can probably build in Excel

Filed under: Excel, Peter Wylie, Predictive scores — Tags: , , , — kevinmacdonell @ 4:22 pm

Guest post by Peter B. Wylie


In the evolving world of analysis for higher ed and non-profits, it’s apparent that a gap is widening: Many well-resourced shops are acquiring analytics talent comfortable with statistics and programming, but many others are unable to make investments in specialized talent.


Today’s guest post is a paper by Peter Wylie that addresses the latter group, the ones at risk of being left behind. Download his paper here: Simple_Score_in_Excel_Wylie


In this piece he uses data from two schools to show you something you can try with your own data, building a very simple predictive score using nothing but Excel.


Some level of data analysis ought to be accessible at some level to every organization, regardless of technical proficiency or tools. And in fact, shops that move too quickly to automate predictive scoring with black-box-like methods risk passing over the insights available to the exploratory analyst using more manual, time-consuming methods.


We hope you enjoy, and above all, that you try this with your own data. The download link again: Simple_Score_in_Excel_Wylie



17 November 2011

Add life to numbers with Excel colour scales

Filed under: Excel, Fun, Visualization — kevinmacdonell @ 7:21 am

For telling a story with data nothing beats a good visualization, which for most of us means a chart, usually created in Excel. Sometimes the situation seems to call for including the data in table format as well. You may include it after the chart, or throw it into an appendix; either way, you’re creating redundancy: The table and the chart are telling the same story.

The table offers accurate detail, but the person who’s trying to find the story in the data needs to study the table for a while to find it, even if there are only very few rows and columns. The chart is readily accessible (or should be), but probably trades away accurate detail to gain simplicity. Today I’ll show you that sometimes you can get the best of both in a single graphic.

Let’s start with a table that looks a bit complicated. This is copied from a study published on this blog by Peter Wylie and John Sammis (Is the online behaviour of your alums worth exploring?) For this example, it’s not essential that you understand what the chart is about. The important thing is to attempt to recognize patterns in the table. You can probably spot them, but only after studying the numbers and comparing values across cells, both up and down and from side to side. It’s not exactly “at a glance” comprehension.

The paper didn’t offer a chart for this table, but were we to express the data visually, we might have used something like the chart below. It’s a bit complex because of all the lines, but we can readily distinguish some key things about the Class Year deciles (that’s what “CY Decile” stands for). First of all, they’re arranged in order, from top to bottom, by age. The oldest decile starts high and stays up there, while the two youngest deciles languish at the bottom. All the rest are stacked in between. I haven’t told you what they are high IN, but let’s not worry about that for now.

Whether the chart is helpful or not, it’s still takes up extra room: One table, one chart, same data. What if we could COMBINE the two types of information — numerical and visual — into a single entity? Fortunately, newer versions of Excel offer the ability to include visual cues within the data table itself using conditional formatting. Here is the original table with colour added. Now what do you see?

That’s right, when cell are shaded according to the values they contain, the patterns are obvious. The lower the CY Decile value and the higher the Visits value, the greater is the number in the cell. Our eye is immediately drawn to the intersection of row ‘CY Decile 1’ and the column ‘8 or more visits’. And as it turns out, this is a very special cell.

How so? Well, this table was created using data from a university that tracks how many times its alumni have visited its website via links sent out in university emails — the columns break down alumni by number of web visits. As well, for this analysis, all alumni were divided up into ten equal groups (deciles) by Class Year, the oldest being CY Decile 1, and the youngest CY Decile 10.

The values in the cells are actually percentages — the percentage of alumni in each group who have Lifetime Giving of $10,000 or more. No one should be surprised that the oldest alumni have the highest percentages, but the truly awesome finding is that frequent web visits are also associated with high levels of lifetime giving. (Anyone who has this data at their institution and fails to make use of it for prospect identification is obviously not interested in raising scads of money. Consider firing them.)

As I said, our eye is immediately drawn to the darkest cell, the 42.9% value. It’s almost like that cell is the “hottest.” Which leads me to the next version of the chart: a “heat map”. Sometimes we want to highlight both extremes, the highest AND the lowest values. It’s very intuitive to think of high values as “hot” and low values as “cool,” and colour them appropriately, like so:

Adding shading is quick and easy. Start by clicking and dragging to select the cells of your table that hold the values you want to jazz up. On the Home tab, click on the Conditional Formatting icon, and select Color Scales. Here you’ve got all sorts of options, one of which is the cold-to-hot scheme. Play with it to your heart’s content: Some shadings might be more appropriate for your data than others.

I can see this being useful for at-a-glance exploration of large tables in which patterns may be difficult to detect. For presenting your data to others, though, adding colour scales makes sense only when the physical layout of the cell values contains a visual pattern that you want to exploit to tell a story. Otherwise it’s just visual junk. The object is not just to make our charts pretty, but to make the data sing.

29 January 2010

Friday tip: Stop Excel from messing with long text strings

Filed under: Excel — Tags: , , — kevinmacdonell @ 12:23 pm

Ever try pasting a long string of text into Excel only to have the cell truncate at 255 characters? I hate this.

The problem crops up when I paste the results of a query of our database out of MS Access and into Excel. The queries I’m working with this week are “last and next action” reports for major gift prospects, and sometimes the Notes column contains more text than Excel likes. The result is that my notes are cut off after a certain number of characters. I was familiar with this problem when working with the constraints of Excel 2003, but I was dismayed to encounter the same problem in Excel 2007. I Googled around in desperation, to no avail.

The solution I found by accident is stupidly simple. Maybe I’m the only person who didn’t already know how to deal with this? Anyway, here it is:

Once you’ve copied the columns from your source (Access or whatever), open Excel and right-click on cell A1. Choose Paste Special, then choose CSV. Click OK to paste.

This step will prevent Excel from truncating long text strings. I don’t know why this works, and I don’t care. It just works.

Note: This is for Excel 2007. I haven’t tested it for Excel 2003, and I haven’t tested with other Paste Special options.

4 January 2010

Complex formula for name concatenation in Excel

Filed under: Excel — Tags: , , , — kevinmacdonell @ 3:42 pm

One common and very time-consuming task I perform in Excel is creating one clean-looking column of names out of half a dozen columns of name data. It’s the nature of the beast: Every time we pull data on individuals, we’re going to have prefixes (Dr., Rev., etc.), nicknames, and first, middle and last names as separate bits of data. Stringing names together using simple concatenation is not difficult, but the absence of data in some of these columns (prefix, middle name, and nickname) makes the process hard to automate. If you just string the whole row together, you’ll end up with extra spaces and other anomalies.

Here’s a formula that uses nested conditional statements to account for empty cells:

=SUBSTITUTE(TRIM(A2 & " "& B2 & " " & C2 & " " & IF(LEN(B2)=1," ", IF(LEN(D2)=0,"",IF(B2=D2," ","("&D2&")"))) & " " & E2), " ", " ")

Yes, it’s scary, but it works.

(For the record, my nickname is NOT Kev.)

Your columns MUST be in this order to work:

A: Prefix
B: First name
C: Middle name
D: Preferred first name or Nickname
E: Last name

Here’s what the formula does:

  • Concatenates the whole name from whatever “parts” are available
  • Automatically removes extra spaces from the concatenation for cells that are empty (for example, if there’s no Prefix)
  • If the first name is a single initial (see note below), then the first part of the name is the initial plus the middle name
  • If the first name is the same as the preferred first name (nickname), then column D is skipped
  • If the first name is different from the preferred first name (if it is present), then the value of column D is expressed in parentheses

NOTE: By data-entry convention, our Banner records express single initials without a period, therefore the test for a single initial is “length = 1”. If your data includes an initial, you’ll have to modify this to LEN(B2) = 2, which might cause a problem with names that are two letters in length.

Your first column might not be Prefix. In fact it probably isn’t: “ID” is probably your first column. In that case, it might be easier to paste the formula into a simple text editor and shift all the letters however many places to the right that you need, before using it in Excel, like so:

=SUBSTITUTE(TRIM(B2 & " "& C2 & " " & D2 & " " & IF(LEN(C2)=1," ", IF(LEN(E2)=0,"",IF(C2=E2," ","("&E2&")"))) & " " & F2), " ", " ")

This formula, and others like it, would be especially useful to save as a sort of template, if you frequently pull data that has the same columns. This idea was suggested to me by Sarah Ellison, Development Research Associate at the Curtis Institute of Music:

As long as you can get your data into an Excel spreadsheet with the same columns in the same order (she uses Raiser’s Edge, with saved queries and exports), you can take some of the hassle out of using Excel for reporting. She recommends that you have a tab/worksheet with all of your raw data (exactly the way it gets exported from the database), and another tab/worksheet with the formulas to make your page look the way you want it. Just ensure that all your cell references refer to cells in the other tab, and don’t overwrite the formulas sheet when you update/save over the page with the raw data.

Blog at