CoolData blog

5 July 2016

A simple score you can probably build in Excel

Filed under: Excel, Peter Wylie, Predictive scores — Tags: , , , — kevinmacdonell @ 4:22 pm

Guest post by Peter B. Wylie


In the evolving world of analysis for higher ed and non-profits, it’s apparent that a gap is widening: Many well-resourced shops are acquiring analytics talent comfortable with statistics and programming, but many others are unable to make investments in specialized talent.


Today’s guest post is a paper by Peter Wylie that addresses the latter group, the ones at risk of being left behind. Download his paper here: Simple_Score_in_Excel_Wylie


In this piece he uses data from two schools to show you something you can try with your own data, building a very simple predictive score using nothing but Excel.


Some level of data analysis ought to be accessible at some level to every organization, regardless of technical proficiency or tools. And in fact, shops that move too quickly to automate predictive scoring with black-box-like methods risk passing over the insights available to the exploratory analyst using more manual, time-consuming methods.


We hope you enjoy, and above all, that you try this with your own data. The download link again: Simple_Score_in_Excel_Wylie


3 May 2015

When does a small nonprofit need a database?

Filed under: Non-university settings — Tags: , , , — kevinmacdonell @ 9:25 am


I had a dream a few nights ago in which I was telling my wife about a job interview I’d just had. A small rural Anglican church serving British expats was hiring a head of fund development. (I have very specific dreams.) I lamented that I had forgotten to ask some key questions: “I don’t even know if they have a database!”


Not all of my dreams are that nerdy. The fact is, nonprofit organizations (as opposed to higher education institutions — my usual concern) are on my mind lately, as I am preparing a conference presentation for a group that includes the full range of organizations, many of them small. I’m presenting on predictive modelling, but like that rural church, some organizations may not yet have a proper database.


When should an organization acquire some kind of database system or CRM?


Any organization, no matter how small, has to track activity and record information for operational purposes. This may be especially true for nonprofits that need to report on the impact they’re having in the community. I usually think in terms of tracking donors, but nonprofits may have an additional need to track clients and services.


Alas, the go-to is often the everyday Excel spreadsheet. It’s clear way: Excel is flexible, adaptable, comprehensible, and ubiquitous. Plus, if you’re a whiz, there are advanced features to explore. But while an Excel file can store data, it is NOT a true database. For a growing nonprofit, managing everything in spreadsheets will become an expensive liability. You may have already achieved a painful awareness of that fact. For others who aren’t there yet, here are a few warning signs that spreadsheets have outstayed their welcome in your office.


One: Even on a wide screen at 80% zoom, you have to do a lot of horizontal scrolling.


At the start, a spreadsheet seems so straightforward … A column each for First Name, Last Name, and some more columns for address information, phone and email. Then one day, you have a client or donor who has a second address — a business or seasonal address — and she wants to get your newsletter at one or the other, depending on the time of year. Both addresses are valid, so you need to add more columns. Hmm, and of course you want to track who attended your last event. If someone attends an event in July and another in December, you’ll need a column to record each event. As each volunteer has a new activity, as each client has a new interaction with your services, you are adding more and more columns until the sideways scrolling gets ridiculous.


Two: Your spreadsheet has so many rows that it is unwieldy to find or update individual records.


It’s technically true that an Excel file can store a million rows, but you probably wouldn’t want to open such a file on your computer. Files with just a few thousand rows can cause trouble after they’ve been worked over long enough. You can always tell a spreadsheet that’s been used to store data in the place of a true database, especially if more than one person has been mucking around in it. It’s in rough shape. In particular, errors made while sorting rows can lead to lost data and headaches all round.


Three: Several spreadsheets are being maintained separately, tracking different types of data on the same people.


Given the issues with large files, you’ll soon be tempted to have a separate sheet for each type of data. If you have a number of people on staff, each might be independently tracking the information that is relevant to their own work: One person tracking donors, another volunteers, another event attendees. John Doe might exist as a row in one or more of these separate files. If each file contains contact information, every change of address becomes a big deal, as it has to be applied in multiple places. Inevitably, the files get out of sync. As bad or worse, insights are not being shared across data files. Reporting is cumbersome, and anything like predictive modelling is impossible.


If this sounds like your situation, know that you’re not alone. I would be lying if I said rampant Excel use doesn’t occur in the (often) better-resourced world of higher education. Of course it does. Sometimes people don’t have the kind of access to the data they need, sometimes the database doesn’t have a module tailored to their business requirements, and sometimes people can’t be bothered to care about institution-wide data integrity. Shadow databases are a real problem on large campuses, and some of those orphan data stores are in Excel.


There’s nothing magic about a true database. It’s all about structure. A database stores data in tables, behind the scenes, and each table is very similar to a spreadsheet: it’s rectangular, and made up of rows and columns. The difference is that a single table usually holds only one type of data: Addresses, for example, or gift transactions. A table may be very long, with millions of rows, but it is typically not very wide, because each table serves only one purpose. As a consequence, a database has to have many tables, one for each thing needing to be stored. A complex enterprise database could have thousands of tables.


This sounds like chaos, but every record in a table contains a reference to data in another table. Tables are joined together by these identifiers, or keys. This allows a query of the database to retrieve John Smith from the ‘names’ table, the proper address for John Smith from the ‘addresses’ table, a sum of gifts made by John Smith from the ‘gifts’ table, and a volunteer status code for John Smith from the ‘volunteers’ table. When John Smith moves and provides his new address, that information is added as a new record in the ‘addresses’ table, attached to his unique identifier (i.e., his ID number). The old address is not deleted, but is marked ‘invalid’, so that the information is retained but never appears on a list of valid addresses. One place, one change — and it’s done.


That’s a quick and rather inadequate description of what a database is and does. There’s more to a donor management system than just a table structure, and I could say plenty more about user interfaces, reporting, and data integrity and security. But there is no shortage of information and guidance online, so I will leave you with a few places to go for good advice. There are many software solutions out there for organizations big and small.


Robert L. Weiner is a nonprofit technology consultant, helping fundraisers choose software tools. Check out his Ten Common Mistakes in Selecting Donor Databases (And How to Avoid Them). As you proceed toward acquiring a system, here is a piece published by AFP that has good, basic advice about how to manage it: Overcoming Database Demons.


Andrew Urban is author of a great book that helps guide nonprofits large and small in making wise choices in software and systems investments: The Nonprofit Buyer: Strategies for Success from a Nonprofit Technology Sales Veteran.


That’s all from me on this … CoolData’s domain is not systems or databases, but the data itself. A good system is simply a basic requirement for analysis. In my next post, I will address another question a small nonprofit might have: At one point is a nonprofit “big” enough to be able to get benefit from doing predictive modelling?


29 January 2010

Friday tip: Stop Excel from messing with long text strings

Filed under: Excel — Tags: , , — kevinmacdonell @ 12:23 pm

Ever try pasting a long string of text into Excel only to have the cell truncate at 255 characters? I hate this.

The problem crops up when I paste the results of a query of our database out of MS Access and into Excel. The queries I’m working with this week are “last and next action” reports for major gift prospects, and sometimes the Notes column contains more text than Excel likes. The result is that my notes are cut off after a certain number of characters. I was familiar with this problem when working with the constraints of Excel 2003, but I was dismayed to encounter the same problem in Excel 2007. I Googled around in desperation, to no avail.

The solution I found by accident is stupidly simple. Maybe I’m the only person who didn’t already know how to deal with this? Anyway, here it is:

Once you’ve copied the columns from your source (Access or whatever), open Excel and right-click on cell A1. Choose Paste Special, then choose CSV. Click OK to paste.

This step will prevent Excel from truncating long text strings. I don’t know why this works, and I don’t care. It just works.

Note: This is for Excel 2007. I haven’t tested it for Excel 2003, and I haven’t tested with other Paste Special options.

4 January 2010

Complex formula for name concatenation in Excel

Filed under: Excel — Tags: , , , — kevinmacdonell @ 3:42 pm

One common and very time-consuming task I perform in Excel is creating one clean-looking column of names out of half a dozen columns of name data. It’s the nature of the beast: Every time we pull data on individuals, we’re going to have prefixes (Dr., Rev., etc.), nicknames, and first, middle and last names as separate bits of data. Stringing names together using simple concatenation is not difficult, but the absence of data in some of these columns (prefix, middle name, and nickname) makes the process hard to automate. If you just string the whole row together, you’ll end up with extra spaces and other anomalies.

Here’s a formula that uses nested conditional statements to account for empty cells:

=SUBSTITUTE(TRIM(A2 & " "& B2 & " " & C2 & " " & IF(LEN(B2)=1," ", IF(LEN(D2)=0,"",IF(B2=D2," ","("&D2&")"))) & " " & E2), " ", " ")

Yes, it’s scary, but it works.

(For the record, my nickname is NOT Kev.)

Your columns MUST be in this order to work:

A: Prefix
B: First name
C: Middle name
D: Preferred first name or Nickname
E: Last name

Here’s what the formula does:

  • Concatenates the whole name from whatever “parts” are available
  • Automatically removes extra spaces from the concatenation for cells that are empty (for example, if there’s no Prefix)
  • If the first name is a single initial (see note below), then the first part of the name is the initial plus the middle name
  • If the first name is the same as the preferred first name (nickname), then column D is skipped
  • If the first name is different from the preferred first name (if it is present), then the value of column D is expressed in parentheses

NOTE: By data-entry convention, our Banner records express single initials without a period, therefore the test for a single initial is “length = 1”. If your data includes an initial, you’ll have to modify this to LEN(B2) = 2, which might cause a problem with names that are two letters in length.

Your first column might not be Prefix. In fact it probably isn’t: “ID” is probably your first column. In that case, it might be easier to paste the formula into a simple text editor and shift all the letters however many places to the right that you need, before using it in Excel, like so:

=SUBSTITUTE(TRIM(B2 & " "& C2 & " " & D2 & " " & IF(LEN(C2)=1," ", IF(LEN(E2)=0,"",IF(C2=E2," ","("&E2&")"))) & " " & F2), " ", " ")

This formula, and others like it, would be especially useful to save as a sort of template, if you frequently pull data that has the same columns. This idea was suggested to me by Sarah Ellison, Development Research Associate at the Curtis Institute of Music:

As long as you can get your data into an Excel spreadsheet with the same columns in the same order (she uses Raiser’s Edge, with saved queries and exports), you can take some of the hassle out of using Excel for reporting. She recommends that you have a tab/worksheet with all of your raw data (exactly the way it gets exported from the database), and another tab/worksheet with the formulas to make your page look the way you want it. Just ensure that all your cell references refer to cells in the other tab, and don’t overwrite the formulas sheet when you update/save over the page with the raw data.

Create a free website or blog at