CoolData blog

5 May 2015

Predictive modelling for the nonprofit organization

Filed under: Non-university settings, Why predictive modeling? — kevinmacdonell @ 6:15 pm

 

Predictive modelling uses data to help an organization focus its limited resources of time and money where they will earn the best return. People who work at nonprofits can probably relate to the “limited resources” part of that statement. But is it a given that predictive analytics is possible or necessary for any organization?

 

This week, I’m in Kingston, Ontario to speak at the conference of the Association of Fundraising Professionals, Southeastern Ontario Chapter (AFP SEO). As usual I will be talking about how fundraisers can use data. Given the range of organizations represented at this conference, I’m considering questions that a small nonprofit might need to answer before jumping in. They boil down to two concerns, “when” and “what”:

 

When is the tipping point at which it makes sense to employ predictive modelling? And how is that tipping point defined — dollars raised, number of donors, size of database, or what?

 

What kind of data do we need to collect in order to do predictive modelling? How much should we be willing to spend to gather that data? What type of model should we build?

 

These sound like fundamental questions, yet I’ve rarely had to consider them. In higher education advancement, the questions are answered already.

 

In the first case, most universities are already over the tipping point. Even relatively small institutions have more non-donor alumni than they can solicit all at once via mail and phone — it’s just too expensive and it takes too much time. Prioritization is always necessary. Not all universities are using predictive modelling, but all could certainly benefit from doing so.

 

Regarding the second question — what data to collect — alumni databases are typically rich in the types of data useful for gauging affinity and propensity to give. Knowing everyone’s age is a huge advantage, for example. Even if the Advancement office doesn’t have ages for everyone, at least they have class year, which is usually a good proxy for age. Universities don’t always do a great job of tracking key engagement factors (event attendance, volunteering, and so on), but I’ve been fortunate to have enough existing data of this kind to build robust models.

 

The situation is different for nonprofits, including small organizations that may not have real databases. (That situation was the topic I wrote about in my previous post: When does a small nonprofit need a database?) One can’t simply assume that predictive modelling is worth the trouble, nor can one assume that the data is available or worth investing in.

 

Fortunately the first question isn’t hard to answer, and I’ve already hinted at it. The tipping point occurs when the size of your constituency is so large that you cannot afford to reach out to all of them simultaneously. Your constituency may consist of any combination of past donors, volunteers, clients of your services, ticket buyers and subscribers, event attendees — anyone who has a reason to be in your database due to some connection with your organization.

 

Here’s an extreme example from the non-alumni charity world. Last year’s ALS Ice-Bucket Challenge already seems like a long time ago (which is the way of any social media-driven frenzy), but the real challenge is now squarely on the shoulders of ALS charities. Their constituency has grown by millions of new donors, but there is no guarantee that this windfall will translate into an elevated level of donor support in the long run. It’s a massive donor-retention problem: Most new donors will not give again, but retaining even a fraction could lead to a sizeable echo of giving. It always makes sense to ask recent donors to give again, but I think it would be incredibly wasteful to attempt reaching out to 2.5 million one-time donors. The organization needs to reach out to the right donors. I have no special insight into what ALS charities are doing, but this scenario screams “predictive modelling” to me. (I’ve written about it here: Your nonprofit’s real ice bucket challenge.)

 

None of us can relate to the ice-bucket thing, because it’s almost unique, but smaller versions of this dilemma abound. Let’s say your theatre company has a database with 20,000 records in it — people who have purchased subscriptions over the years, plus single-ticket buyers, plus all your donors (current and long-lapsed). You plan to run a two-week phone campaign for donations, but there’s no way you can reach everyone with a phone number in that limited time. You need a way to rank your constituents by likelihood to give, in order to maximize your return.

 

(About five years ago, I built a model using data from a symphony orchestra’s database. Among other things, I found that certain combinations of concert series subscriptions were associated with higher levels of giving. So: you don’t need a university alumni database to do this work!)

 

It works with smaller numbers, too. Let’s say your college has 1,000 alumni living in Toronto, and you want to invite them all to an event. Your budget allows a mail piece to be sent to just 250, however. If you have a predictive model for likelihood to attend an event, you can send mail to only the best prospective attendees, and perhaps email the rest.
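
To make the Toronto example concrete, here is a minimal Python sketch of the final step: splitting a list by model score. The alumni IDs and scores are invented for illustration (random numbers stand in for the output of a real likelihood-to-attend model), and the function name is hypothetical.

```python
import random

def split_by_score(scores, mail_budget):
    """Return (mail_ids, email_ids): the top `mail_budget` IDs by score get mail."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:mail_budget], ranked[mail_budget:]

# Made-up scores for 1,000 hypothetical alumni IDs; a real model would supply these.
random.seed(0)
scores = {f"A{i:04d}": random.random() for i in range(1000)}

mail_ids, email_ids = split_by_score(scores, mail_budget=250)
print(len(mail_ids), len(email_ids))
```

The hard part is producing scores worth trusting; once you have them, acting on them is just a sort and a cut-off at whatever your budget allows.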

 

In a reverse scenario, if your charity has 500 donors and you’re fully capable of contacting and visiting them all as often as you like, then there’s no business need for predictive modelling. I would also note that modelling is harder to do with small data sets, which invite problems such as overfitting. But that’s a technical issue; it’s enough to know that modelling is something to consider only at the point when resources won’t cover the need to engage with your whole constituency.

 

Now for the second question: What data do you need?

 

My first suggestion is that you look to the data you already have. Going back to the example of the symphony orchestra: The data I used actually came from two different systems — one for donor management, the other for ticketing and concert series subscriptions. The key was that donors and concert attendees were each identified with a unique ID that spanned both databases. This allowed me to discover that people who favoured the great Classical composers were better donors than those who liked the “pops” concerts — but that people who attended both were the best donors of all! If the orchestra intended to identify a pool of prospects for leadership gifts, this would be one piece of the ranking score that would help them do it.
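
As a rough illustration of why the shared ID matters, the sketch below links two small extracts in Python and compares average giving by subscription pattern. All of the IDs, amounts, and series names are invented; this is not the orchestra’s data or the method I actually used, only the general shape of the exercise.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical extract from the donor system: constituent ID -> lifetime giving.
giving = {101: 500.0, 102: 50.0, 103: 1200.0, 104: 0.0, 105: 75.0, 106: 900.0}

# Hypothetical extract from the ticketing system: (constituent ID, series).
subscriptions = [
    (101, "classical"), (102, "pops"), (103, "classical"), (103, "pops"),
    (104, "pops"), (105, "classical"), (106, "classical"), (106, "pops"),
]

# Gather the set of series each person subscribed to, keyed on the shared ID.
series_by_id = defaultdict(set)
for constituent_id, series in subscriptions:
    series_by_id[constituent_id].add(series)

def segment(series):
    """Label a person as 'both' or by the single series they subscribed to."""
    return "both" if len(series) > 1 else next(iter(series))

# Look up each subscriber's giving through the shared ID and group it by segment.
giving_by_segment = defaultdict(list)
for constituent_id, series in series_by_id.items():
    giving_by_segment[segment(series)].append(giving.get(constituent_id, 0.0))

average_giving = {seg: mean(vals) for seg, vals in giving_by_segment.items()}
print(average_giving)
```

Without an ID common to both systems, the lookup in the second loop has nothing reliable to match on, and you are left trying to match people by name and address.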

 

So: Explore your existing data. And while you’re doing so, don’t assume that messy, old, or incomplete data is not useable. It’s usually worth a look.

 

What about collecting new data? This can be an expensive proposition, and I think it would be risky to gather data just so you can build predictive models. There is no guarantee that what you’re spending time and money to gather is actually correlated with giving or other behaviours. My suggestion would be to gather data that serves operational purposes as well as analytical ones. A good example might be event attendance. If your organization holds a lot of events, you’ll want to keep statistics on attendance and how effective each event was. If you can find ways to record which individuals were at the event (donors, volunteers, community members), you will get this information, plus you will get a valuable input for your models.

 

Surveying is another way organizations can collect useful data for analysis while also serving other purposes. It’s one way to find out how old donors are — a key piece of information. Just be sure that your surveys are not anonymous! In my experience, people are not turned off by non-anonymous surveys so long as you’re not asking deeply personal questions. Offering a chance to win a prize for completing the survey can help.

 

Data you might gather on individuals falls into two general categories: Behaviours and attributes.

 

Behaviours are any type of action people take that might indicate affinity with your organization. Giving is obviously the big one, but other good examples would be event attendance or volunteering, or any type of interaction with your organization.

 

Attributes are just characteristics that prospects happen to have. This includes gender, where a person lives, age, wealth information, and so on.

 

Of the two types, behavioural factors are always the more powerful. You can never go wrong by looking at what people actually do. As the saying has it, people give of their time, talent, and treasure. Focus on those interactions first.

 

People also give of something else that is increasingly valuable: Their attention. If your organization makes use of a broadcast email platform, find out if it tracks opens and click-throughs — not just at the aggregate level, but at the individual level. Some platforms even assign a score to each email address that indicates the level of engagement with your emails. If you run phone campaigns, keep track of who answers the call. The world is so full of distractions, these periods of time when you have someone’s full attention are themselves gifts — and they are directly associated with likelihood to give financially.

 

Attributes are trickier. They can lead you astray with correlations that look real, but aren’t. Age is always a good thing to have, but gender is only sometimes useful. And I would never purchase external data (census and demographic data, for example) for predictive modelling alone. Aggregate data at the ZIP or postal code level is useful for a lot of things, but is not the strongest candidate for a model input. The correlations with giving to your organization will be weak, especially in comparison with the behavioural data you have on individuals.

 

What type of model does it make sense for a nonprofit to try to build first? Any modelling project starts with a clear statement of the business need. Perhaps you want to identify which ticket buyers will convert to donors, or which long-lapsed donors are most likely to respond positively to a phone call, or who among your past clients is most likely to be interested in becoming a volunteer.

 

Whatever it is, the key thing is that you have plenty of historical examples of the behaviour you want to predict. You want to have a big, fat target to aim for. If you want to predict likelihood to attend an event and your database contains 30,000 addressable records, you can be quite successful if 1,000 of those records have some history of attending events — but your model will be a flop if you’ve only got 50. The reason is that you’re trying to identify the behaviours and characteristics that typify the “event attendee,” and then go looking in your “non-attendee” group for those people who share those behaviours and characteristics. The better they fit the profile, the more likely they are to respond to an event invitation. Fifty people is probably not enough to define what is “typical.”

 

So for your first foray into modelling, I would avoid trying to hit very small targets. Major giving and planned giving propensity tend to fall into that category. I know why people choose to start there — because it implies high return on investment — but you would be wise to resist.

 

At this point, someone who’s done some reading may start to obsess about which highly advanced technique to use. But if you’re new to hands-on work, I strongly suggest using a simple method that requires you to study each variable individually, in relation to the outcome you’re trying to model. The best beginning point is to get familiar with comparing groups (attendees vs. non-attendees, donors vs. non-donors, etc.) using means and medians, preferably with the aid of a stats software package. (Peter Wylie’s book, Data Mining for Fundraisers, has this covered.) From there, learn a bit more about exploring associations and correlations between variables by looking at scatterplots and using the Pearson product-moment correlation. That will set you up well for learning to do multiple linear regression, if you choose to take it that far.
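
For readers who want to see what those first steps look like, here is a small Python sketch using only the standard library. The eight records are made up, and a stats package would do all of this for you; the point is only to show a group comparison with means and medians, followed by a Pearson correlation worked out from its textbook definition.

```python
from math import sqrt
from statistics import mean, median

# Made-up records: (events attended, is_donor, lifetime giving in dollars).
records = [
    (0, False, 0), (1, False, 0), (0, False, 0), (2, True, 100),
    (3, True, 250), (1, True, 50), (5, True, 400), (0, False, 0),
]

def summarize(values):
    """Mean and median of one variable for one group."""
    return {"mean": mean(values), "median": median(values)}

# Step 1: compare donors with non-donors on a single variable.
donor_summary = summarize([e for e, is_donor, _ in records if is_donor])
non_donor_summary = summarize([e for e, is_donor, _ in records if not is_donor])

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Step 2: how strongly does event attendance move with giving?
r = pearson_r([e for e, _, _ in records], [g for _, _, g in records])
print(donor_summary, non_donor_summary, round(r, 3))
```

In this toy data the donors attend far more events than the non-donors, and attendance correlates strongly with giving. Real data will be much noisier, which is exactly why it pays to study each variable one at a time.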

 

In sum: Predictive modelling isn’t for everyone, but you don’t need Big Data or a degree in statistics to get some benefit from it. Start small, and build from there.

 

3 May 2015

When does a small nonprofit need a database?

Filed under: Non-university settings — kevinmacdonell @ 9:25 am

 

I had a dream a few nights ago in which I was telling my wife about a job interview I’d just had. A small rural Anglican church serving British expats was hiring a head of fund development. (I have very specific dreams.) I lamented that I had forgotten to ask some key questions: “I don’t even know if they have a database!”

 

Not all of my dreams are that nerdy. The fact is, nonprofit organizations (as opposed to higher education institutions — my usual concern) are on my mind lately, as I am preparing a conference presentation for a group that includes the full range of organizations, many of them small. I’m presenting on predictive modelling, but like that rural church, some organizations may not yet have a proper database.

 

When should an organization acquire some kind of database system or CRM?

 

Any organization, no matter how small, has to track activity and record information for operational purposes. This may be especially true for nonprofits that need to report on the impact they’re having in the community. I usually think in terms of tracking donors, but nonprofits may have an additional need to track clients and services.

 

Alas, the go-to is often the everyday Excel spreadsheet. It’s clear why: Excel is flexible, adaptable, comprehensible, and ubiquitous. Plus, if you’re a whiz, there are advanced features to explore. But while an Excel file can store data, it is NOT a true database. For a growing nonprofit, managing everything in spreadsheets will become an expensive liability. You may have already achieved a painful awareness of that fact. For others who aren’t there yet, here are a few warning signs that spreadsheets have outstayed their welcome in your office.

 

One: Even on a wide screen at 80% zoom, you have to do a lot of horizontal scrolling.

 

At the start, a spreadsheet seems so straightforward … A column each for First Name, Last Name, and some more columns for address information, phone and email. Then one day, you have a client or donor who has a second address — a business or seasonal address — and she wants to get your newsletter at one or the other, depending on the time of year. Both addresses are valid, so you need to add more columns. Hmm, and of course you want to track who attended your last event. If someone attends an event in July and another in December, you’ll need a column to record each event. As each volunteer has a new activity, as each client has a new interaction with your services, you are adding more and more columns until the sideways scrolling gets ridiculous.

 

Two: Your spreadsheet has so many rows that it is unwieldy to find or update individual records.

 

It’s technically true that an Excel file can store a million rows, but you probably wouldn’t want to open such a file on your computer. Files with just a few thousand rows can cause trouble after they’ve been worked over long enough. You can always tell a spreadsheet that’s been used to store data in the place of a true database, especially if more than one person has been mucking around in it. It’s in rough shape. In particular, errors made while sorting rows can lead to lost data and headaches all round.

 

Three: Several spreadsheets are being maintained separately, tracking different types of data on the same people.

 

Given the issues with large files, you’ll soon be tempted to have a separate sheet for each type of data. If you have a number of people on staff, each might be independently tracking the information that is relevant to their own work: One person tracking donors, another volunteers, another event attendees. John Doe might exist as a row in one or more of these separate files. If each file contains contact information, every change of address becomes a big deal, as it has to be applied in multiple places. Inevitably, the files get out of sync. As bad or worse, insights are not being shared across data files. Reporting is cumbersome, and anything like predictive modelling is impossible.

 

If this sounds like your situation, know that you’re not alone. I would be lying if I said rampant Excel use doesn’t occur in the (often) better-resourced world of higher education. Of course it does. Sometimes people don’t have the kind of access to the data they need, sometimes the database doesn’t have a module tailored to their business requirements, and sometimes people can’t be bothered to care about institution-wide data integrity. Shadow databases are a real problem on large campuses, and some of those orphan data stores are in Excel.

 

There’s nothing magic about a true database. It’s all about structure. A database stores data in tables, behind the scenes, and each table is very similar to a spreadsheet: it’s rectangular, and made up of rows and columns. The difference is that a single table usually holds only one type of data: Addresses, for example, or gift transactions. A table may be very long, with millions of rows, but it is typically not very wide, because each table serves only one purpose. As a consequence, a database has to have many tables, one for each thing needing to be stored. A complex enterprise database could have thousands of tables.

 

This sounds like chaos, but every record in a table contains a reference to data in another table. Tables are joined together by these identifiers, or keys. This allows a query of the database to retrieve John Smith from the ‘names’ table, the proper address for John Smith from the ‘addresses’ table, a sum of gifts made by John Smith from the ‘gifts’ table, and a volunteer status code for John Smith from the ‘volunteers’ table. When John Smith moves and provides his new address, that information is added as a new record in the ‘addresses’ table, attached to his unique identifier (i.e., his ID number). The old address is not deleted, but is marked ‘invalid’, so that the information is retained but never appears on a list of valid addresses. One place, one change — and it’s done.
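
If it helps to see the idea in miniature, the sketch below builds three tiny tables with Python’s built-in sqlite3 module. The table and column names are my own simplification for illustration, not the schema of any real donor management system, and the ‘volunteers’ table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE names (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE addresses (
        person_id INTEGER REFERENCES names(id),
        street TEXT,
        is_valid INTEGER DEFAULT 1
    );
    CREATE TABLE gifts (person_id INTEGER REFERENCES names(id), amount REAL);
    INSERT INTO names VALUES (1, 'John Smith');
    INSERT INTO addresses (person_id, street) VALUES (1, '12 Old Road');
    INSERT INTO gifts VALUES (1, 50.0), (1, 100.0);
""")

def change_address(conn, person_id, new_street):
    """Mark the current address invalid and add the new one; nothing is deleted."""
    conn.execute(
        "UPDATE addresses SET is_valid = 0 WHERE person_id = ? AND is_valid = 1",
        (person_id,),
    )
    conn.execute(
        "INSERT INTO addresses (person_id, street) VALUES (?, ?)",
        (person_id, new_street),
    )

change_address(conn, 1, "34 New Street")

# One query pulls the name, the valid address, and the gift total via the ID.
row = conn.execute("""
    SELECT n.full_name, a.street,
           (SELECT SUM(g.amount) FROM gifts g WHERE g.person_id = n.id)
    FROM names n
    JOIN addresses a ON a.person_id = n.id AND a.is_valid = 1
    WHERE n.id = ?
""", (1,)).fetchone()
address_count = conn.execute(
    "SELECT COUNT(*) FROM addresses WHERE person_id = 1"
).fetchone()[0]
print(row, address_count)
```

Both addresses remain on file, but only the valid one comes back in the query. The gift history is untouched by the move because it hangs off the ID, not the address.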

 

That’s a quick and rather inadequate description of what a database is and does. There’s more to a donor management system than just a table structure, and I could say plenty more about user interfaces, reporting, and data integrity and security. But there is no shortage of information and guidance online, so I will leave you with a few places to go for good advice. There are many software solutions out there for organizations big and small.

 

Robert L. Weiner is a nonprofit technology consultant, helping fundraisers choose software tools. Check out his Ten Common Mistakes in Selecting Donor Databases (And How to Avoid Them). As you proceed toward acquiring a system, here is a piece published by AFP that has good, basic advice about how to manage it: Overcoming Database Demons.

 

Andrew Urban is author of a great book that helps guide nonprofits large and small in making wise choices in software and systems investments: The Nonprofit Buyer: Strategies for Success from a Nonprofit Technology Sales Veteran.

 

That’s all from me on this … CoolData’s domain is not systems or databases, but the data itself. A good system is simply a basic requirement for analysis. In my next post, I will address another question a small nonprofit might have: At what point is a nonprofit “big” enough to benefit from predictive modelling?

 

1 February 2012

Where’s your institution on the Culture of Analytics Ladder?

Filed under: Fun, predictive analytics, Why predictive modeling? — kevinmacdonell @ 2:21 pm

I’m lying on the couch with a bad head cold, and there’s a mix of snow and rain in the forecast. Time to curl up with my laptop and a cup of tea. I’ve got a question for you!

Not long ago I asked you to give me examples of institutions you’re aware of that are shining examples of institution-wide data-driven decision making. I was grateful for the responses, but no single institution was named twice. A few people offered an opinion about how their own organizations size up, which I found interesting.

So let’s explore that a bit more with a quick and anonymous poll: Where do you think your non-profit organization or institution fits on the Culture of Analytics Ladder? (That’s CoAL for short … but no need to be so formal. I totally made this up under the influence of cold medication.) Don’t overthink it. Just pick whatever stage you feel your org or institution occupies.

The categories may seem a bit vague. If it’s any help, by “analysis” or “analytics” I am referring to the process of sifting through large quantities of data in search of patterns that lead to insights, primarily about your constituents. I am NOT referring to reporting. In fact I want you to ignore a lot of the day-to-day processes that involve data but are not really “analysis,” including: data entry, gift accounting, appeal segmentation, reporting on historical results, preparation of financials, and so on.

I am thinking more along the lines of modelling for the prediction of behaviours (which group of constituents is most likely to engage in such-and-such a behaviour?), prediction of future results (i.e., forecasting), open-ended exploration of constituent data in search of “clusters”, and any other variety of data work that would be called on to make a decision about what to do in the future, as opposed to documenting what happened in the past. I am uncertain whether A/B split testing fits my definition of analysis, but let’s be generous and say that it does.

A couple of other pointers:

  • If you work for, say, a large university advancement department and aren’t sure whether analytics is used in other departments such as student admissions or recruitment, then answer just for your department. Same thing if you work for a regional office of a large non-profit and aren’t sure about the big picture.
  • If you have little or no in-house expertise, but occasionally hire a vendor to produce predictive modelling scores, then you might answer “6” — but only if those scores are actually being well used.

Here we go.

22 April 2010

Introducing the nonprofit data collective

Filed under: Non-university settings — kevinmacdonell @ 8:20 am

Yesterday I gave a conference presentation to a group of fundraisers, all but one of whom work for non-university nonprofits. Many have databases that are small, do not capture the right kinds of information to develop a model, or are unfit in any number of ways. But this group seemed highly attentive to what I was talking about, understood the concepts, and a few were eager to improve the quality of their data – and from there get into data mining someday.

The questions were all spot-on. One person asked how many database records one needed as a minimum for predictive modeling. I don’t know if there’s a pat answer for that one, but in any case I think my answer was discouraging. If you’re below a certain size threshold, you may not have any need for modeling at all. But the fact is, if you want to model mass behaviour, you need a lot of data.

So here’s a thought. What if a bunch of small- to mid-sized charities were to somehow get together and agree to submit their data to a centralized database? Before you object, hear me out.

Each charity would fund part of the salary of a database administrator and data-entry person, according to the proportion of the donor base they occupy in the data pool. The first benefit is that data would be entered and stored according to strict quality-control guidelines. Any time a charity required an address file for a particular mailing according to certain selection criteria, they’d just ask for it. The charity could focus on delivering its mission rather than mucking around with a database it doesn’t fully know how to use.
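
The cost-sharing arithmetic is simple enough to sketch in a few lines of Python. The charity names, donor counts, and salary figure below are all invented for illustration.

```python
def cost_shares(donor_counts, total_cost):
    """Split a shared cost in proportion to each charity's share of pooled donors."""
    total_donors = sum(donor_counts.values())
    return {
        name: total_cost * count / total_donors
        for name, count in donor_counts.items()
    }

# Hypothetical pool of three charities and a hypothetical combined salary cost.
donor_counts = {"Charity A": 5000, "Charity B": 3000, "Charity C": 2000}
shares = cost_shares(donor_counts, total_cost=80000)
print(shares)
```

A real agreement would have to settle details this ignores, such as how to count a donor who gives to more than one charity in the pool.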

The next benefit is scale. The records of donors to charities with related missions can be pooled for the purpose of building stronger predictive models than any one charity could hope to do on its own. Certain costs, such as list acquisition, could be shared and the benefits apportioned out. Some cross-promotion between causes could also occur, if charities found that to have a net benefit.

Maybe charities would not choose to cede control of their data. Maybe there are donor privacy concerns that I’m overlooking. It’s just an idea. My knowledge of the nonprofit sector outside of universities is limited – does anyone know of an example of this idea in use today?

P.S. (18 Feb 2011): This post by Jim Berigan on the Step by Step Fundraising blog is a step in the right direction: 5 Reasons You Should Collaborate with Another Non-profit in 2011.
