CoolData blog

22 September 2014

What predictor variables should you avoid? Depends on who you ask

People who build predictive models will tell you that there are certain variables you should avoid using as predictors. I am one of those people. However, we disagree on WHICH variables one should avoid, and increasingly this conflicting advice is confusing those trying to learn predictive modeling.

The differences involve two points in particular. Assuming charitable giving is the behaviour we’re modelling for, those two things are:

  1. Whether we should use past giving to predict future giving, and
  2. Whether attributes such as marital status are really predictors of giving.

I will offer my opinions on both points. Note that they are opinions, not definitive answers.

1. Past giving as a predictor

I have always stressed that if you are trying to predict “giving” using a multiple linear regression model, you must avoid using “giving” as a predictor among your independent variables. That includes anything that is a proxy for “giving,” such as attendance at a donor-thanking event. This is how I’ve been taught and that is what I’ve adhered to in practice.

Examples that violate this practice keep popping up, however. I have an email from Atsuko Umeki, IT Coordinator in the Development Office of the University of Victoria in Victoria, British Columbia*. She poses this question about a post I wrote in July 2013:

“In this post you said, ‘In predictive models, giving and variables related to the activity of giving are usually excluded as variables (if ‘giving’ is what we are trying to predict). Using any aspect of the target variable as an input is bad practice in predictive modelling and is carefully avoided.’  However, in many articles and classes I read and took I was advised or instructed to include past giving history such as RFA*, Average gift, Past 3 or 5 year total giving, last gift etc. Theoretically I understand what you say because past giving is related to the target variable (giving likelihood); therefore, it will be biased. But in practice most practitioners include past giving as variables and especially RFA seems to be a good variable to include.”

(* RFA is a variation of the more familiar RFM score, based on giving history — Recency, Frequency, and Monetary value.)

So modellers-in-training are being told to go ahead and use ‘giving’ to predict ‘giving’, but that’s not all: Certain analytics vendors also routinely include variables based on past giving as predictors of future giving. Not long ago I sat in on a webinar hosted by a consultant, which referenced the work of one well-known analytics vendor (no need to name the vendor here) in which it seemed that giving behaviour was present on both sides of the regression equation. Not surprisingly, this vendor “achieved” a fantastic R-squared value of 86%. (Fantastic as in “like a fantasy,” perhaps?)

This is not as arcane or technical as it sounds. When you use giving to predict giving, you are essentially saying, “The people who will make big gifts in the future are the ones who have made big gifts in the past.” This is actually true! The thing is, you don’t need a predictive model to produce such a prospect list; all you need is a list of your top donors.

Now, this might be reassuring to whoever is paying a vendor big bucks to create the model. That person sees names they recognize, and they think, ah, good — we are not too far off the mark. And if you’re trying to convince your boss of the value of predictive modelling, he or she might like to see the upper ranks filled with familiar names.

I don’t find any of that “reassuring.” I find it a waste of time and effort — a fancy and expensive way to produce a list of the usual suspects.

If you want to know who has given you a lot of money, you make a list of everyone in your database and sort it in descending order by total amount given. If you want to predict who in your database is most likely to give you a lot of money in the future, build a predictive model using predictors that are associated with having given large amounts of money. Here is the key point … if you include “predictors” that mean the same thing as “has given a lot of money,” then the result of your model is not going to look like a list of future givers — it’s going to look more like your historical list of past givers.

Does that mean you should ignore giving history? No! Ideally you’d like to identify the donors who have made four-figure gifts who really have the capacity and affinity to make six-figure gifts. You won’t find them using past giving as a predictor, because your model will be blinded by the stars. The variables that represent giving history will cause all other affinity-related variables to pale in comparison. Many will be rejected from the model for being not significant or for adding nothing additional to the model’s ability to explain the variance in the outcome variable.

To sum up, here are the two big problems with using past giving to predict future giving:

  1. The resulting insights are sensible but not very interesting: People who gave before tend to give again. Or, stated another way: “Donors will be donors.” Fundraisers don’t need data scientists to tell them that.
  2. Giving-related independent variables will be so highly correlated with giving-related dependent variables that they will eclipse more subtle affinity-related variables. Weaker predictors will end up getting kicked out of our regression analysis because they can’t move the needle on R-squared, or because they don’t register as significant. Yet, it’s these weaker variables that we need in order to identify new prospects.

Let’s try a thought experiment. What if I told you that I had a secret predictor that, once introduced into a regression analysis, could explain 100% of the variance in the dependent variable ‘Lifetime Giving’? That’s right — the highest value for R-squared possible, all with a single predictor. Would you pay me a lot of money for that? What is this magic variable that perfectly models the variance in ‘Lifetime Giving’? Why, it is none other than ‘Lifetime Giving’ itself! Any variable is perfectly correlated with itself, so why look any farther?

This is an extreme example. In a real predictive model, a predictor based on giving history would be restricted to giving from the past, while the outcome variable would be calculated from a more recent period — the last year or whatever. There should be no overlap. R-squared would not be 100%, but it would be very high.
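The thought experiment is easy to try out. What follows is only a sketch on simulated data: the donors, the amounts and the "business phone" flag are all invented, and `r_squared` is a hand-rolled helper for the one-predictor case, where R-squared is simply the squared Pearson correlation.

```python
import math
import random


def r_squared(x, y):
    """R-squared of a one-predictor linear regression (squared Pearson r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


random.seed(42)  # fixed seed so the invented data is reproducible

# Simulated donors: a hidden "affinity" drives giving in both periods.
affinity = [random.gauss(0, 0.5) for _ in range(2000)]
past_giving = [100 * math.exp(a + random.gauss(0, 0.3)) for a in affinity]
recent_giving = [100 * math.exp(a + random.gauss(0, 0.3)) for a in affinity]
lifetime_giving = [p + r for p, r in zip(past_giving, recent_giving)]

# A 0/1 attribute that is only loosely tied to affinity.
has_business_phone = [1 if a + random.gauss(0, 1) > 0 else 0 for a in affinity]

r2_self = r_squared(lifetime_giving, lifetime_giving)
r2_past = r_squared(past_giving, recent_giving)
r2_flag = r_squared(has_business_phone, recent_giving)

print(f"Lifetime giving vs itself:      {r2_self:.3f}")
print(f"Past giving vs recent giving:   {r2_past:.3f}")
print(f"Business phone vs recent giving: {r2_flag:.3f}")
```

In a simulation built this way, the giving-based predictor dwarfs the attribute flag on R-squared, even though the flag carries real (if weak) information about affinity.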

The R-squared statistic is useful for guiding you as you add variables to a regression analysis, or for comparing similar models in terms of fit with the data. It is not terribly useful for deciding whether any one model is good or bad. A model with an R-squared of 15% may be highly valuable, while one with R-squared of 75% may be garbage. If a vendor is trying to sell you on a model they built based on a high R-squared alone, they are misleading you.

The goal of predictive modeling for major gifts is not to maximize R-squared. It’s to identify new prospects.

2. Using “attributes” as predictors

Another thing about that webinar bugged me. The same vendor advised us to “select variables with caution, avoiding ‘descriptors’ and focusing on potential predictors.” Specifically, we were warned that a marital status of ‘married’ will emerge as correlated with giving. Don’t be fooled! That’s not a predictor, they said.

So let me get this straight. We carry out an analysis that reveals that married people are more likely to give large gifts, that donors with more than one degree are more likely to give large gifts, that donors who have email addresses and business phone numbers in the database are more likely to give large gifts … but we are supposed to ignore all that?

The problem might not be the use of “descriptors”; the problem might be the terminology. Maybe we need to stop using the word “predictor”. One experienced practitioner, Alexander Oftelie, briefly touched on this nuance in a recent blog post. I quote (emphasis added by me):

“Data that on its own may seem unimportant — the channel someone donates, declining to receive the mug or calendar, preferring email to direct mail, or making ‘white mail’ or unsolicited gifts beyond their sustaining-gift donation — can be very powerful when they are brought together to paint a picture of engagement and interaction. Knowing who someone is isn’t by itself predictive (at best it may be correlated). Knowing how constituents choose to engage or not engage with your organization are the most powerful ingredients we have, and its already in our own garden.”

I don’t intend to critique Alexander’s post, which isn’t even on this particular topic. (It’s a good one – please read it.) But since he’s written this, permit me to scratch my head about it a bit.

In fact, I think I agree with him that there is a distinction between a behaviour and a descriptor/attribute. A behaviour, an action taken at a specific point in time (e.g., attending an event), can be classified as a predictor. An attribute (“who someone is,” e.g., whether they are married or single) is better described as a correlate. I would also be willing to bet that if we carefully compared behavioural variables to attribute variables, the behaviours would outperform, as Alexander says.

In practice, however, we don’t need to make that distinction. If we are using regression to build our models, we are concerned solely and completely with correlation. To say “at best it may be correlated” suggests that predictive modellers have something better at their disposal that they should be using instead of correlation. What is it? I don’t know, and Alexander doesn’t say.

If in a given data set, we can demonstrate that being married is associated with likelihood to make a donation, then it only makes sense to use that variable in our model. Choosing to exclude it based on our assumption that it’s an attribute and not a behaviour doesn’t make business sense. We are looking for practical results, after all, not chasing some notion of purity. And let’s not fool ourselves, or clients, that we are getting down to causation. We aren’t.

Consider that at least some “attributes” can be stated in terms of a behaviour. People get married — that’s a behaviour, although not related to our institution. People get married and also tell us about it (or allow it to be public knowledge so that we can record it) — that’s also a behaviour, and potentially an interaction with us. And on the other side of the coin, behaviours or interactions can be stated as attributes — a person can be an event attendee, a donor, a taker of surveys.

If my analysis informs me that widowed female alumni over the age of 60 are extremely good candidates for a conversation about Planned Giving, then are you really going to tell me I’m wrong to act on that information, just because sex, age and being widowed are not “behaviours” that a person voluntarily carries out? Mmmm — sorry!

Call it quibbling over semantics if you like, but don’t assume it’s so easy to draw a circle around true predictors. There is only one way to surface predictors, which is to take a snapshot of all potentially relevant variables at a point in time, then gather data on the outcome you wish to predict (e.g., giving) after that point in time, and then assess each variable in terms of the strength of association with that outcome. The tools we use to make that assessment are nothing other than correlation and significance. Again, if there are other tools in common usage, then I don’t know about them.
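For readers who like to see the procedure spelled out, here is a rough sketch of the snapshot approach. Everything in it is hypothetical: the snapshot date, the eight records, the variable names and the gifts are invented, and the significance testing you would want on real data is left out for brevity.

```python
from datetime import date
from math import sqrt

SNAPSHOT = date(2013, 6, 30)  # hypothetical snapshot date

# Variables as they stood on the snapshot date (invented records).
snapshot_vars = {
    1: {"married": 1, "event_attendee": 1, "has_email": 1},
    2: {"married": 1, "event_attendee": 0, "has_email": 1},
    3: {"married": 1, "event_attendee": 1, "has_email": 1},
    4: {"married": 0, "event_attendee": 0, "has_email": 1},
    5: {"married": 0, "event_attendee": 1, "has_email": 0},
    6: {"married": 1, "event_attendee": 0, "has_email": 1},
    7: {"married": 0, "event_attendee": 0, "has_email": 1},
    8: {"married": 0, "event_attendee": 0, "has_email": 0},
}

# (constituent id, gift date, amount)
gifts = [
    (1, date(2013, 11, 2), 100.0),
    (2, date(2014, 1, 15), 50.0),
    (3, date(2014, 3, 9), 250.0),
    (5, date(2013, 12, 20), 25.0),
    (6, date(2012, 5, 1), 500.0),  # before the snapshot, so not part of the outcome
]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


ids = sorted(snapshot_vars)

# Outcome: did the person give AFTER the snapshot date?
donors_after = {cid for cid, gift_date, _ in gifts if gift_date > SNAPSHOT}
outcome = [1 if cid in donors_after else 0 for cid in ids]

# Strength of association between each snapshot variable and the outcome.
associations = {
    name: pearson([snapshot_vars[cid][name] for cid in ids], outcome)
    for name in ("married", "event_attendee", "has_email")
}
ranked = sorted(associations.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, r in ranked:
    print(f"{name:15s} r = {r:+.3f}")
```

Note that the behaviour (`event_attendee`) and the attribute (`married`) go through exactly the same assessment; nothing in the arithmetic knows or cares which is which.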

Caveats and concessions

I don’t maintain that this or that practice is “wrong” in all cases, nor do I insist on rules that apply universally. There’s a lot of art in this science, after all.

Using giving history as a predictor:

  • One may use some aspects of giving to predict outcomes that are not precisely the same as ‘Giving’, for example, likelihood to enter into a Planned Giving arrangement. The required degree of difference between predictors and outcome is a matter of judgement. I usually err on the side of scrupulously avoiding ANY leakage of the outcome side of the equation into the predictor side — but sure, rules can be bent.
  • I’ve explored the use of very early giving (the existence and size of gifts made by donors before age 30) to predict significant giving late in life. (See Mine your donor data with this baseball-inspired analysis.) But even then, I don’t use that as a variable in a model; it’s more of a flag used to help select prospects, in addition to modeling.

Using descriptors/attributes as predictors:

  • Some variables of this sort will appear to have subtly predictive effects in-model, effects that disappear when the model is deployed and new data starts coming in. That’s regrettable, but it’s something you can learn from — not a reason to toss all such variables into the trash, untested. The association between marital status and giving might be just a spurious correlation — or it might not be.
  • Business knowledge mixed with common sense will help keep you out of trouble. A bit of reflection should lead you to consider using ‘Married’ or ‘Number of Degrees’, while ignoring ‘Birth Month’ or ‘Eye Colour’. (Or astrological sign!)

There are many approaches one can take with predictive modeling, and naturally one may feel that one’s chosen method is “best”. The only sure way to proceed is to take the time to define exactly what you want to predict, try more than one approach, and then evaluate the performance of the scores when you have actual results available — which could be a year after deployment. We can listen to what experts are telling us, but it’s more important to listen to what the data is telling us.

//////////

Note: When I originally posted this, I referred to Atsuko Umeki as “he”. I apologize for this careless error and for whatever erroneous assumption that must have prompted it.

13 April 2014

Optimizing lost alumni research, with a twist

Filed under: Alumni, Best practices, engagement, External data, Tableau — kevinmacdonell @ 9:47 am

There are data-driven ways to get the biggest bang for your buck from the mundane activity of finding lost alumni. I’m going to share some ideas on optimizing for impact (which should all sound like basic common sense), and then I’m going to show you a cool data way to boost your success as you search for lost alumni and donors (the “twist”). If lost alumni is not a burning issue for your school, you still might find the cool stuff interesting, so I encourage you to skip down the page.

I’ve never given a great deal of thought to how a university’s alumni records office goes about finding lost alumni. I’ve simply assumed that having a low lost rate is a good thing. More addressable (or otherwise contactable) alumni is good: More opportunities to reengage and, one hopes, attract a gift. So every time I’ve seen a huge stack of returned alumni magazine covers, I’ve thought, well, it’s not fun, but what can you do. Mark the addresses as invalid, and then research the list. Work your way through the pile. First-in, first-out. And then on to the next raft of returned mail.

But is this really a wise use of resources? John Smith graduates in 1983, never gives a dime, never shows up for a reunion … is there likely to be any return on the investment of time to track him down? Probably not. Yet we keep hammering away at it.

All this effort is evident in my predictive models. Whenever I have a variable that is a count of ‘number of address updates’, I find it is correlated with giving — but only up to a point. Beyond a certain number of address updates, the correlation turns sharply negative. The reason is that while highly engaged alumni are conscientious about keeping alma mater informed of their whereabouts, alumni who are completely unengaged are perpetually lost. The ones who are permanently unreachable get researched the most and are submitted for data appends the most. Again and again a new address is entered into the database. It’s often incorrect — we got the wrong John Smith — so the mail comes back undeliverable, and the cycle begins again.
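One simple way to check for this kind of up-then-down pattern is to bin the count and compare the donor rate in each bin, rather than relying on a single correlation coefficient. The sketch below uses invented records and arbitrary bin edges; it only shows the shape of the check, not results from any real database.

```python
# Hypothetical bins for 'number of address updates': (label, low, high).
BINS = [("0", 0, 0), ("1-2", 1, 2), ("3-5", 3, 5), ("6+", 6, None)]


def bin_label(count):
    """Return the label of the bin that a count of address updates falls into."""
    for label, low, high in BINS:
        if count >= low and (high is None or count <= high):
            return label
    raise ValueError(f"count must be non-negative, got {count}")


def donor_rate_by_bin(records):
    """records: iterable of (address_updates, is_donor) pairs."""
    tallies = {label: [0, 0] for label, _, _ in BINS}  # [donors, people]
    for updates, is_donor in records:
        tally = tallies[bin_label(updates)]
        tally[0] += int(is_donor)
        tally[1] += 1
    return {
        label: (donors / people if people else None)
        for label, (donors, people) in tallies.items()
    }


# Invented data: (number of address updates, 1 if donor else 0).
records = [
    (0, 0), (0, 0), (0, 1), (0, 0),
    (1, 1), (2, 0), (1, 0), (2, 1),
    (3, 1), (4, 1), (5, 0), (3, 1),
    (6, 0), (8, 0), (7, 0), (9, 0),
]

rates = donor_rate_by_bin(records)
for label, rate in rates.items():
    print(f"{label:>3} updates: donor rate {rate:.2f}")
```

If the rate climbs through the middle bins and then collapses in the top one, a single linear term for the count will understate what the variable is telling you.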

Consider that at any time there could be many thousands of lost alumni. It’s a never-ending task. Every day people in your database pull up stakes and move without informing you. Some of those people are important to your mission. Others, like Mr. Smith from the Class of 1983, are not. You should be investing in regular address cleanups for all records, but when it comes down to sleuthing for individuals, which is expensive, I think you’d agree that those John Smiths should never come ahead of keeping in touch with your loyal donors. I’m afraid that sometimes they do — a byproduct, perhaps, of people working in silos, pursuing goals (e.g., low lost rates) that may be laudable in a narrow context but are not sufficiently aligned with the overall mission.

Here’s the common sense advice for optimizing research: ‘First-in, first-out’ is the wrong approach. Records research should always be pulling from the top of the pile, searching for the lost constituents who are deemed most valuable to your mission. Defining “most valuable” is a consultative exercise that must take Records staff out of the back office and face-to-face with fundraisers, alumni officers and others. It’s not done in isolation. Think “integration”.

The first step, then, is consultation. After that, all the answers you need are in the data. Depending on your tools and resources, you will end up with some combination of querying, reporting and predictive modelling to deliver the best research lists possible, preferably on a daily basis. The simplest approach is to develop a database query or report that produces the following lists in whatever hierarchical order emerges from consultation. Research begins with List 1 and does not proceed to List 2 until everyone on List 1 has been found. An example hierarchy might look like this:

  1. Major gift and planned giving prospects: No major gift prospect under active management should be lost (and that’s not limited to alumni). Records staff MUST review their lists and research results with Prospect Research and/or Prospect Management to ensure integrity of the data, share research resources, and alert gift officers to potentially significant events.
  2. Major gift donors (who are no longer prospects): Likewise, these folks should be 100% contactable. In this case, Records needs to work with Donor Relations.
  3. Planned Giving expectancies: I’m not knowledgeable about Planned Giving, but it seems to me that a change of address for an expectancy could signal a significant event that your Planned Giving staff ought to know about. A piece of returned mail might be a good reason to reach out and reestablish contact.
  4. Annual Giving Leadership prospects and donors: The number of individuals is getting larger … but these lists should be reviewed with Annual Fund staff.
  5. Annual Fund donors who gave in the past year.
  6. Annual Fund donors who gave in the year previous.
  7. All other Annual Fund donors, past five or 10 years.
  8. Recent alumni volunteers (with no giving)
  9. Recent event attendees (reunions, etc.) — again, who aren’t already represented in a previous category.
  10. Young alumni with highest scores from predictive models for propensity to give (or similar).
  11. All other non-donor alumni, ranked by predictive model score.
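A hierarchy like this can be expressed as an ordered list of rules, with each lost record assigned to the first list it qualifies for. The sketch below is one possible shape for that logic, not a description of any particular database: the field names (`mg_prospect`, `years_since_gift`, `score` and so on) are made up, and list 10 is simplified to "young alumni, sorted by score" with no cut-off.

```python
def gave_within(record, min_years, max_years):
    """True if the last gift was between min_years and max_years ago (0 = past year)."""
    years = record.get("years_since_gift")  # None means never gave
    return years is not None and min_years <= years <= max_years


# Ordered rules: a record belongs to the FIRST list whose test it passes.
TIERS = [
    ("Major gift or planned giving prospect", lambda r: r.get("mg_prospect", False)),
    ("Major gift donor", lambda r: r.get("mg_donor", False)),
    ("Planned Giving expectancy", lambda r: r.get("pg_expectancy", False)),
    ("Annual Giving Leadership", lambda r: r.get("ag_leadership", False)),
    ("Annual Fund donor, past year", lambda r: gave_within(r, 0, 0)),
    ("Annual Fund donor, year previous", lambda r: gave_within(r, 1, 1)),
    ("Annual Fund donor, past 10 years", lambda r: gave_within(r, 2, 10)),
    ("Recent volunteer", lambda r: r.get("recent_volunteer", False)),
    ("Recent event attendee", lambda r: r.get("recent_event", False)),
    ("Young alumni, by score", lambda r: r.get("young", False)),
    ("Other non-donor alumni, by score", lambda r: r.get("years_since_gift") is None),
]


def tier_of(record):
    """Index of the first matching list; records matching nothing go last."""
    for index, (_, matches) in enumerate(TIERS):
        if matches(record):
            return index
    return len(TIERS)


def research_queue(lost_records):
    """Order lost records by list, then by model score (highest first) within a list."""
    return sorted(lost_records, key=lambda r: (tier_of(r), -r.get("score", 0.0)))


lost = [
    {"id": "A", "years_since_gift": None, "score": 0.9},
    {"id": "B", "mg_prospect": True, "score": 0.2},
    {"id": "C", "years_since_gift": 0, "score": 0.5},
    {"id": "D", "years_since_gift": None, "young": True, "score": 0.7},
    {"id": "E", "years_since_gift": 0, "score": 0.8},
]

queue = research_queue(lost)
for record in queue:
    print(record["id"], "-", TIERS[tier_of(record)][0])
```

Because the rules are just an ordered list, reshuffling priorities after a round of consultation means moving a line, not rewriting the query.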

Endless variations are possible. Although I see potential for controversy here, as everyone will feel they need priority consideration, I urge you not to shrink from a little lively discussion — it’s all good. It may be that in the early days of your optimization effort, Annual Fund is neglected while you clean up your major gift and planned giving prospect/donor lists. But in time, those high-value lists will become much more manageable — maybe a handful of names a week — and everyone will be well-served.

There’s a bit of “Do as I say, not as I do” going on here. In my shop, we are still evolving towards becoming data-driven in Records. Not long ago I created a prototype report in Tableau that roughly approximates the hierarchy above. Every morning, a data set is refreshed automatically that feeds these lists, one tab for each list, and the reports are available to Records via Tableau Server and a browser.

That’s all fine, but we are not quite there yet. The manager of the Records team said to me recently, “Kevin, can’t we collapse all these lists into a single report, and have the names ranked in order by some sort of calculated score?” (I have to say, I feel a warm glow when I hear talk like that.) Yes — that’s what we want. A hierarchy like the one above suggests exclusive categories, but a weighted score would allow for a more sophisticated ranking. For example, a young but loyal Annual Fund donor who is also a current volunteer might have a high enough score to outrank a major gift prospect who has no such track record of engagement — maybe properly so. Propensity scores could also play a much bigger role.
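To make the idea concrete, here is a minimal sketch of what a weighted research-priority score might look like. The weights, the field names and the three sample records are all invented for illustration; real weights would have to come out of the consultation described earlier.

```python
# Invented weights: points awarded when a flag is set on the record.
FLAG_WEIGHTS = {
    "mg_prospect": 50.0,
    "pg_expectancy": 40.0,
    "current_volunteer": 20.0,
}
POINTS_PER_LOYAL_YEAR = 8.0   # per consecutive year of Annual Fund giving
MAX_LOYAL_YEARS = 5           # cap so loyalty cannot grow without limit
PROPENSITY_WEIGHT = 30.0      # multiplies a 0-to-1 propensity score


def research_priority(record):
    """Combine flags, giving loyalty and propensity into one ranking score."""
    score = sum(weight for flag, weight in FLAG_WEIGHTS.items() if record.get(flag))
    loyal_years = min(record.get("consecutive_years_giving", 0), MAX_LOYAL_YEARS)
    score += POINTS_PER_LOYAL_YEAR * loyal_years
    score += PROPENSITY_WEIGHT * record.get("propensity", 0.0)
    return score


lost = [
    {"id": "quiet_prospect", "mg_prospect": True, "propensity": 0.3},
    {"id": "loyal_volunteer", "consecutive_years_giving": 5,
     "current_volunteer": True, "propensity": 0.8},
    {"id": "non_donor", "propensity": 0.5},
]

ranked = sorted(lost, key=research_priority, reverse=True)
for record in ranked:
    print(f"{record['id']:16s} {research_priority(record):6.1f}")
```

With these particular made-up weights, the loyal young volunteer lands above the major gift prospect who has no history of engagement, which is exactly the kind of outcome exclusive lists cannot produce.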

However it shakes out, records research will no longer start the day by picking up where the previous day’s work left off. It will be a new list every morning, based on the actual value of the record to the institution.

And now for the twist …

Some alumni might not be addressable, but they are not totally lost if you have other information such as an email address. If they are opening your email newsletters, invitations and solicitations, then you might be able to determine their approximate geographic location via the IP address given to them by their internet service provider.

That sounds like a lot of technical work, but it doesn’t have to be. Your broadcast email platform might be collecting this information for you. For example, MailChimp has been geolocating email accounts since at least 2010. The intention is to give clients the ability to segment mailings by geographic location or time zone. You can use it to clue you in to where in the world someone lives when they’ve dropped off your radar.

(Yes, yes, I know you could just email them to ask them to update their contact info. But the name of this blog is CoolData, not ObviousData.)

What MailChimp does is append latitude and longitude coordinates to each email record in your account. Not everyone will have coordinates: At minimum, an alum has to have interacted with your emails in order for the data to be collected. As well, ISP-provided data may not be very accurate. This is not the same as identifying exactly where someone lives (which would be fraught with privacy issues), but it should put the individual in the right city or state.

In the data I’m looking at, about half of alumni with an email address also have geolocation data. You can download this data, merge it with your records for alumni who have no current valid address, and then the fun begins.

I mentioned Tableau earlier. If you’ve got lat-long coordinates, visualizing your data on a map is a snap. Have a look at the dashboard below. I won’t go into detail about how it was produced, except to say that it took only an hour or so. First I queried the database for all our alumni who don’t have a valid preferred address in the database. For this example, I pulled ID, sum of total giving, Planned Giving status (i.e., current expectancy or no), and the city, province/state and country of the alum’s most recent valid address. Then I joined the latitude and longitude data from MailChimp, using the ID as the common key.
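The join itself is nothing fancy. As a rough sketch, assuming both extracts have already been loaded as lists of dictionaries that share an `id` key (the column names and all the values below are invented, and are not MailChimp's actual export format), it could look like this:

```python
# Invented extract of alumni with no valid preferred address.
lost_alumni = [
    {"id": "1001", "total_giving": 2500.0, "pg_expectancy": False,
     "last_city": "Toronto", "last_region": "ON", "last_country": "Canada"},
    {"id": "1002", "total_giving": 0.0, "pg_expectancy": False,
     "last_city": "Halifax", "last_region": "NS", "last_country": "Canada"},
    {"id": "1003", "total_giving": 150.0, "pg_expectancy": True,
     "last_city": "Vancouver", "last_region": "BC", "last_country": "Canada"},
]

# Invented geolocation extract from the email platform, keyed by the same ID.
email_geo = [
    {"id": "1001", "latitude": 40.71, "longitude": -74.01},
    {"id": "1003", "latitude": 21.31, "longitude": -157.86},
    {"id": "9999", "latitude": 51.51, "longitude": -0.13},  # not a lost alum
]


def attach_coordinates(alumni, geo_rows):
    """Inner join on 'id': keep only lost alumni who have coordinates."""
    coords_by_id = {row["id"]: (row["latitude"], row["longitude"]) for row in geo_rows}
    merged = []
    for alum in alumni:
        lat_long = coords_by_id.get(alum["id"])
        if lat_long is None:
            continue  # never interacted with email, so no location to plot
        merged.append({**alum, "latitude": lat_long[0], "longitude": lat_long[1]})
    return merged


mappable = attach_coordinates(lost_alumni, email_geo)
for row in mappable:
    print(row["id"], row["last_city"], row["latitude"], row["longitude"])
```

The resulting rows carry both the last known city and the coordinates, which is all a mapping tool needs for the dots and the tooltips.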

The result was a smallish data file (fewer than 1,000 records), which I fed into Tableau. Here’s the result, scrubbed of individual personal information:

[Image: map_alums (Tableau map dashboard of lost alumni plotted by email geolocation)]

The options at top right are filters that enable the user to focus on the individuals of greatest interest. I’ve used Giving and Planned Giving status, but you can include anything — major gift prospect status, age, propensity score — whatever. If I hover my cursor over any dot on the map, a tooltip pops up containing information about the alum at that location, including the city and province/state of the last place they lived. I can also zoom in on any portion of the map. When I take a closer look at a certain tropical area, I see one dot for a person who used to live in Toronto and one for a former Vancouverite, and one of these is a past donor. Likewise, many of the alumni scattered across Africa and Asia last lived in various parts of eastern Canada.

These four people are former Canadians who are now apparently living in a US city — at least according to their ISP. I’ve blanked out most of the info in the tooltip:

[Image: manhattan (map close-up with tooltip for four alumni located in a US city)]

If desired, I could also load the email address into the tooltip and turn it into a mailto link: The user could simply click on the link to send a personal message to the alum.

(What about people who check email while travelling? According to MailChimp, location data is not updated unless it’s clear that a person is consistently checking their email over an extended period of time — so vacations or business trips shouldn’t be a factor.)

Clearly this is more dynamic and interesting for research than working from a list or spreadsheet. If I were a records researcher, I would have some fun filtering down on the biggest donors and using the location to guide my search. Having a clue where they live now should shorten the time it takes to decide that a hit is a real match, and should also improve the number of correct addresses. As well, because a person has to actually open an email in order to register their IP with the email platform, they are also sending a small signal of engagement. The fact they’re engaging with our email is assurance that going to the trouble to research their address and other details such as employment is not a waste of time.

This is a work in progress. My example is based on some manual work — querying the database, downloading MailChimp data, and merging the files. Ideally we would automate this process using the vendor’s API and scheduled data refreshes in Tableau Server. I can also see applications beyond searching for lost alumni. What about people who have moved but whose former address is still valid, so the mail isn’t getting returned? This is one way to proactively identify alumni and donors who have moved.
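One way to flag possible movers is to measure the distance between the address on file and the email geolocation, and list anyone whose two locations are far apart. The sketch below assumes the on-file address has already been geocoded to latitude and longitude (a step not covered here); the records, the field names and the 200 km threshold are all invented, and the threshold is deliberately loose because ISP-based locations are only approximate. The distance uses the standard haversine great-circle formula.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MOVE_THRESHOLD_KM = 200.0  # hypothetical; loose because ISP locations are rough


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def possibly_moved(record, threshold_km=MOVE_THRESHOLD_KM):
    """True if the email location is far from the (geocoded) address on file."""
    distance = haversine_km(
        record["address_lat"], record["address_lon"],
        record["email_lat"], record["email_lon"],
    )
    return distance > threshold_km


# Invented records: address on file vs where the email is being opened.
alumni = [
    {"id": "2001", "address_lat": 43.65, "address_lon": -79.38,
     "email_lat": 43.70, "email_lon": -79.40},   # both in the Toronto area
    {"id": "2002", "address_lat": 43.65, "address_lon": -79.38,
     "email_lat": 49.28, "email_lon": -123.12},  # address Toronto, email Vancouver
]

flagged = [a["id"] for a in alumni if possibly_moved(a)]
print("Worth a closer look:", flagged)
```

A flag like this is a prompt for research, not proof of a move; it only says the two sources disagree by more than the chosen distance.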

MailChimp offers more than just geolocation. There’s also a nifty engagement score, based on unsubscribes, opens and click-throughs. Stay tuned for more on this — it’s fascinating stuff.

27 June 2013

Time management for data analysts

Filed under: Best practices, Training / Professional Development — kevinmacdonell @ 5:32 am

Does it seem you never have enough time to get your work done? You’ve got a long list of projects, more than a few of which are labeled Top Priority — as if multiple projects could simultaneously be “top priority” — along with your own analysis projects which too often get pushed aside. We aren’t going to create more time for ourselves, and there’s only so much we are empowered to say “no” to. So we need a different strategy.

The world does not need another blog post about how to be more productive, or a new system to fiddle with instead of doing real work. However, I’ve learned a few things about how to manage my own time and tasks (I have done my share of reading and fiddling), and perhaps some of what works for me will be helpful to analysts … and to prospect researchers, alumni magazine feature writers, or anyone else whose job requires extended periods of focused work.

First and foremost, I’ve learned that “managing time” isn’t an effective approach. Time isn’t under your control, therefore you can’t manage it. What IS under your control (somewhat) is your attention. If you can manage your attention on a single task for a few stretches of time every day, you will be far more productive. You need to identify unambiguously what it is you should be working on right now from among an array of competing priorities, and you need to be mentally OK with everything you’re not doing, so that you can focus.

My “system” is hardly revolutionary but it is an uncomplicated way to hit a few nails on the head: prioritization and project management, focus and “flow”, motivation, and accountability and activity tracking. Again, it’s not about managing your time, it’s about managing your projects first so that you can choose wisely, and then managing your attention so you can focus on that choice.

Here is an Excel template you can use to get started: Download Projects & Calendar – CoolData.org. As promised, it’s nothing special. There are two main elements: One is a simple list of projects, with various ways to prioritize them, and the other is a drop-dead simple calendar with four periods or chunks of time per day, each focused on a single project.

Regarding the first tab: A “project” is anything that involves more than one step and is likely to take longer than 60 minutes to complete. This could include anything from a small analysis that answers a single question, to a big, hairy project that takes months. The latter is probably better chunked into a series of smaller projects, but the important thing is that simple tasks don’t belong here — put those on a to-do list. Whenever a new project emerges — someone asks a complicated question that needs an answer or has a business problem to solve — add it to the projects list, at least as a placeholder so it isn’t forgotten.

You’ll notice that some columns have colour highlighting. I’ll deal with those later. The uncoloured columns are:

Item: The name of the project. It would be helpful if this matched how the project is named elsewhere, such as your electronic or paper file folders or saved-email folders.

Description: Brief summary of what the project is supposed to accomplish, or other information of note.

Area: The unit the project is intended to benefit. (Alumni Office, Donor Relations, Development, etc.)

Requester: If applicable, the person most interested in the project’s results. For my own research tasks, I use “Self”.

Complete By: Sometimes this is a hard deadline, usually it’s wishful thinking. This field is necessary but not very useful in the short term.

Status/Next Action: The very next thing to be done on the project. Aside from the project name itself, this is THE single most important piece of information on the whole sheet. It’s so important, I’m going to discuss it in a new paragraph.

Every project MUST have a Next Action. Every next action should be as specific as possible, even if it seems trivial. Not “Start work on the Planned Giving study,” but rather, “Find my folder of notes from the Planned Giving meeting.” Having a small and well-defined task that can be done right now is a big aid to execution. Compare that to thinking about the project as a whole — a massive, walled fortress without a gate — which just creates anxiety and paralysis. Like the proverbial journey, executing one well-defined step after another gets the job done eventually.

A certain lack of focus might be welcome at the very beginning of an analysis project, when some aimless doodling around with pencil and paper or a few abortive attempts at pulling sample data might help spark some creative ideas. With highly exploratory projects things might be fuzzy for a long time. But sooner or later if a project is going to get done it’s going to have an execution stage, which might not be as much fun as the exploratory stage. Then it’s all about focus. You will need the encouragement of a doable Next Action to pull you along. A project without a next action is just a vague idea.

When a project is first added to the list as a placeholder until more details become available, the next action may be unclear. Therefore the Next Action is getting clarity on the next action, but be specific. That means, “Email Jane about what she wants the central questions in the analysis to be,” not “Get clarity.”

(The column is also labeled “Status.” If a project is on hold, that can be indicated here.)

Every Next Action also needs a Next Action Date. This may be your own intended do-by date, an externally-set deadline, or some reasonable amount of time to wait if the task is one you’ve delegated to someone else or you have requested more information. Whatever the case, the Next Action Date is more important than the overall (and mostly fictitious) project completion date. That’s why the Next Action Date is conditionally formatted for easy reference, and the Completion Date is not. The former is specific and actionable, the latter is just a container for multiple next actions and is not itself something that can be “done”. (I will say more about conditional formatting shortly.)

When you are done with a project for the day, your last move before going on to something else is to decide on and record what the very next action will be when you return to that project. This will minimize the time you waste in switching from one task to another, and you’ll be better able to just get to work. Not having a clear reentry point for a project has often sidetracked me into procrastinating with busy-work that feels productive but isn’t.

The workbook holds a tab called Completed Projects. When you’re done with a project, you can either delete the row, or add it to this tab. The extra trouble of copying the row over might be worth it if you need to report on activity or produce a list of the last year’s accomplishments. As well, you can bet that some projects that are supposedly complete (but not under your control) will come up again like a meal of bad shellfish. It’s helpful to be able to look up the date you “completed” something, in order to find the files, emails and documentation you created at the time. (By the way, if you don’t document anything, you deserve everything bad that comes to you. Seriously.) If the project was complex, a lot of valuable time can be saved if you can effectively trace your steps and pick up from where you left off.

I mentioned that several columns are conditionally formatted to display varying colour intensities which will allow you to assess priorities at a glance. We’re all familiar with the distinction between “important” and “urgent”. At any time we will have jobs that must get done today but are not important in the long run. Important work, on the other hand, might someday change the whole game yet is rarely “urgent” today. It has a speculative nature to it and it may not be evident why it makes sense to clear the decks for it. This is one reason for trying to set aside some time for speculative, experimental projects — you just never know.

The Priority Rating column is where I try to balance the two (urgent vs. important), using a scale of 1 to 10, with 1 being the top priority. I don’t bother trying to ensure that only one project is a ‘1’, only one is a ‘2’, etc. — I rate each project in isolation based on a sense of how in-my-face I feel it has to be, and of course that changes all the time.

Other columns use similar flagging:

Urgent: The project must be worked on now. The cell turns red if the value is “Y”. Although it may seem that everything is urgent, reserve this for emergencies and hard deadlines that are looming. It’s not unusual for me to have something flagged Urgent, yet it has a very low priority rating … which tells you how important I think a lot of “urgent stuff” is.

Percent Complete: A rough estimate of how far along you think you are in a project. The closer to zero, the darker the cell is. Consult these cells on days when you feel it’s time to move the yardsticks on some neglected projects.

Next Action Date: As already mentioned, this is the intended date or deadline for the very next action to be taken to move the project forward. The earlier in time the Next Action Date is, the darker the cell.

Date Added: I’m still considering whether I need this column, so it doesn’t appear in my sample file. This is the date a project made it onto the list. Conditional formatting would highlight the oldest items, which would reveal the projects that have been languishing the longest. If a project has been on your list for six months and it’s 0% done, then it’s not a project — it’s an idea, and it belongs somewhere else rather than cluttering today’s view, which should be all about action. You could move it to an On Hold tab or an external list. Or just delete it. If it’s worth doing, it’ll come back.

Here’s a far-away look at the first tab of my projects list. At a glance you can see how your eye is drawn to projects needing attention, as variously defined by priority, urgency, completeness, and proximity of the next deadline. There is no need to filter or sort rows, although you could do so if you wanted.

[Image: projects_list_sample]

The other main element in this workbook is a simple calendar, actually a series of calendars. Each day contains four blocks of time, with breaks in between. You’ll notice that there are no time indications. The time blocks are intended to be roughly 90 minutes, but they can be shorter or longer, depending on how long a period of time you can actually stay focused on a task.

If you’re like me, that period is normally about five minutes, and for that reason we need a bit of gentle discipline. I tell myself that I am about to begin a “sprint” of work. I commit wholly to a single project, and I clear the deck for just that project, based on the knowledge that there is a time limit to how long I will work to the exclusion of all distractions until I can goof off with Twitter or what have you. I have made a bargain with myself: Okay, FINE, I will dive into that THING I’ve been avoiding, but don’t bother me again for a week!

The funny thing is, that project I’ve been avoiding will often begin to engage me after I’ve invested enough time. The best data analysis work happens when you are in a state of “flow,” characterized by total absorption in a challenging task that is a match for your skills. If you have to learn new techniques or skills in order to meet that challenge, the work might actually feel like it is rewarding you with an opportunity to grow a bit.

Flow requires blocks of uninterrupted time. There may not be much you can do about people popping by your work station to chat or to ask for things, but you can control your self-interruptions, which I’ve found are far more disruptive. I’m going to assume you’ve already shut off all alerts for email and instant messaging apps on your computer. I would go a step farther and shut down your email client altogether while you’re working through one of your 90-minute sprints, and silence your phone.

If shutting off email and phone is not a realistic option for you, ask yourself why. If you’re in a highly reactive mode, responding to numerous small requests, then regardless of what your job title is, you may not be an analyst. If the majority of your time is spent looking up stuff, producing lists, and updating and serving reports, then you need to consider an automation project or a better BI infrastructure that will allow you more time for creative, analytical work. Just saying.

On the other hand, I’ve always been irritated by the productivity gurus who say you should avoid checking email at the start of the work day, or limit email checking to only twice a day. This advice cannot apply to anyone working in the real world. Sure, you can lose the first hour of the day getting sucked into email, but a single urgent message from on high can shuffle priorities for the day, and you’d better be aware of it. A good morning strategy would be to first open your projects file, identify what your first time block contains, review the first action to take, and get your materials ready to work. THEN you can quickly consult your email for any disruptive missives (don’t read everything!) before shutting down your client and setting off to do what you set out to do. You don’t necessarily have to tackle your first time block as soon as you sit down; you just need to ensure that you fit two time blocks into your morning.

Other time block tips:

  • While you’re in the midst of your time block, keep a pad of paper handy (or a Notepad file open) to record any stray thoughts, new tasks, or ideas that occur to you and threaten to derail you. You may end up getting derailed, if the new thing is important or interesting enough, but if not, jotting a note can prevent you from having to fire up your email again or make a phone call, or whatever, and save the interruption for when you’ve reached a better stopping point.
  • Try to exert some control over when meetings are scheduled. For meetings that are an hour or longer, avoid scheduling them so that they knock a hole right in the centre of either the morning or the afternoon, leaving you with blocks of time before and after that are too short to allow you to really get into your project.
  • Keep it fluid, and ignore the boundaries of time blocks when you’re in “flow” and time passes without your being conscious of it. If you’re totally absorbed in a project that you’ve been dreading or avoiding previously, then by all means press on. Just remember to take a break.
  • When you come to the end of a block, take a moment to formulate the next action to take on that project before closing off.
  • If you happen to be called away on something urgent when you’re in the middle of a time block, try to record the next action as a placeholder. Task-switching is expensive, both in time and in mental energy. Always be thinking of leaving a door open, even if the next action seems obvious at the time. You will forget.

I usually fill projects into time blocks only a few days in advance. The extra two weeks are there in case I want to do more long-term planning. The more important the project, the more time blocks it gets, and the more likely I am to schedule it for the first time block in the morning. Note that this tool isn’t used to schedule your meetings — that’s a separate thing and you probably already have something for that. It would be nice if meetings and project focusing could happen in the same view, but to me they are different things.

At the end of a week, I move the tab for the current calendar to the end of the row, rename it to show the date range it represents, and replace it with next week’s calendar, renaming and copying tabs as needed to prepare for the week to come. I am not sure if saving old calendars serves a purpose — it might make more sense to total up the estimated number of hours invested in the project that week, keeping a running total by project on the first tab — but like everything this is a work in progress.

Your Excel file might be saved on a shared drive and made accessible to anyone who needs to know what you’re working on. In that case, I suggest adding a password, one that allows users to open the file for reading, but prevents them from saving any changes.

And finally … this workbook thing is just a suggestion. Use a system or tool that works for you. What I’ve outlined here is partly inspired by books such as David Allen’s “Getting Things Done: The Art of Stress-Free Productivity” (which is also a whole system that goes by the same name), and Mihaly Csikszentmihalyi’s “Flow: The Psychology of Optimal Experience,” as well as a host of blog posts and media stories about creativity and productivity the details of which I’ve long forgotten but which have influenced the way I go about doing work.

Your employer might mandate the use of a particular tool for time and/or project management; use it if you have to, or if it serves your needs. More likely than not, though, it won’t help you manage the most limited resource of all: your attention. Find your own way to marshal that resource, and your time and projects will take care of themselves.

18 April 2013

A response to ‘What do we do about Phonathon?’

I had a thoughtful response to my blog post from earlier this week (What do we do about Phonathon?) from Paul Fleming, Database Manager at Walnut Hill School for the Arts in Natick, Massachusetts, about half an hour from downtown Boston. With Paul’s permission, I will quote from his email, and then offer my comments afterward:

I just wanted to share with you some of my experiences with Phonathon. I am the database manager of a 5-person Development department at a wonderful boarding high school called the Walnut Hill School for the Arts. Since we are a very small office, I have also been able to take on the role of the organizer of our Phonathon. It’s only been natural for me to combine the two to find analysis about the worth of this event, and I’m happy to say, for our own school, this event is amazingly worthwhile.

First of all, as far as cost vs. gain, this is one of the cheapest appeals we have. Our Phonathon callers are volunteer students who are making calls either because they have a strong interest in helping their school, or they want to be fed pizza instead of dining hall food (pizza: our biggest expense). This year we called 4 nights in the fall and 4 nights in the spring. So while it is an amazing source of stress during that week, there aren’t a ton of man-hours put into this event other than that. We still mail letters to a large portion of our alumni base a few times a year. Many of these alumni are long-shots who would not give in response to a mass appeal, but our team feels that the importance of the touch point outweighs the short-term inefficiencies that are inherent in this type of outreach.

Secondly, I have taken the time to prioritize each of the people who are selected to receive phone calls. As you stated in your article, I use things like recency and frequency of gifts, as well as other factors such as event participation or whether we have other details about their personal life (job info, etc). We do call a great deal of lapsed or nondonors, but if we find ourselves spread too thin, we make sure to use our time appropriately to maximize effectiveness with the time we have. Our school has roughly 4,400 living alumni, and we graduate about 100 wonderful, talented students a year. This season we were able to attempt phone calls to about 1,200 alumni in our 4 nights of calling. The higher-priority people received up to 3 phone calls, and the lower-priority people received just 1-2.

Lastly, I was lucky enough to start working at my job in a year in which there was no Phonathon. This gave me an amazing opportunity to test the idea that our missing donors would give through other avenues if they had no other way to do so. We did a great deal of mass appeals, indirect appeals (alumni magazine and e-newsletters), and as many personalized emails and phone calls as we could handle in our 5-person team. Here are the most basic of our findings:

In FY11 (our only non-Phonathon year), 13% of our donors were new/recovered donors. We reached about 11% participation, our lowest ever. In FY12 (the year Phonathon returned):

  • 27% of our donors were new/recovered donors, a 14% increase from the previous year.
  • We reached 14% overall alumni participation.
  • Of the 27% of donors who were considered new/recovered, 44% gave through Phonathon.
  • The total number of donors we had gained from FY11 to FY12 was about the same as the number of people who gave through the Phonathon.
  • In FY13 (still in progress, so we’ll see how this actually plays out), 35% of the previously-recovered donors who gave again gave in response to less work-intensive mass mailing appeals, showing that some of these Phonathon donors can, in fact, be converted and (hopefully) cultivated long-term.

In general, I think your article was right on point. Large universities with a for-pay, ongoing Phonathon program should take a look and see whether their efforts should be spent elsewhere. I just wanted to share with you my successes here and the ways in which our school has been able to maintain a legitimate, cost-effective way to increase our participation rate and maintain the quality of our alumni database.

Paul’s description of his program reminds me there are plenty of institutions out there who don’t have big, automated, and data-intensive calling programs gobbling up money. What really gets my attention is that Walnut Hill uses alumni affinity factors (event attendance, employment info) to prioritize calling to get the job done on a tight schedule and with a minimum of expense. This small-scale data mining effort is an example for the rest of us who have a lot of inefficiency in our programs due to a lack of focus.

The first predictive models I ever created were for a relatively small university Phonathon that was run with printed prospect cards and manual dialing — a very successful program, I might add. For those of you at smaller institutions wondering if data mining is possible only with massive databases, the answer is NO.

And finally, how wonderful it is that Walnut Hill can quantify exactly what Phonathon contributes in terms of new donors, and new donors who convert to mail-responsive renewals.

Bravo!

20 September 2012

When less data is more, in predictive modelling

When I started doing predictive modelling, I was keenly interested in picking the best and coolest predictor variables. As my understanding deepened, I turned my attention to how to define the dependent variable in order to really get at what I was trying to predict. More recently, however, I’ve been thinking about refining or limiting the population of constituents to be scored, and how that can help the model.

What difference does it make who gets a propensity score? Up until maybe a year ago, I wasn’t too concerned. Sure, probably no 22-year-old graduate had ever entered a planned giving agreement, but I didn’t see any harm in applying a score to all our alumni, even our youngest.

Lately, I’m not so sure. Using the example of a planned gift propensity model, the problem is this: Young alumni don’t just get a score; they also influence how the model is trained. If all your current expectancies were at least 50 before they decided to make a bequest, and half your alumni are under 30 years old, then one of the major distinctions your model will make is based on age. ANY alum over 50 is going to score well, regardless of whether he or she has any affinity to the institution, simply because 100% of your target is in that age group.

The model is doing the right thing by giving higher scores to older alumni. If ages in the sample range from 21 to 100+, then age as a variable will undoubtedly contribute to a large chunk of the model’s ability to “explain” the target. But this hardly tells us anything we didn’t already know. We KNOW that alumni don’t make bequest arrangements at age 22, so why include them in the model?

It’s not just the fact that their having a score is irrelevant. I’m concerned about allowing good predictor variables to interact with ‘Age’ in a way that compromises their effectiveness. Variables are being moderated by ‘Age’, without the benefit of improving the model in a way that we get what we want out of it.

Note that we don’t have to explicitly enter ‘Age’ as a variable in the model for young alumni to influence the outcome in undesirable ways. Here’s an example, using event attendance as a predictor:

Let’s say a lot of very young alumni and some very elderly constituents attend their class reunions. The older alumni who attend reunions are probably more likely than their non-attending classmates to enter into planned giving agreements; for my institution, that is definitely the case. On the other hand, the young alumni who attend reunions are probably no more or less likely than their non-attending peers to consider planned giving, because no one that age is a serious prospect. What happens to ‘event attendance’ as a predictor in a model in which the dependent variable is ‘Current planned giving expectancy’? … Because a lot of young alumni who are not in the target group attended events, the attribute of being an event attendee will be associated with NOT being a planned giving expectancy. Or at the very least, it will considerably dilute the positive association between predictor and target found among older alumni.

I confirmed this recently using some partly made-up data. The data file started out as real alumni data and included age, a flag for who is a current expectancy, and a flag for ‘event attendee’. I massaged it a bit by artificially bumping up the number of alumni under the age of 50 who were coded as having attended an event, to create a scenario in which an institution’s events are equally popular with young and old alike. In a simple regression model with the entire alumni file included in the sample, ‘event attendance’ was weakly associated with being a planned giving expectancy. When I limited the sample to alumni 50 years of age and older, however, the R squared statistic doubled. (That is, event attendance was about twice as effective at explaining the target.) Conversely, when I limited the sample to under-50s, R squared was nearly zero.

True, I had to tamper with the data in order to get this result. But even had I not, there would still have been many under-50 event attendees, and their presence in the file would still have reduced the observed correlation between event attendance and planned giving propensity, to no useful end.
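The dilution effect is easy to reproduce with a toy example. The following is only a minimal sketch in Python: the counts are invented for illustration (they are not the data described above), events are assumed to be equally popular with both age groups, and no one under 50 is an expectancy. For a simple regression with one predictor, R squared is just the squared correlation, which is what the hypothetical `r_squared` helper computes.

```python
def r_squared(xs, ys):
    """R squared of a simple regression of ys on xs (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one variable, so nothing to explain
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def make_group(attend_exp, attend_only, exp_only, neither):
    """Build (attended, expectancy) 0/1 rows for one age group."""
    return ([(1, 1)] * attend_exp + [(1, 0)] * attend_only
            + [(0, 1)] * exp_only + [(0, 0)] * neither)


def fit(rows):
    """R squared for expectancy explained by event attendance."""
    attended = [row[0] for row in rows]
    expectancy = [row[1] for row in rows]
    return r_squared(attended, expectancy)


# Invented counts: 30% of each age group attended an event.
under_50 = make_group(0, 1500, 0, 3500)     # no expectancies at all
over_50 = make_group(150, 1350, 70, 3430)   # attendees more likely to be expectancies

r2_all = fit(under_50 + over_50)
r2_over_50 = fit(over_50)
r2_under_50 = fit(under_50)

print(f"Whole file: {r2_all:.4f}")
print(f"50 and over: {r2_over_50:.4f}")
print(f"Under 50: {r2_under_50:.4f}")
```

With these particular counts, restricting the sample to the 50-and-over group roughly doubles R squared, and the under-50 group contributes nothing because its target never varies. Different counts will give different ratios; the point is only the direction of the effect.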

You probably already know that it’s best not to lump deceased constituents in with living ones, or non-alumni along with alumni, or corporations and foundations along with persons. They are completely distinct entities. But depending on what you’re trying to predict, your population can fruitfully be split along other, more subtle distinctions. Here are a few:

  • For donor acquisition models, in which the target value is “newly-acquired donor”, exclude all renewed donors. You strictly want to have only newly-acquired donors and never-donors in your model. Your good prospects for conversion are the never-donors who most resemble the newly-acquired donors. Renewed donors don’t serve any purpose in such a model and will muddy the waters considerably.
  • Conversely, remove never-donors from models that predict major giving and leadership-level annual giving. Those higher-level donors tend not to emerge out of thin air: They have giving histories.
  • Looking at ‘Age’ again … making distinctions based on age applies to major-gift propensity models just as it does to planned giving propensity: Very young people do not make large gifts. Look at your data to find out how old donors were when they first gave $1,000, say. This will help inform what your cutoff should be.
  • When building models specifically for Phonathon, whether donor-acquisition or contact likelihood, remove constituents who are coded Do Not Call or who do not have a valid phone number in the database, or who are unlikely to be called (international alumni, perhaps).
  • Exclude international alumni from event attendance or volunteering likelihood models, if you never offer involvement opportunities outside your own country or continent.
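As a rough illustration of the age-cutoff idea in the list above, here is a small Python sketch. The gift records, the field layout, and the `age_at_first_big_gift` helper are all hypothetical, and age is approximated as gift year minus birth year; your own database will have its own structure.

```python
# Hypothetical gift records: (donor_id, birth_year, gift_year, amount).
gifts = [
    ("A", 1950, 1995, 100),
    ("A", 1950, 2001, 1500),
    ("A", 1950, 2005, 5000),
    ("B", 1975, 2010, 1000),
    ("C", 1960, 2000, 250),
    ("C", 1960, 2012, 2500),
    ("D", 1985, 2011, 50),
    ("E", 1945, 1999, 1000),
]

THRESHOLD = 1000  # the "$1,000, say" level; pick whatever suits your program


def age_at_first_big_gift(gifts, threshold):
    """Map each donor to their approximate age at their first gift >= threshold."""
    first_age = {}
    # Walk the gifts in date order so the first qualifying gift wins.
    for donor_id, birth_year, gift_year, amount in sorted(gifts, key=lambda g: g[2]):
        if amount >= threshold and donor_id not in first_age:
            first_age[donor_id] = gift_year - birth_year
    return first_age


first_ages = age_at_first_big_gift(gifts, THRESHOLD)
youngest = min(first_ages.values())

print(sorted(first_ages.values()))
print("Youngest age at first big gift:", youngest)
```

The youngest age is only a starting point for choosing a cutoff. With real data you would probably look at the whole distribution and ignore a handful of outliers.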

Those are just examples. As for general principles, I think both of the following conditions must be met in order for you to gain from excluding a group of constituents from your model. By a “group” I mean any collection of individuals who share a certain trait. Choose to exclude IF:

  1. Nearly 100% of constituents with the trait fall outside the target behaviour (that is, the behaviour you are trying to predict); AND,
  2. Having a score for people with that trait is irrelevant (that is, their scores will not result in any action being taken with them, even if a score is very low or very high).

You would apply the “rules” like this … You’re building a model to predict who is most likely to answer the phone, for use by Phonathon, and you’re wondering what to do with a bunch of alumni who are coded Do Not Call. Well, it stands to reason that 1) people with this trait will have little or no phone contact history in the database (the target behaviour), and 2) people with this trait won’t be called, even if they have a very high contact-likelihood score. The verdict is “exclude.”
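The two conditions can be written down as a simple check. This is a sketch only: the `should_exclude` function, the field names, the invented counts, and the 99% threshold standing in for “nearly 100%” are all assumptions made for illustration. The second condition is a judgment call about what you will do with the scores, so it is passed in as a plain yes/no.

```python
def should_exclude(records, trait, target, score_is_actionable, threshold=0.99):
    """Return True only if both exclusion conditions are met for one group."""
    group = [r for r in records if r[trait]]
    if not group:
        return False
    # Condition 1: nearly everyone with the trait falls outside the target.
    share_outside = sum(1 for r in group if not r[target]) / len(group)
    # Condition 2: a score for this group would not lead to any action.
    return share_outside >= threshold and not score_is_actionable


# Invented file: 200 Do Not Call alumni, only 1 with any phone contact history.
alumni = (
    [{"do_not_call": True, "answered_phone": False}] * 199
    + [{"do_not_call": True, "answered_phone": True}] * 1
    + [{"do_not_call": False, "answered_phone": True}] * 300
    + [{"do_not_call": False, "answered_phone": False}] * 500
)

exclude_dnc = should_exclude(
    alumni, "do_not_call", "answered_phone", score_is_actionable=False
)

# If the verdict is "exclude", drop the group before building the model.
modelling_file = [r for r in alumni if not (exclude_dnc and r["do_not_call"])]

print("Exclude Do Not Call alumni?", exclude_dnc)
print("Records left for modelling:", len(modelling_file))
```

Note that the check returns False if the scores would be acted on, no matter how few people in the group show the target behaviour. Both conditions have to hold.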

It’s not often you’ll hear me say that less (data) is more. Fewer cases in your data file will in fact tend to depress your model’s R squared. But your ultimate goal is not to maximize R squared — it’s to produce a model that does what you want. Fitting the data is a good thing, but only when you have the right data.

6 June 2012

How you measure alumni engagement is up to you

Filed under: Alumni, Best practices, Vendors; posted by kevinmacdonell @ 8:02 am

There’s been some back-and-forth on one of the listservs about the “correct” way to measure and score alumni engagement. One vendor, who claims to specialize in it, is pressing for an emphasis on scientific rigor. The emphasis is misplaced.

No doubt there are sophisticated ways of measuring engagement that I know nothing about, but the question I can’t get beyond is, how do you define “engagement”? How do you make it measurable so that one method applies everywhere? I think that’s a challenging proposition, one that limits any claim to “correctness” of method. This is the main reason that I avoid writing about measuring engagement — it sounds analytical, but inevitably it rests on some messy, intuitive assumptions.

The closest I’ve ever seen anyone come is Engagement Analysis Inc., a firm based here in Canada. They have a carefully chosen set of engagement-related survey questions which are held constant from school to school. The questions are grouped in various categories or “drivers” of engagement according to how closely related (statistically) the responses tend to be to each other. Although I have issues with alumni surveys and the dangers involved in interpreting the results, I found EA’s approach fascinating in terms of gathering and comparing data on alumni attitudes.

(Disclaimer: My former employer was once a client of this firm’s but I have no other association with them. Other vendors do similar and very fine work, of course. I can think of a few, but haven’t actually worked with them, so I will not offer an opinion.)

Some vendors may make claims of being scientific or analytically correct, but the only requirement of quantifying engagement is that it be reasonable, and (if you are benchmarking against other schools) consistent from school to school. In general, if you want to benchmark against other schools and do it right, engage a vendor, because it’s not easily done.

But if you want to benchmark against yourself (that is, over time), don’t be intimidated by anyone telling you your method isn’t good enough. Just do your own thing. Survey if you like, but call first upon the real, measurable activities that your alumni participate in. There is no single right way, so find out what others have done. One institution will give more weight to reunion attendance than to showing up for a pub night, while another will weight all event attendance equally. Another will ditch event attendance altogether in favour of volunteer activity, or some other indicator.

Can anyone say definitively that any of these approaches are wrong? I don’t think so — they may be just right for the school doing the measuring. Many schools (mine included) assign fairly arbitrary weights to engagement indicators based on intuition and experience. I can’t find fault with that, simply because “engagement” is not a quantity. It’s not directly measurable, so we have to use proxies which ARE measurable. Other schools measure the degree of association (correlation) between certain activities and alumni giving, and base their weights on that, which is smart. But it’s all the same to me in the end, because ‘giving’ is just another proxy for the freely interpretable quality of “engagement.”
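To make the idea of weighted proxies concrete, here is a minimal Python sketch of an engagement score as a weighted sum of activity counts. The activity names, the weights, and the `engagement_score` function are invented for illustration; they are not any school’s or vendor’s actual scheme.

```python
def engagement_score(activities, weights):
    """Weighted sum of activity counts; activities without a weight count for nothing."""
    return sum(weights.get(name, 0) * count for name, count in activities.items())


# One alum's measurable activity (hypothetical counts).
alum = {"reunion": 2, "pub_night": 4, "volunteer": 1}

# School A favours reunions and volunteering; School B treats everything equally.
weights_school_a = {"reunion": 3, "pub_night": 1, "volunteer": 5}
weights_school_b = {"reunion": 1, "pub_night": 1, "volunteer": 1}

score_a = engagement_score(alum, weights_school_a)
score_b = engagement_score(alum, weights_school_b)

print("School A score:", score_a)
print("School B score:", score_b)
```

The same alum gets a different score under each scheme, and neither number is “correct.” What matters for benchmarking against yourself is that the weights stay the same from one year to the next.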

Think of devising a “love score” to rank people’s marriages in terms of the strength of the pair bond. A hundred analysts would head off in a hundred different directions at Step 1: Defining “love”. That doesn’t mean the exercise is useless or uninteresting, it just means that certain claims have to be taken with a grain of salt.

We all have plenty of leeway to choose the proxies that work for us, and I’ve seen a number of good examples from various schools. I can’t say one is better than another. If you do a good job measuring the proxies from one year to the next, you should be able to learn something from the relative rises and falls in engagement scores over time and compared between different groups of alumni.

Are there more rigorous approaches? Yes, probably. Should that stop you from doing your own thing? Never!
