CoolData blog

15 July 2011

Answering questions about “How many times to keep calling”

Filed under: Annual Giving, John Sammis, Model building, Peter Wylie, Phonathon, regression — kevinmacdonell @ 8:27 am

The recent discussion paper on Phonathon call attempts by Peter Wylie and John Sammis elicited a lot of response. There were positive responses. (“Well, that’s one of the best things I’ve seen in a while. I’m a datahead. I admit it. Thank you for sharing this.”) There were also many questions, maybe even a little skepticism. I will address some of those questions today.

Question: You discuss modeling to determine the optimum number of times to call prospects, but what about the cost of calling them?

A couple of readers wanted to know why we didn’t pay any attention to the cost of solicitation, and therefore return on investment. Wouldn’t it make sense to cut off calling a segment once “profitability” reached some unacceptably low point?

I agree that cost is important. Unfortunately, cost accounting can be complicated even within the bounds of a single program, let alone compared across institutions. In my own program, money for student wages comes from one source, money for technology and software support comes from another, while regular expenses such as phone and network charges are part of my own budget. If I cannot realize efficiencies in my spending and reallocate dollars to other areas, does it make sense to include them in my cost accounting? I’m not sure.

And is it really a matter of money? I would argue that the budget determines how many weeks of calling are possible. Therefore, the limiting factor is actually TIME. Many (most?) phone programs do little more than call as many people as possible in the time available. They call with no regard for prospects’ probability of giving (aside from favouring LYBUNTs), spreading their limited resources evenly over all prospects — that is, suboptimally.

The first step, then, is to spend more time calling prospects who are likely to answer the phone, and less time calling prospects who aren’t. ROI is important, but if you’re not segmenting properly then you’re always going to end up simultaneously giving up on high-value prospects prematurely AND hanging on to low-value prospects beyond the limit of profitability.

Wylie and Sammis’s paper provides insight into a way we might intelligently manage our programs, mainly by showing a way to focus limited resources, and more generally by encouraging us to make use of the trove of data generated by automated calling software. Savvy annual fund folks who really have a handle on costs and want to delve into ROI as well should step up and do so — we’d love to see that study. (Although, I have to say, I’m not holding my breath.)

Question: Which automated calling software did these schools use?

The data samples were obtained from three schools that use the software of a single vendor, and participants were invited via the vendor’s client listserv. The product is called CampusCall, by RuffaloCODY. Therefore the primary audience of this paper could assume that Wylie and Sammis were addressing auto dialers and not predictive dialers or manual programs. This is not an endorsement of the product — any automated calling software should provide the ability to export data suitable for analysis.

By the way, manual calling programs can also benefit from data mining. There may be less call-result data to feed back into the modeling process than there would be in an automated system, but there is no reason why modeling cannot be used to segment intelligently in a manual program.

If you have a manual program and you’re calling tens of thousands of alumni — consider automating. Seriously.

Question: What do some of these “call result” categories mean?

At the beginning of the study, all the various codes for call results were divided into two categories, ‘contact made’ and ‘contact not made’. Some readers were curious about what some of the codes meant. Here are the codes whose meanings are not obvious. None of these count as contacts. (A sketch of how codes like these can be rolled up into a binary flag follows the list.)

  • Re-assigned: The phone number has been reassigned to a new person. The residents at this phone number do not know the prospect you are attempting to reach.
  • FAX2: The call went to a fax, modem or data line for the second time — this code removes the number from further calling.
  • Hung up: Technically a contact, but often the caller doesn’t know whether it was the prospect or someone else in the household who answered, and often the phone is hung up before the caller can introduce him/herself. In those cases the encounter doesn’t meet the definition of a contact, which is an actual conversation with the prospect, so we didn’t count these as contacts.
  • Call back2: The prospect or someone else in the household asks to be called back some other time, but if this was the last result code, no future attempt was made. Not a contact.
  • NAO: Not Available One Hour. The prospect can’t come to the phone and asks for a call back in an hour — but obviously the callback didn’t happen, because NAO is still the last result.
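
For readers who want to reproduce this first step on their own export, here is a minimal sketch in Python. The file name, column names and result-code labels are assumptions for illustration, not the actual CampusCall field names.

```python
import pandas as pd

# Hypothetical call-attempt export: one row per attempt, with a prospect ID
# and a result code. Names are placeholders, not real CampusCall fields.
calls = pd.read_csv("call_results.csv")

# Result codes treated here as a genuine contact (an actual conversation).
# Everything else -- including Re-assigned, FAX2, Hung up, Call back2 and
# NAO from the list above -- falls into 'contact not made'.
CONTACT_CODES = {"Pledge", "No Pledge", "Maybe", "Do Not Call"}

calls["contact_made"] = calls["result_code"].isin(CONTACT_CODES).astype(int)

# Roll up to one row per prospect: total attempts, and whether any attempt
# ever produced a contact.
per_prospect = (
    calls.groupby("prospect_id")
         .agg(attempts=("result_code", "size"),
              contact_made=("contact_made", "max"))
         .reset_index()
)
print(per_prospect.head())
```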

Question: Why did you include disconnects and wrong numbers in your analysis? Wouldn’t you stop calling them (presumably after the first attempt), regardless of what their model score was? A controlled experiment would seem to call for leaving them out, and your results might be less impressive if you did so.

Good point. When a phone number proves invalid (as opposed to simply going to an answering machine or ringing without an answer), there’s no judgement possible about whether to place one more call to that number. Regardless of the affinity score, you’re done with that alum.

If we conducted a new study, perhaps we would exclude bad phone numbers. It’s my opinion that rerunning the analysis would be more of a refinement on what we’ve learned here, rather than uncovering something new. I think it’s up to the people who use this data in their programs to take this new idea and mine their own data in the light of it — and yes, refine it as well.

This was not a controlled experiment, by the way. This was a data-mining exploration that revealed a useful insight, one the authors hope will lead others to dig into their own call centre data. True controlled experiments are hard to do — but wouldn’t it be great if fundraisers would collaborate with the experts in statistics and experimental design who teach on their own campuses?

Question: What modeling methods did you use? Did you compare models?

The paper made reference to multiple linear regression, which implies that the dependent variable is continuous. The reader wanted to know if the modeling method was actually logistic regression, or if two or more models were created and compared against a holdout sample.

The outcome variable was in fact a binary variable, “contact made”. Every prospect could have only two states (contacted / not contacted), because each person can be contacted only once. The result of that contact might be a pledge, no pledge, a maybe, or a “do not call” — but whatever the conversation produces, the variable being modeled is binary: contact made or not.

(Only one model was created and there was no validation set, because this was more of an exploration to discover whether doing so could yield a model with practical uses, rather than a model built to be employed in a program.)

Although the DV was binary, the authors used multiple regression. A comparison of the two methods would be interesting, but Wylie and Sammis have found that when the splits for the dependent variable get close to 50/50 (as was the case here), multiple linear regression and logistic regression yield pretty much the same results. In the software package they use, multiple regression happens to be far more flexible than logistic, changes in the fit of the model as predictors are swapped in and out are more evident, and the output variable is easier to interpret.

Where the authors find logistic regression is superior to multiple regression is in building acquisition or planned giving models where the 0/1 splits are very asymmetric.
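
For anyone who wants to test that near-equivalence claim on their own data, here is a rough sketch using Python and statsmodels. The predictor names are placeholders, and the authors worked in their own stats package, not in Python, so treat this as an illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-prospect file: a 0/1 contact_made outcome plus a few
# predictors (placeholder names -- substitute whatever your model uses).
df = pd.read_csv("per_prospect.csv")
X = sm.add_constant(df[["class_year_decile", "home_phone_present", "email_present"]])
y = df["contact_made"]

ols = sm.OLS(y, X).fit()      # multiple linear regression on the 0/1 outcome
logit = sm.Logit(y, X).fit()  # logistic regression on the same predictors

# With a dependent variable split near 50/50, the two sets of predicted
# values should be very highly correlated and rank prospects the same way.
print(np.corrcoef(ols.predict(X), logit.predict(X))[0, 1])
```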

Question: Why did you choose to train the model on contacts made instead of pledges made?

Modeling on “contact made” instead of on “pledge made” is a bit novel. But that’s the point. The sticking point for Phonathon programs these days is simply getting someone to pick up the phone. If that’s the business problem to be solved, then (as the truism in data mining goes), that’s how the model should be focused. We see the act of answering the phone as a behaviour distinct from actually making a pledge. Obviously, they are related. But someone who picks up the phone this year and says “no” is still a better prospect in the long run than someone who never answers the call. A truly full-bodied segmentation for Phonathon would score prospects on both propensity to answer the phone and propensity to give — perhaps in a matrix, or using a multiplied score composed of both components.
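
To make that last idea concrete, here is one way such a two-part score could be put together. This is purely a sketch with hypothetical file and column names, not a method from the paper.

```python
import pandas as pd

# Hypothetical table of model output: one row per prospect, with a
# contact-propensity score and a giving-propensity score on a 0-1 scale.
scores = pd.read_csv("prospect_scores.csv")

# Option 1: a single multiplied score that combines both components.
scores["combined"] = scores["p_contact"] * scores["p_give"]

# Option 2: a matrix of contact-score deciles by giving-score deciles,
# where each cell becomes a segment that can get its own calling intensity.
scores["contact_decile"] = pd.qcut(scores["p_contact"], 10, labels=False) + 1
scores["give_decile"] = pd.qcut(scores["p_give"], 10, labels=False) + 1
print(pd.crosstab(scores["contact_decile"], scores["give_decile"]))
```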

Question: I don’t understand how you decided which years to include in the class year deciles. Was it only dividing into equal portions? That doesn’t seem right.

Yes, all the alumni in the sample were divided into ten roughly equal groups (deciles) in order by class year. There was no need to make a decision about whether to include a particular year in one decile or the other: The stats software made that determination simply by making the ten groups as equal as possible.

The point of that exercise was to see whether there was any general (linear) trend related to the age of alumni. In the study, the trend was not a straight line, but it was close enough to work well in the model — in general, the likelihood of answering the phone increases with age. Dividing the class years into deciles is not strictly necessary — it was done simply to make the relationship easier to find and explain. In practice, class year (or age) would be more likely to be placed into the regression analysis as-is, not as deciles.
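
If you want to replicate that decile split on your own file, something like the following would do it. The file and column names are placeholders; the original work was done in the authors’ stats package.

```python
import pandas as pd

# Hypothetical alumni extract: one row per alum, with a class_year column.
alumni = pd.read_csv("alumni.csv")

# Ten roughly equal groups ordered by class year; the boundaries fall
# wherever the counts balance, just as the stats software did it.
alumni["cy_decile"] = pd.qcut(alumni["class_year"], 10,
                              labels=False, duplicates="drop") + 1

# See which class years landed in each decile.
print(alumni.groupby("cy_decile")["class_year"].agg(["min", "max", "count"]))
```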

BUT, Peter Wylie notes that the questioner has a point. Chopping ‘class year’ into deciles might not be the best option. For example, he says, take the first decile (the oldest alums) and the tenth decile (the youngest alums): “The range for the former can easily be from 1930-1968, while the range for the latter is more likely to be 2006-2011. The old group is very heterogeneous and the young group is very homogeneous. From the standpoint of clearly seeing non-linearity in the relationship between how long people have been out of school and giving, it would be better to divide the entire group up into five-year intervals.” The numbers of alumni in the intervals will vary hugely, but it also might become more apparent that the variable will need to be transformed (by squaring or cubing perhaps) before placing it into the regression.
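
Peter’s alternative is just as easy to try. Again a sketch with assumed column names: fixed five-year intervals instead of equal-count deciles, plus a squared term in case the relationship turns out to be non-linear.

```python
import numpy as np
import pandas as pd

alumni = pd.read_csv("alumni.csv")   # hypothetical extract with a class_year column

# Fixed five-year intervals: group sizes will vary hugely, but curvature in
# the relationship is easier to see than with equal-count deciles.
bins = np.arange(1930, 2016, 5)      # 1930-1934, 1935-1939, ..., 2010-2014
alumni["cy_interval"] = pd.cut(alumni["class_year"], bins=bins, right=False)
print(alumni["cy_interval"].value_counts().sort_index())

# If the trend is clearly curved, a transformed term (squared here, or cubed)
# can go into the regression alongside the raw variable.
alumni["years_out"] = 2011 - alumni["class_year"]    # 2011 = year of this post
alumni["years_out_sq"] = alumni["years_out"] ** 2
```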

Another question about class year came from a reader at an institution that is only 20 years old. He wanted to know if he could even use Class Year as a predictor. Yes, he can, even if it has a restricted range — it might still yield a roughly linear trend. There is no requirement to chop it into deciles.

A final word

The authors had hoped to hear from folks who write about the annual fund all the time (but never mention data-driven decision making), or from the vendors of automated calling software themselves. Both seem especially qualified to speak on this topic. But so far, nothing.
