CoolData blog

15 July 2011

Answering questions about “How many times to keep calling”

Filed under: Annual Giving, John Sammis, Model building, Peter Wylie, Phonathon, regression — kevinmacdonell @ 8:27 am

The recent discussion paper on Phonathon call attempts by Peter Wylie and John Sammis elicited a lot of response. There were positive responses. (“Well, that’s one of the best things I’ve seen in a while. I’m a datahead. I admit it. Thank you for sharing this.”) There were also many questions, maybe even a little skepticism. I will address some of those questions today.

Question: You discuss modeling to determine the optimum number of times to call prospects, but what about the cost of calling them?

A couple of readers wanted to know why we didn’t pay any attention to the cost of solicitation, and therefore return on investment. Wouldn’t it make sense to cut off calling a segment once “profitability” reached some unacceptably low point?

I agree that cost is important. Unfortunately, cost accounting can be complicated even within the bounds of a single program, let alone compared across institutions. In my own program, money for student wages comes from one source, money for technology and software support comes from another, while regular expenses such as phone and network charges are part of my own budget. If I cannot realize efficiencies in my spending and reallocate dollars to other areas, does it make sense to include them in my cost accounting? I’m not sure.

And is it really a matter of money? I would argue that the budget determines how many weeks of calling are possible. Therefore, the limiting factor is actually TIME. Many (most?) phone programs do little more than call as many people as possible in the time available. They call with no regard for prospects’ probability of giving (aside from favouring LYBUNTs), spreading their limited resources evenly over all prospects — that is, suboptimally.

The first step, then, is to spend more time calling prospects who are likely to answer the phone, and less time calling prospects who aren’t. ROI is important, but if you’re not segmenting properly then you’re always going to end up simultaneously giving up on high-value prospects prematurely AND hanging on to low-value prospects beyond the limit of profitability.

Wylie and Sammis’s paper provides insight into a way we might intelligently manage our programs, mainly by showing a way to focus limited resources, and more generally by encouraging us to make use of the trove of data generated by automated calling software. Savvy annual fund folks who really have a handle on costs and want to delve into ROI as well should step up and do so — we’d love to see that study. (Although, I have to say, I’m not holding my breath.)

Question: Which automated calling software did these schools use?

The data samples were obtained from three schools who use the software of a single vendor, and participants were invited via the vendor’s client listserv. The product is called CampusCall, by RuffaloCODY. Therefore the primary audience of this paper could assume that Wylie and Sammis were addressing auto dialers and not predictive dialers or manual programs. This is not an endorsement of the product — any automated calling software should provide the ability to export data suitable for analysis.

By the way, manual calling programs can also benefit from data mining. There may be less call-result data to feed back into the modeling process than there would be in an automated system, but there is no reason why modeling cannot be used to segment intelligently in a manual program.

If you have a manual program and you’re calling tens of thousands of alumni — consider automating. Seriously.

Question: What do some of these “call result” categories mean?

At the beginning of the study, all the various codes for call results were divided into two categories, ‘contact made’ and ‘contact not made’. Some readers were curious about what some of the codes meant. Here are some of the codes that have meanings which are not obvious. None of these are contacts.

  • Re-assigned: The phone number has been reassigned to a new person. The residents at this phone number do not know the prospect you are attempting to reach.
  • FAX2: The call went to a fax, modem or data line for the second time — this code removes the number from more calling.
  • Hung up: This is technically a contact, but the caller often doesn’t know whether it was the prospect or someone else in the household who answered, and the phone is often hung up before the caller can introduce him/herself. In that case the encounter doesn’t meet the definition of a contact, which is an actual conversation with the prospect. So we didn’t count these as contacts.
  • Call back2: The prospect or someone else in the household asks to be called back some other time, but if this was the last result code, no future attempt was made. Not a contact.
  • NAO: Not Available One Hour. The prospect can’t come to the phone, call back in an hour — but obviously the callback didn’t happen, because NAO is still the last result.

Question: Why did you include disconnects and wrong numbers in your analysis? Wouldn’t you stop calling them (presumably after the first attempt), regardless of what their model score was? A controlled experiment would seem to call for leaving them out, and your results might be less impressive if you did so.

Good point. When a phone number proves invalid (as opposed to simply going to an answering machine or ringing without an answer), there’s no judgement possible about whether to place one more call to that number. Regardless of the affinity score, you’re done with that alum.

If we conducted a new study, perhaps we would exclude bad phone numbers. It’s my opinion that rerunning the analysis would be more of a refinement on what we’ve learned here, rather than uncovering something new. I think it’s up to the people who use this data in their programs to take this new idea and mine their own data in the light of it — and yes, refine it as well.

This was not a controlled experiment, by the way. This was a data-mining exploration which revealed a useful insight which, the authors hope, will lead to others digging into their own call centre data. True controlled experiments are hard to do — but wouldn’t it be great if fundraisers would collaborate with the experts in statistics and experimental design teaching on their own campuses?

Question: What modeling methods did you use? Did you compare models?

The paper made reference to multiple linear regression, which implies that the dependent variable is continuous. One reader wanted to know whether the modeling method was actually logistic regression, or whether two or more models were created and compared against a holdout sample.

The outcome variable was in fact a binary variable, “contact made”. Every prospect could have only two states (contacted / not contacted), because each person can be contacted only once. The result of a contact might be a pledge, no pledge, maybe, or “do not call” — but in any case, the result is binary.

(Only one model was created and there was no validation set, because this was more of an exploration to discover whether doing so could yield a model with practical uses, rather than a model built to be employed in a program.)

Although the dependent variable (DV) was binary, the authors used multiple regression. A comparison of the two methods would be interesting, but Wylie and Sammis have found that when the splits for the dependent variable get close to 50/50 (as was the case here), multiple linear regression and logistic regression yield pretty much the same results. In the software package they use, multiple regression happens to be far more flexible than logistic: changes in the fit of the model as predictors are swapped in and out are more evident, and the output variable is easier to interpret.
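One way to build intuition for the 50/50 observation (an illustrative sketch, not something taken from the paper): the logistic curve is very nearly a straight line in the neighbourhood of a 50% probability, so a straight-line fit and a logistic fit have little room to disagree there. The Python snippet below compares the logistic function with its tangent line at p = 0.5. The function names are invented for this example.

```python
import math

def logistic(z):
    """Standard logistic function, mapping any real z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def tangent_at_half(z):
    """Straight line tangent to the logistic curve at z = 0 (p = 0.5)."""
    return 0.5 + z / 4.0

# z from -1.0 to 1.0 corresponds to probabilities of roughly 0.27 to 0.73.
zs = [i / 10 for i in range(-10, 11)]
max_gap = max(abs(logistic(z) - tangent_at_half(z)) for z in zs)
print(f"largest gap between curve and line: {max_gap:.4f}")
```

Across that range the curve and the line never differ by as much as two percentage points. Out in the tails, where the 0/1 split is lopsided, the gap widens quickly.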

Where the authors find logistic regression is superior to multiple regression is in building acquisition or planned giving models where the 0/1 splits are very asymmetric.

Question: Why did you choose to train the model on contacts made instead of pledges made?

Modeling on “contact made” instead of on “pledge made” is a bit novel. But that’s the point. The sticking point for Phonathon programs these days is simply getting someone to pick up the phone. If that’s the business problem to be solved, then (as the truism in data mining goes), that’s how the model should be focused. We see the act of answering the phone as a behaviour distinct from actually making a pledge. Obviously, they are related. But someone who picks up the phone this year and says “no” is still a better prospect in the long run than someone who never answers the call. A truly full-bodied segmentation for Phonathon would score prospects on both propensity to answer the phone and propensity to give — perhaps in a matrix, or using a multiplied score composed of both components.
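As a rough sketch of what that two-part segmentation could look like, the Python below shows both options: a multiplied score and a simple two-by-two matrix. Everything here is hypothetical, including the function names, the 0.5 cutoff, and the assumption that both scores are expressed as probabilities between 0 and 1.

```python
def combined_score(p_answer, p_give):
    """Multiplied score built from the two propensities (both assumed 0..1)."""
    return p_answer * p_give

def matrix_cell(p_answer, p_give, cutoff=0.5):
    """Place a prospect in a 2x2 matrix; the cutoff is an arbitrary assumption."""
    row = "high answer" if p_answer >= cutoff else "low answer"
    col = "high give" if p_give >= cutoff else "low give"
    return (row, col)

# Hypothetical prospect: very likely to pick up, not very likely to give.
print(combined_score(0.8, 0.1))
print(matrix_cell(0.8, 0.1))
```

The multiplied score is convenient for ranking a whole call list, while the matrix keeps the two behaviours visible separately, which may matter if you treat "answers but says no" differently from "never answers".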

Question: I don’t understand how you decided which years to include in the class year deciles. Was it only dividing into equal portions? That doesn’t seem right.

Yes, all the alumni in the sample were divided into ten roughly equal groups (deciles) in order by class year. There was no need to make a decision about whether to include a particular year in one decile or the other: The stats software made that determination simply by making the ten groups as equal as possible.
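For readers who want to see the mechanics, here is a minimal Python sketch of one way to cut a variable into roughly equal groups. It is an assumption about how such a routine might work, not a description of the stats software the authors used; in particular, it keeps everyone with the same class year in the same group, which is why the groups are only "roughly" equal.

```python
def decile_by_value(values, groups=10):
    """Assign each value a group number 1..groups, in ascending order.

    Equal values always land in the same group, so group sizes are
    only roughly equal when there are many ties (as with class year).
    """
    ordered = sorted(values)
    n = len(ordered)
    first_pos = {}
    for pos, v in enumerate(ordered):
        first_pos.setdefault(v, pos)  # position of first occurrence
    return [first_pos[v] * groups // n + 1 for v in values]

# Hypothetical class years for a handful of alumni.
class_years = [1965, 1972, 1980, 1988, 1995, 2001, 2005, 2008, 2009, 2010]
print(decile_by_value(class_years))
```

With ascending class year, group 1 holds the oldest alumni and group 10 the youngest, matching the ordering described in the paper.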

The point of that exercise was to see whether there was any general (linear) trend related to the age of alumni. In the study, the trend was not a straight line, but it was close enough to work well in the model — in general, the likelihood of answering the phone increases with age. Dividing the class years into deciles is not strictly necessary — it was done simply to make the relationship easier to find and explain. In practice, class year (or age) would be more likely to be placed into the regression analysis as-is, not as deciles.

BUT, Peter Wylie notes that the questioner has a point. Chopping ‘class year’ into deciles might not be the best option. For example, he says, take the first decile (the oldest alums) and the tenth decile (the youngest alums): “The range for the former can easily be from 1930-1968, while the range for the latter is more likely to be 2006-2011. The old group is very heterogeneous and the young group is very homogeneous. From the standpoint of clearly seeing non-linearity in the relationship between how long people have been out of school and giving, it would be better to divide the entire group up into five-year intervals.” The numbers of alumni in the intervals will vary hugely, but it also might become more apparent that the variable will need to be transformed (by squaring or cubing perhaps) before placing it into the regression.
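A small Python sketch of the alternative Peter describes follows: fixed five-year intervals, plus a squared "years out" term as one possible transformation. The function names and the 2011 reference year are assumptions made for illustration only.

```python
def five_year_interval(class_year, width=5):
    """Return the (start, end) of the fixed-width interval containing class_year."""
    start = class_year - class_year % width
    return (start, start + width - 1)

def years_out_terms(class_year, reference_year=2011):
    """Years since graduation and its square; reference_year is an assumption."""
    years_out = reference_year - class_year
    return (years_out, years_out ** 2)

print(five_year_interval(1968))   # interval that contains 1968
print(years_out_terms(2006))      # linear and squared terms
```

Unlike deciles, the intervals all span the same number of years, so the counts of alumni per interval will vary a great deal, as noted above.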

Another question about class year came from a reader at an institution that is only 20 years old. He wanted to know if he could even use Class Year as a predictor. Yes, he can, even if it has a restricted range — it might still yield a roughly linear trend. There is no requirement to chop it into deciles.

A final word

The authors had hoped to hear from folks who write about the annual fund all the time (but never mention data-driven decision making), or from the vendors of automated calling software themselves. Both seem especially qualified to speak on this topic. But so far, nothing.

21 June 2011

How many times to keep calling?

Guest post by Peter Wylie and John Sammis

(Click to download a printer-friendly .PDF version here: NUMBER OF ATTEMPTS 050411)

Since Kevin MacDonell took over the phonathon at Dalhousie University, he and I have had a number of discussions about the call center and how it works. I’ve learned a lot from these discussions, especially because Kevin often raises intriguing questions about how data analysis can make for a more efficient and productive calling process.

One of the questions he’s concerned with is the number of call attempts it’s worth making to a given alum. That is, he’s asking, “How many attempts should my callers make before they ‘make contact’ with an alum and either get a pledge or some other voice-to-voice response – or they give up and stop calling?”

Last January Kevin was able to gather some calling data from several schools that may, among other things, offer the beginnings of a methodology for answering this question. What we’d like to do in this piece is walk you through a technique we’ve tried, and we’d like to ask you to send us some reactions to what we’ve done.

Here’s what we’ll cover:

  1. How we decided whether contact was made (or not) with 41,801 alums who were recently called by the school we used for this exercise.
  2. Our comments on the percentage of contacts made and the pledge money raised for each of eight categories of attempts: 1, 2, 3, 4, 5, 6, 7, and 8 or more.
  3. How we built an experimental predictive model for the likelihood of making contact with a given alum.
  4. How we used that model to see when it might (and might not) make sense to keep calling an alum.

Deciding Whether Contact Was Made

John Sammis and I do tons of analyses on alumni databases, but we’re nowhere near as familiar with call center data as Kevin is. So I asked him to take a look at the table you see below that shows the result of the last call made to almost 42,000 alums. Then I asked, “Kevin, which of these results would you classify as contact made?”

Table 1: Frequency Percentage Distribution for Results of Last Call Made to 41,801 Alums

He said he’d go with these categories:

  • ALREADY PLEDGED
  • NO PLEDGE
  • NO SOLICIT
  • REMOVE LIST
  • SPEC PLDG (i.e., Specified Pledge)
  • UNSP PLDG  (i.e., Unspecified Pledge)
  • DO NOT CALL

Kevin’s reasoning was that, with each of these categories, there was a final “voice to voice” discussion between the caller and the alum. Sometimes this discussion had a pretty negative conclusion. If the alum says “do not call” or “remove from list” (1.13% and 0.10% respectively), that’s not great. “No pledge” (29.72%) and “unspecified pledge” (4.15%) are not so hot either, but at least they leave the door open for the future. “Already pledged” (1.06%)? What can you say to that one? “And which decade was that, sir?”

Lame humor aside, the point is that Kevin feels (and I agree), that, for this school, these categories meet the criterion of “contact made.” The others do not.
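In practice, this step amounts to turning the last result code for each alum into a 0/1 flag. Here is a minimal Python sketch of that idea. The set of codes comes from the list above, but the exact spellings in any given export, and the sample values, are assumptions; check them against your own calling software.

```python
# Result codes treated as "contact made", per the list above.
# Exact spellings in a real export are an assumption.
CONTACT_CODES = {
    "ALREADY PLEDGED",
    "NO PLEDGE",
    "NO SOLICIT",
    "REMOVE LIST",
    "SPEC PLDG",
    "UNSP PLDG",
    "DO NOT CALL",
}

def contact_made(last_result):
    """Return 1 if the last call result counts as a contact, else 0."""
    return 1 if last_result.strip().upper() in CONTACT_CODES else 0

# Hypothetical last-call results for five alums.
sample = ["NO PLEDGE", "Hung up", "spec pldg", "NAO", "FAX2"]
flags = [contact_made(code) for code in sample]
print(flags)
```

Any code not in the set, including ones you have never seen before, falls into "contact not made", which seems the safer default for this kind of exploration.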

Our Comments on Percentage Contact Made and Pledge Money Raised for Each of Eight Categories of Attempts

Let’s go back to the title of this piece: “How Many Times to Keep Calling?” Maybe the simplest way to decide this question is to look at the contact rate as well as the pledge rate by attempt. Why not? So that’s what we did. You can see the contact results in Table 2 and Figure 1, and the pledge results in Table 3 and Figure 2.

Table 2: Number of Contacts Made and Percentage Contact Made For Each of Eight Categories of Attempts

Table 3: Total pledge dollars and mean pledge dollars received for each of eight categories of attempts

We’ve taken a hard look at both these tables and figures, and we’ve concluded that they don’t really offer helpful guidelines for deciding when to stop calling at this school. Why? We don’t see a definitive number of attempts where it would make sense to stop. To get specific, let’s go over the attempts:

  • 1st attempt: This attempt clearly yielded the most alums contacted (6,023) and the most dollars pledged ($79,316). However, stopping here would make little sense if only for the fact that the attempt yielded only a third of the $230,526 that would eventually be raised.
  • 2nd attempt: Should we stop here? Well, $49,385 was raised, and the contact rate has now jumped from about 50% to over 60%. We’d say keep going.
  • 3rd attempt: How about here? Over $30,000 raised and the contact rate has jumped even a bit higher. We’re not stopping.
  • 4th attempt: Here things start to go downhill a bit. The contact rate has fallen to about 43% and the total pledges raised have fallen below $20,000. However, if we stop here, we’ll be leaving more money on the table.
  • 5th attempt through 8 or more attempts: What can we say? Clearly the contact rates are not great for these attempts; they never get above the 40% level. Still, money for pledges continues to come in – over $50,000.
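If you want to produce the same kind of by-attempt summary from your own call data, a sketch along these lines may help. It is written in Python with made-up records; the field layout (attempts, contact flag, pledge dollars) is an assumption, not the format of any particular calling product.

```python
from collections import defaultdict

# Hypothetical records: (number of attempts, contact made 0/1, pledge dollars)
records = [
    (1, 1, 50.0), (1, 0, 0.0), (1, 1, 0.0), (2, 1, 25.0),
    (2, 0, 0.0), (3, 1, 100.0), (9, 0, 0.0), (12, 1, 20.0),
]

def summarize_by_attempt(rows, cap=8):
    """Group alums by number of attempts (cap and above pooled together)."""
    groups = defaultdict(lambda: {"alums": 0, "contacts": 0, "dollars": 0.0})
    for attempts, contacted, dollars in rows:
        g = groups[min(attempts, cap)]
        g["alums"] += 1
        g["contacts"] += contacted
        g["dollars"] += dollars
    summary = {}
    for key in sorted(groups):
        g = groups[key]
        summary[key] = {
            "contacts": g["contacts"],
            "contact_rate": g["contacts"] / g["alums"],
            "total_pledged": g["dollars"],
        }
    return summary

summary = summarize_by_attempt(records)
for attempt, stats in summary.items():
    print(attempt, stats)
```

The `cap=8` argument pools everything from the eighth attempt onward, mirroring the "8 or more" category used here.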

Even before we looked at the attempts data, we were convinced that the right question was not: “How many call attempts should be made before callers stop?” The right question was: “How many call attempts should be made with what alums?” In other words, with some alums it makes sense to keep calling until you reach them and have a chance to ask for a pledge. With others, that’s not a good strategy. In fact, it’s a waste of time and energy and money.

So, how do you identify those alums who should be called a lot and those who shouldn’t?

How We Built an Experimental Predictive Model for the Likelihood of Making Contact with a Given Alum

This was Kevin’s idea. Being a strong believer in data-driven decision making, he firmly believed it would be possible to build a predictive model for making contact with alums. The trick would be finding the right predictors.

Now we’re at a point in the paper where, if we’re not careful, we risk confusing you more than enlightening you. The concept of model building is simple. The problem is that constructing a model can get very technical; that’s where the confusing stuff creeps in.

So we’ll stay away from the technical side of the process and just try to cover the highpoints. For each of the 41,801 alumni included in this study we amassed data on the following variables:

  • Email (whether or not the alum had an email address listed in the database)
  • Lifetime hard credit dollars given to the school
  • Preferred class year
  • Year of last gift made over the phone (if one was ever made)
  • Marital status missing (whether or not there was no marital code whatsoever for the alum in the marital status field)
  • Event Attendance (whether or not the alum had ever attended an event since graduation)

With these variables we used a technique called multiple regression to combine the variables into a score that could be used to predict an alum’s likelihood of being contacted by a caller. Because multiple regression is hard to get one’s arms around, we won’t try to explain that part of what we did. We’ll just ask you to trust us that it worked pretty well.
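For the curious, here is a toy illustration of the general idea, written in Python. To be clear about what is assumed: this is not our model. It uses only two made-up 0/1 predictors (has email, attended an event) and eight invented alums, and it fits ordinary least squares with a small hand-written solver rather than a statistics package. The point is only to show how a regression turns several variables into one score.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.

    X is a list of predictor rows; an intercept column is added here.
    Returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for other in range(k):
            if other != col:
                f = A[other][col]
                A[other] = [a - f * b for a, b in zip(A[other], A[col])]
    return [A[i][k] for i in range(k)]

def score(coefs, x):
    """Predicted value for one alum: intercept plus weighted predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

# Invented data: [has_email, attended_event] and contact made (0/1).
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [1, 0], [0, 0], [0, 0]]
y = [1, 1, 1, 0, 1, 0, 0, 0]
coefs = fit_ols(X, y)
scores = [score(coefs, x) for x in X]
print(coefs)
print(scores)
```

Each alum's score is simply the intercept plus the weight for every predictor that applies. Higher scores mean the model expects contact to be more likely.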

What we will do is show you the relationship between three of the above variables and whether or not contact was made with an alum. This will give you a sense of why we included them as predictors in the model.

We’ll start with lifetime giving. Table 4 and Figure 3 show that as lifetime giving goes up, the likelihood of making contact with an alum also goes up. Notice that callers are more than twice as likely to make contact with alums who have given $120 or more lifetime (75.4%) than they are to make contact with alums whose lifetime giving is zero (34.9%).

Table 4: Number of Contacts Made and Percentage Contact Made for Three Levels of Lifetime Giving

How about Preferred Class Year? The relationship between this variable and contact rate is a bit complicated. You’ll see in Table 5 that we’ve divided class year into ten roughly equal size groups called “deciles.” The first decile includes alums whose preferred class year goes from 1964 to 1978. The second decile includes alums whose preferred class year goes from 1979 to 1985. The tenth decile includes alums whose preferred class year goes from 2008 to 2010.

A look at Figure 4 shows that contact rate is highest with the older alums and then gradually falls off as the class years get more recent. However, the rate rises a bit with the most recent alums. Without going into boring and confusing detail, we can tell you that we’re able to use this less than straight line relationship in building our model.

 

Table 5: Percentage Contact Made by Class Year Decile

The third variable we’ll look at is Event Attendance. Table 6 and Figure 5 show that, although relatively few alums (2,211) attended an event versus those who did not (35,590), the contact rate was considerably higher for the event attenders than the non-attenders: 58.3% versus 41.4%.

Table 6: Percentage Contact Made by Event Attendance

The predictive model we built generated a very granular score for each of the 41,801 alums in the study. To make it easier to see how these scores looked and worked, we collapsed the alums into ten roughly equal size groups (called deciles) based on the scores. The higher the decile the better the scores. (These deciles are, of course, different from the deciles we talked about for Preferred Class Year.)

Shortly we’ll talk about how we used these decile scores as a possible method for deciding when to stop calling. But first, let’s look at how these scores are related to both contact rate and pledging. Table 7 and Figure 6 deal with contact rate.

Table 7: Number of Contacts Made and Percentage Contact Made, by Contact Score Decile

Clearly, there is a strong relationship between the scores and whether contact was made. Maybe the most striking aspect of these displays is the contrast between contact rate for alums in the 10th decile and that for those in the first decile: 79.9% versus 19.2%. In practical terms, this means that, over time in this school, your callers are going to make contact with only one in every five alums in the first decile. But in the 10th decile? They should make contact with four in every five alums.

How about pledge rates?  We didn’t build this model to predict pledge rates. However, look at Table 8 and Figure 7. Notice the striking differences between the lower and upper deciles in terms of total dollars pledged. For example, we can compare the total pledge dollars received for the bottom 20% of alums called (deciles 1 and 2) and the top 20% of alums called (deciles 9 and 10): about $2,700 versus almost $200,000.

Table 8: Total Pledge Dollars and Mean Pledge Dollars Received by Contact Score Decile

How We Used the Model to See When It Might (And Might Not) Make Sense to Keep Calling an Alum

In this section we have a lot of tables and figures for you to look at. Specifically, you’ll see:

  • Both the number of contacts made and the contact rate by decile score level for each of the first six attempts. (We decided to cut things off at the sixth attempt for reasons we think you’ll find obvious.)
  • A table that shows the total pledge dollars raised for each attempt by decile score level.

Looked at from one perspective, there is a huge amount of information to absorb in all this. Looked at from another perspective, we believe there are a few obvious facts that emerge.

Go ahead and browse through the tables and figures for each of the six attempts. After you finish doing that, we’ll tell you what we see.

The First Attempt

Table 9: Number of Contacts Made and Percentage Contact Made, by Contact Score Decile for the First Attempt

The Second Attempt

Table 10: Number of Contacts Made and Percentage Contact Made, by Contact Score Decile for the Second Attempt

The Third Attempt

Table 11: Number of Contacts Made and Percentage Contact Made by Contact Score Decile for the Third Attempt

The Fourth Attempt

Table 12: Number of Contacts Made and Percentage Contact Made by Contact Score Decile for the Fourth Attempt

The Fifth Attempt

Table 13: Number of Contacts Made and Percentage Contact Made by Contact Score Decile for the Fifth Attempt

The Sixth Attempt

Table 14: Number of Contacts Made and Percentage Contact Made by Contact Score Decile for the Sixth Attempt

This is what we see:

  • For each of the six attempts, the contact rate increases as the score decile increases. There are some bumps and inconsistencies along the way (see Figure 10, for example), but this is clearly the overall pattern for each of the attempts.
  • For all the attempts, the contact rate for the lowest 20% of scores (deciles 1 and 2) is always substantially lower than the contact rate for the highest 20% of scores (deciles 9 and 10).
  • Once we reach the sixth attempt, the contact rates fall off dramatically for all but the tenth decile.

Now take a look at Table 15 that shows the total pledge money raised for each attempt (including the seventh attempt and eight or more attempts) by score decile. You can also look at Table 16 which shows the same information but with the amounts exceeding $1,000 highlighted in red.

Table 15: Total Pledge Dollars Raised In Each Attempt by Contact Score Decile

Table 16: Total Pledge Dollars Raised In Each Attempt by Contact Score Decile with Pledge Amounts Greater Than $1,000 Highlighted In Red
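If you'd like to build a table like these from your own data, the following Python sketch shows one way to total pledge dollars by score decile and attempt, and then pick out the cells above a threshold. The records are invented and the function names are hypothetical; only the $1,000 threshold and the "8 or more" pooling come from the tables above.

```python
from collections import defaultdict

def pledge_crosstab(rows, cap=8):
    """Total pledge dollars keyed by (score decile, attempt category)."""
    table = defaultdict(float)
    for decile, attempts, dollars in rows:
        table[(decile, min(attempts, cap))] += dollars
    return dict(table)

def cells_over(table, threshold=1000.0):
    """Cells whose total exceeds the threshold (the ones to highlight)."""
    return sorted(cell for cell, total in table.items() if total > threshold)

# Invented records: (score decile, number of attempts, pledge dollars)
rows = [
    (10, 1, 800.0), (10, 1, 400.0), (1, 1, 25.0),
    (10, 9, 50.0), (10, 11, 20.0),
]
table = pledge_crosstab(rows)
print(table)
print(cells_over(table))
```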

We could talk about these two tables in some detail, but we’d rather just say, “Wow!”

Some Concluding Remarks

We began this paper by saying that we wanted to introduce what might be the beginnings of a methodology for answering the question: “How many attempts should my callers make before they ‘make contact’ with an alum and either get a pledge or some other voice to voice response – or they give up and stop calling?”

We also said we’d like to walk you through a technique we’ve tried, and we’d like to ask you to send us some reactions to what we’ve done. So, if you’re willing, we’d really appreciate your getting back to us with some feedback on what we’ve done here.

Specifically, you might tell us how much you agree or disagree with these assertions:

  • There is no across-the-board number of attempts that you should apply in your program, or even to any segment in your program; the number of attempts you make to reach an alum very much depends on who that alum is.
  • There are some alums who should be called and called because you will eventually reach them and (probably) receive a pledge from them. There are other alums who should be called once, or not at all.
  • If the school we used in this paper is at all representative of other schools that do calling, then all across North America huge amounts of time and money are wasted trying to reach alums with whom contact will never be made and from whom no pledges will ever be raised.
  • Anyone who is at a high level of decision making regarding the annual fund (whether inside the institution or a vendor) should be leading the charge for the kind of data analysis shown in this paper. If they’re not, someone needs to have a polite little chat with them.

We look forward to getting your comments. (Comment below, or email Kevin MacDonell at kevin.macdonell@gmail.com.)

5 April 2011

Validation after the fact

Filed under: Model building, Phonathon, regression, Validation — kevinmacdonell @ 8:11 am

Validation against a holdout sample allows us to pick the best model for predicting a behaviour of interest. (See Thoughts on model validation.) But I also like to do what I call “validation after the fact.” At the end of a fundraising period, I want to see how people who expressed that behaviour broke down by the score they’d been given.

This isn’t really validation, but if you create some charts from the results, it’s the best way to make the case to others that predictive modeling works. More importantly, doing so may provide insights into your data that will lead to improvements in your models in their next iterations.

This may be most applicable in Annual Fund, where the prospects solicited for a gift may come from a wide range of scores, allowing room for comparison. But my general rule is to compare each score level by ratios, not counts. For example, if I wanted to compare Phonathon prospects by propensity score, I would compare the percentage (ratio) of each score group contacted who made a pledge or gift, not the number of prospects who did so. Why? Because if I actually used the scores in solicitation, higher-scoring prospects will have received more solicitation attempts on average. I want results to show differences among scores, not among levels of intensity of solicitation.
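To make the ratios-versus-counts point concrete, here is a small Python sketch with invented numbers. The function name and the data are hypothetical; the only thing taken from the text above is the rule itself: divide pledges by prospects contacted within each score group.

```python
from collections import defaultdict

def pledge_rate_by_score(rows):
    """rows: (score, pledged 0/1) for contacted prospects only."""
    contacted = defaultdict(int)
    pledged = defaultdict(int)
    for score, did_pledge in rows:
        contacted[score] += 1
        pledged[score] += did_pledge
    return {s: pledged[s] / contacted[s] for s in sorted(contacted)}

# Invented results: score 9 was called far more heavily than score 3.
rows = [(9, 1)] * 3 + [(9, 0)] * 17 + [(3, 1)] * 1 + [(3, 0)] * 3
rates = pledge_rate_by_score(rows)
print(rates)
```

In this made-up case the score-9 group produced three pledges to the score-3 group's one, yet its rate is lower (15% versus 25%). A comparison of counts would have hidden that, because the counts mostly reflect how many people in each group were reached.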

So when the calling season ended recently, I evaluated my Phonathon model’s performance, but I didn’t study the one model in isolation: I compared it with a model that I initially rejected last year.

It sounds like I’m second-guessing myself. Didn’t I pick the very best model at the time? Yes, but … I would expect my chosen model to do the best job overall, but perhaps not for certain subgroups — donor types, degree types, or new grads. Each of these strata might have been better described by an alternative model. A year of actual results of fundraising gives me what I didn’t have last year: the largest validation sample possible.

My after-the-fact comparison was between a binary logistic regression model which I had rejected, and the multiple linear regression model which I actually used in Phonathon segmentation. As it turned out, the multiple linear regression model did prove the winner in most scenarios, which was reassuring. I will spare you numerous comparison charts, but I will show you one comparison where the rejected model emerged as superior.

Eight percent of never-donor new grads who were contacted made a pledge. (My definition of a new grad was any alum who graduated in 2008, 2009, or 2010.) The two charts below show how these first-time donors broke down by how they were scored in each model. Due to sparse data, I have left out score level 10.

Have a look, and then read what I’ve got to say.

Neither model did a fantastic job, but I think you’d agree that predicting participation for new grads who have never given before is not the easiest thing to do. In general, I am pleased to see that the higher end of the score spectrum delivered slightly higher rates of participation. I might not have been able to ask for more.

The charts appear similar at first glance, but look at the scale of the Y-axis: In the multiple linear regression model, the highest-scoring group (9, in this case) had a participation rate of only 12%, and strangely, the 6th decile had about the same rate. In the binary logistic regression model, however, the top scoring group reached above 16% participation, and no one else could touch them. The number of contacted new grads who scored 9 is roughly equal between the models, so it’s not a result based on relatively sparse data. The BLR model just did a better job.

There is something significantly different about new grads, or about never-donors whom we wish to acquire as donors, or both. In fact, I think it’s both. Recall that I left the 10s out of the charts due to sparse data: very few young alumni can aspire to rank up there with older alumni using common measures of affinity. As well, when the dependent variable is Lifetime Giving, as opposed to a binary donor/nondonor state, young alumni are nearly invisible to the model, as they are almost by definition non-donors or at most fledgling donors.

My next logical step is a model dedicated solely to predicting acquisition among younger alumni. But my general point here is that digging up old alternative models and slicing up the pool of solicited prospects for patterns “after the fact” can lead to new insights and improvements.

18 January 2011

Time on the call and how much the alum pledges

Filed under: Annual Giving, John Sammis, Peter Wylie, Phonathon — kevinmacdonell @ 10:24 am

By Peter B. Wylie, John Sammis, and Kevin MacDonell

(Download a PDF version of this paper here: Time on the call-Wylie-Sammis-MacDonell-Jan2011)

It’s a Wednesday evening and you’re in the call center watching your students on the phone with alums. Young people talking to older people who went to the same school they did. Lots of drama. Smiles. Frowns. Glee. Frustration. Side conversations between callers that you hope your alums will never overhear. Maybe you’re looking forward to going home and relaxing, but you’re not bored. Too much energy in the room for that.

One of the things you notice, as you so often have, is the length of the calls. Some end quickly. Some go on for a while. And some last a goodly amount of time, maybe longer than you’re comfortable with. Perhaps you ask yourself, “Is all that time worth the effort in terms of getting alums to pledge and pledge a lot?”

As much as we’d like to, we can’t offer a definitive answer to that question. However … we can offer some findings we think are intriguing.  Admittedly, the data we have is from only one school. There’s no way we can responsibly generalize these findings to other schools. But we want to put them out there for you to think about and (we hope) test out at your own institution.

Here’s the flow of the paper:

  • The basic data we looked at
  • The questions we asked
  • The answers we found
  • Some of the implications

The Basic Data We Looked At

We were quickly reminded of something when we launched this study: If you run a call center, you have oodles of electronically stored data at your disposal. Of course, given the pressures and constraints of your job, you’re not going to have the time and leisure to forage through all that data and analyze it. We appreciate that. Nonetheless, the data is there; it’s hanging around waiting for the day when somebody can really dig into it.

This data came from a university with about 100,000 living alumni, and was collected during one fall term of calling by 26 student employees working in evening shifts of 8 to 12 people each.

In this study, we explored a tiny portion of the data that was available to us:

  • The results of the last call made to 4,785 alumni: “ALREADY PLEDGED,” “DO NOT CALL,” “NO PLEDGE,” “REMOVE FROM LIST,” “SPECIFIED PLEDGE” (including the dollar amount), and “UNSPECIFIED PLEDGE.”
  • The time (in seconds) spent on each of these 4,785 calls
  • The amount of the last gift (if there had been one) made by the alum

The Questions We Asked

These are the basic questions we tried to answer:

  • How much time was spent on these almost 5,000 calls?
  • What was the relationship between time spent on these calls and dollars pledged?
  • What was the relationship between time spent on these calls and pledge rate?
  • What was the relationship between time spent on these calls and whether or not alums gave more than their last gift?

The Answers We Found

In this section, we’ve got a bunch of tables and graphs for you to look at. We’ve done our best to make them clear and to offer our best interpretation of the findings.  If questions remain, please do not hesitate to let us know with a phone call or e-mail.

How much time was spent on the calls?

Take a look at Table 1. After you’ve had a little time to digest it, we’ll tell you what we see there.


Let’s start with the first column in the table called “Time Interval.” Notice that there are 20 of them, and that they each contain about 5% of the calls made. If you scan over to the two columns on the right (“Minimum Seconds” and “Maximum Seconds”), here’s what emerges:

  • A good half (Intervals 1-10) of the calls lasted less than three minutes (179 seconds).
  • 80% of the calls (Intervals 1-16) lasted less than five minutes (283 seconds).
  • 5% of the calls (Interval 20) lasted somewhere between seven and a half minutes (456 seconds) on up to well over half an hour (2,115 seconds).
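If you want to build a similar table from your own call data, the sketch below shows one way to do it in Python. It is only an illustration of the idea of 20 roughly equal-sized time intervals, not the procedure used for Table 1. The function name and the sample durations are assumptions.

```python
def time_intervals(durations, n_intervals=20):
    """Split call durations (in seconds) into n roughly equal-sized groups, shortest first."""
    ordered = sorted(durations)
    total = len(ordered)
    rows = []
    for i in range(n_intervals):
        start = i * total // n_intervals
        end = (i + 1) * total // n_intervals
        group = ordered[start:end]
        if group:  # skip empty groups when there are fewer calls than intervals
            rows.append({
                "interval": i + 1,
                "calls": len(group),
                "min_seconds": group[0],
                "max_seconds": group[-1],
            })
    return rows

# Made-up durations: 100 calls lasting 1 to 100 seconds.
rows = time_intervals(range(1, 101))
print(rows[0])
print(rows[-1])
```

One caveat: calls with identical durations can land on either side of an interval boundary, so with real data the minimum of one interval may equal the maximum of the one before it.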

Because we have not spent much time analyzing call center data, we simply don’t know how typical these time data are. We’d love to know how they compare to your own call center data.

What was the relationship between time spent on these calls and dollars pledged?

Okay, now things start to get a little more interesting. Take a look at Table 2 and Figure 1. (Click Fig. 1 for full-size version.)


Here’s what we see:

  • The big news from both the table and the chart is that, as the calls get longer, there is a substantial rise in dollars pledged, as expressed in average dollars per call attempt. There are some blips, some ups and downs, but the trend is undeniable.
  • More specifically, let’s compare what came in dollar-wise from the shortest 25% of calls (Intervals 1-5) and the longest 25% of calls (Intervals 16-20). It’s $3,910 versus $62,491. Any way you do the math, that’s a big difference.
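The "average dollars per call attempt" measure is simple to compute once calls are sorted by length and grouped. Here is a minimal Python sketch, assuming each call is recorded as a pair of seconds on the phone and dollars pledged (zero when there was no pledge). The eight calls and the four-interval grouping are invented to keep the example small.

```python
def dollars_per_attempt(calls, n_intervals=20):
    """calls: list of (seconds, pledged_dollars). Returns one summary row per time interval."""
    ordered = sorted(calls, key=lambda c: c[0])
    total = len(ordered)
    rows = []
    for i in range(n_intervals):
        group = ordered[i * total // n_intervals:(i + 1) * total // n_intervals]
        if not group:
            continue
        dollars = sum(amount for _, amount in group)
        rows.append({
            "interval": i + 1,
            "calls": len(group),
            "dollars": dollars,
            "dollars_per_attempt": dollars / len(group),
        })
    return rows

# Made-up calls: (seconds on the phone, dollars pledged).
calls = [(30, 0), (45, 0), (90, 0), (120, 25),
         (200, 0), (260, 50), (400, 100), (900, 250)]

rows = dollars_per_attempt(calls, n_intervals=4)
for r in rows:
    print(r)
```

Dividing by every call in the interval, not only the calls that produced a pledge, is what makes this a per-attempt figure.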

What was the relationship between time spent on these calls and pledge rate?

Look at Table 3 and Figure 2. (Click on Fig. 2 for full-size version.)


We see the same pattern here as we did with pledge dollars received. If anything, the relationship is a bit stronger. We’ll do the same comparison we did previously, except this time we’ll be looking at pledges received as opposed to dollars received. For the shortest 25% of calls (Intervals 1-5): 37 pledges. For the longest 25% of calls (Intervals 16-20): 678 pledges. This is a big, big difference.

What was the relationship between time spent on these calls and whether or not alums gave more than their last gift?

We think this is an interesting question. After all, we all want alums to give more this time than they gave the last time, whether we are reaching out to them by a call or a letter or a site visit or whatever.

Same suggestion: Look at the table and the figure. Then we’ll tell you what we think. (Click on Fig. 3 for full-size version.)


No doubt about it. The longer the call lasted, the more likely alums were to give more this time than they had the last time.
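To check the same thing in your own data, something like the following Python sketch would do. It makes two assumptions that are ours for the example and may not match the paper's table: calls with no prior gift on record are left out of the rate, and calls are split into just two groups at 180 seconds (roughly the three-minute midpoint mentioned earlier). The field names and records are made up.

```python
def upgrade_rate(calls):
    """Share of calls where the pledge exceeded the alum's last gift.

    Assumption: calls with no prior gift on record are skipped.
    """
    eligible = [c for c in calls if c["last_gift"] is not None]
    if not eligible:
        return None
    upgraded = sum(1 for c in eligible if c["pledge"] > c["last_gift"])
    return upgraded / len(eligible)

def upgrade_rate_by_length(calls, split_seconds=180):
    """Compare upgrade rates for calls shorter than split_seconds and all other calls."""
    short = [c for c in calls if c["seconds"] < split_seconds]
    longer = [c for c in calls if c["seconds"] >= split_seconds]
    return {"short": upgrade_rate(short), "long": upgrade_rate(longer)}

# Made-up calls for illustration only.
calls = [
    {"seconds": 60, "pledge": 0, "last_gift": 50},
    {"seconds": 120, "pledge": 25, "last_gift": 25},
    {"seconds": 150, "pledge": 0, "last_gift": None},
    {"seconds": 240, "pledge": 100, "last_gift": 50},
    {"seconds": 400, "pledge": 50, "last_gift": 50},
    {"seconds": 600, "pledge": 75, "last_gift": 25},
    {"seconds": 900, "pledge": 0, "last_gift": 100},
]

result = upgrade_rate_by_length(calls)
print(result)
```

A pledge equal to the last gift does not count as an upgrade here; only a strictly larger pledge does.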

Some of the Implications of What We Found

The first thing we’d like to say is more along the lines of a caveat than an implication. And the caveat is that none of us should fall into the trap of making a causal inference here. For example, we can’t conclude that encouraging callers to spend more time on the phone with alums who pick up the phone is going to increase pledge participation and pledge revenue. Spending more time on the phone with these folk may pay off, and it may not.

Why do we say that? Because time spent on the phone may not be the causal factor at work here. What may be making the difference are factors related to how much time the caller spends on the phone with the alum. It could be that warmer and chattier callers are better at raising money with alums who are also warm and chatty – and it’s that combination of traits that leads to longer calls, and it’s the chemistry between the two people that produces the pledge. We don’t know.

What we do know is that not analyzing at least some of that call data (that is so very much at your disposal) is a bad thing. Somebody in your shop or related to your shop should be working the heck out of that data. And you vendors who support call center data? We have to shake our finger at you. You have the resources to do this type of analysis. Let’s get the lead out!!

(Thank you to Devin T. Mathias, Consultant with Marts & Lundy, Allison Bass, currently on professional hiatus and formerly of Seattle Country Day School and The Bush School in Seattle, and Sharon R. Lonthair, Managing Director of Development and Alumni Relations, Rochester Institute of Technology, who all graciously agreed to review this paper and provided helpful advice.)

2 December 2010

Call attempt limits? You need propensity scores

Filed under: Annual Giving, Phonathon, Predictive scores — kevinmacdonell @ 4:51 pm

A couple of weeks ago I shared some early results from our calling program that showed how very high-scoring alumni (for propensity to give by phone) can be counted on to give, and give generously, even after multiple attempts to reach them. If they have a high score, keep calling them! Yes, contact rates will decline, for sure. But these prospects are still likely to give if you can get them on the phone, making the extra effort worthwhile.

For the other three-quarters of your prospects, it’s a different story. You may still want to call them, but keeping those phones ringing all year long is not going to pay off, even if you have the luxury of being able to do so.

This is ground I’ve already covered, but I think it bears repeating, and I’ve created some charts that illustrate the point in a different way. Have a look at this chart, which shows pledge percentage rates for the 6th, 7th, 8th, 9th and 10th decile score, at four stages of call attempts:

This chart is based on data from more than 6,600 phone conversations. How are we to interpret it? Let’s start with the top line, in blue, which represents prospects in the top 10% (decile) of alumni for propensity to give by phone, as determined by the predictive model:

  • Almost 38% of 10th-decile alumni who were contacted on the very first call attempt made a pledge.
  • Moving to the next dot on the blue line, we see that almost 37% of the 10th-decile alumni who were contacted on the 2nd or 3rd attempt made a pledge.
  • The pledge rate slips a little more, to 36%, if the prospect picked up the phone on attempts 4 through 7.
  • And finally, almost 26% of them pledged if it took more than 7 attempts to reach them.

That’s the first line. The other lines take different paths. The 9s and 8s start much lower than the 10s, but pledge percentages actually rise with the number of call attempts. They will fall back to earth — just not yet! As for the lower deciles, the 7s and 6s, they start relatively low and dwindle to zero.

So what does all this tell me? I am less interested in how each decile ranks at the start of calling (one or two attempts), because it’s no surprise to me that the 10th decile gives at twice the rate of the 9th decile, and that pledge rates fall with each step down in the score. I’ve seen that before.

What really interests me is what happens when we’ve made many repeated attempts to call. That the 8s and 9s have pledge rates that increase with the number of call attempts is pretty strange stuff, but the fact is: 26 alumni with a score of 9 made a pledge only after we called them 8 or 9 or maybe 15 times.

Whether it’s worth it to make that many call attempts is up to you. It depends on contact rates, and what it costs to make all those calls. But one thing is certain: If I’m going to call repeatedly, I’d better be calling the top three deciles, because if I keep on flogging the segments with a score of 6, I’m not going to do very well.
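If you have scores and attempt counts for every completed conversation, a table like the one behind the pledge-rate chart can be sketched in a few lines of Python. The four attempt bands below follow the stages described above (1, 2 to 3, 4 to 7, more than 7). The field names and sample conversations are invented for the example.

```python
from collections import defaultdict

def attempt_band(attempts):
    """Map the attempt number on which the prospect was reached to one of four stages."""
    if attempts == 1:
        return "1"
    if attempts <= 3:
        return "2-3"
    if attempts <= 7:
        return "4-7"
    return "8+"

def pledge_rates(conversations):
    """Return {(decile, band): pledges / conversations} for completed conversations."""
    counts = defaultdict(lambda: [0, 0])  # (decile, band) -> [pledges, conversations]
    for c in conversations:
        key = (c["decile"], attempt_band(c["attempts"]))
        counts[key][1] += 1
        if c["pledged"]:
            counts[key][0] += 1
    return {key: pledges / total for key, (pledges, total) in counts.items()}

# Made-up conversations for illustration only.
conversations = [
    {"decile": 10, "attempts": 1, "pledged": True},
    {"decile": 10, "attempts": 1, "pledged": False},
    {"decile": 10, "attempts": 9, "pledged": True},
    {"decile": 9, "attempts": 2, "pledged": False},
    {"decile": 9, "attempts": 8, "pledged": True},
    {"decile": 6, "attempts": 1, "pledged": False},
    {"decile": 6, "attempts": 12, "pledged": False},
]

rates = pledge_rates(conversations)
for key in sorted(rates):
    print(key, round(rates[key], 2))
```

Keep an eye on the counts behind each cell: with real data, the high-attempt bands for lower deciles can contain very few conversations, and a rate built on a handful of calls is not worth much.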

So what about contact rates?

Here’s another chart that shows what percentage of each score decile’s prospects have been successfully reached at the same four levels of call attempts. (Click on chart for full size.)

What does it mean? Compare the lowest decile called so far (Decile 6) with the highest decile (10). About 14% of 6s answered the phone on the first try, compared with about 19% of the 10s. That’s not a big difference: In fact, contact rates are similar across the scores for the first attempt. But the similarity ends there. After the first attempt, the lower-scoring alumni have steadily decreasing rates of contact. The same is true of the higher-scoring alumni, but the difference is that some of them are still answering their phones on the 8th call. More than 4% of 10s were reached on the 8th call or later.
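Contact rates differ from pledge rates in the denominator: here every prospect in the decile counts, whether or not anyone ever picked up. A rough Python sketch of that calculation follows, assuming each prospect is recorded as a decile plus the attempt number on which they were reached, or `None` if never reached. The data is made up.

```python
from collections import Counter

BANDS = ["1", "2-3", "4-7", "8+"]

def band_for(attempt):
    """Map an attempt number to one of the four stages used in the charts."""
    if attempt == 1:
        return "1"
    if attempt <= 3:
        return "2-3"
    if attempt <= 7:
        return "4-7"
    return "8+"

def contact_rates(prospects):
    """prospects: list of (decile, attempt_reached_or_None).

    Returns {decile: {band: share of ALL prospects in that decile reached in that band}}.
    """
    totals = Counter(decile for decile, _ in prospects)
    reached = Counter(
        (decile, band_for(attempt))
        for decile, attempt in prospects
        if attempt is not None
    )
    return {
        decile: {band: reached[(decile, band)] / totals[decile] for band in BANDS}
        for decile in sorted(totals)
    }

# Made-up prospects: (score decile, attempt on which they were reached).
prospects = [
    (10, 1), (10, 3), (10, 9), (10, None),
    (6, 1), (6, None), (6, None), (6, None),
]

rates = contact_rates(prospects)
print(rates)
```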

The bottom line is, the propensity score is your biggest asset in setting appropriate call attempt limits. Yes, Renewal prospects are more likely to give than Acquisition prospects. But that’s not enough to go by. Are you going to call every last Renewal prospect before thinking about acquiring new donors? I wouldn’t recommend it — not if you care about long-term growth and not just this year’s totals. And because contact rates decline as attempts increase (regardless of score), you’re going to end up making a LOT of phone calls to find those gifts that will make up your goal.

My earlier post on the same subject is here. I am spending a lot of time on this, because I don’t see any of this being written about by the well-known experts in Phonathon fundraising. Why that is, I do not know.

23 November 2010

Mine your hidden call centre data

Filed under: Annual Giving, Best practices, Phonathon, Predictor variables — kevinmacdonell @ 1:33 pm


One day earlier this fall on my walk to the office, I passed a young woman bundled up in toque and sweater and sitting in a foldup chair at an intersection. She was holding a clipboard, and as I passed by, I heard a click from somewhere in that bundle. She was counting. Whether it was cars going through the intersection, or whether I myself had just been counted, I don’t know. I could have asked her, but I knew what she was doing: She was collecting data.

All those clicks might be used by a local business or charity looking for the best location and time to solicit passersby, or they might find their way into GIS and statistical analysis and be used by city planners working on traffic control issues. Locating business franchises, planning for urban disasters, optimizing emergency services: all sorts of activities are based on the mundane activity of counting.

This week I’m thinking about a different type of click: the reams of data that flow from Phonathon. If your institution is fortunate enough to have a call centre that is automated, you may be sitting on a wealth of data that never makes it into the institutional database. (Thus, “hidden”.) In our program, only a few things are loaded into Banner from CampusCall: Address updates, employment updates, any requested contact restrictions, and the pledges themselves. The rest stays behind in the Oracle database that runs the calling software, and I am only now pulling out some interesting bits which I intend to analyze over the coming days.

Call centre data is not just about the Phonathon program. Gathered from many thousands of interactions across a broad swath of your constituency, this data contains clues that will potentially inform any model you create, including giving by mail, Planned Giving, even major gifts.

What data am I looking for? So far, here’s what I have, plus some early intuition about what it might tell me.

  • ID: Naturally, I’ll need prospect IDs in order to match my data up, both across calling projects and in my predictive models themselves.
  • Last result code: The last call result coded by the student caller (No Pledge, Answering Machine, etc.) There are many codes, and I will discuss those in more detail in a future post.
  • Day call: People who tell us they’d rather be called back during the day (at the office, in many cases), are probably statistically different from the rest.
  • Number of attempts: This is the number of times a prospect was called before we finally reached them or gave up. I suspect high call attempt numbers are associated with lower affinity, although that remains to be seen. It’s probably more specific than that — high attempt numbers make a person a relatively poor phone prospect, but may cause them to score better in a mail-solicitation model.
  • Refusal reason: The reason given by the prospect for not making a pledge, usually chosen by the Phonathon employee from a drop-down menu of the most common responses. Refusal reasons are not always well-tracked, but they’re potentially useful for designing strategies aimed at overcoming objections. I’ve observed in the past that certain refusal reasons are actually predictive of giving (by mail).
  • Talk time: The length of the call, in seconds. People who pledge are on the phone longer, of course, but not every long call results in a pledge. I think of longer calls as a sign of successful rapport-building.

There are other important types of information: Address and employment updates, method of payment and so on — but these are all coded in our database and I do not need to extract them from the Phonathon software. My focus today is on hidden data — the data that gets left behind.

In CampusCall, prospects are loaded into giant batches called “projects”. Usually there is only one project per term, but multiple projects can be run at once. Each one is like its own separate database. I have data for ten projects conducted from 2007 to the present. I had to extract data for each project separately, and then match all the records up by ID in order to create one huge file of historical calling results. The total number of records in all the extracts was 189,927; when matched up they represent 56,216 unique IDs. Yum!
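The matching step itself is not complicated. Here is a minimal sketch in Python of stacking per-project extracts into one history keyed by ID. It assumes each extract has already been exported as a list of rows with an `id` field. The project names, field names and rows are hypothetical and say nothing about how CampusCall or its Oracle database actually stores things.

```python
def combine_projects(extracts):
    """extracts: {project_name: [row, ...]}, where each row is a dict with an "id" key.

    Returns {id: {project_name: row}}. Assumes one row per ID per project;
    a duplicate ID within a project overwrites the earlier row.
    """
    history = {}
    for project, rows in extracts.items():
        for row in rows:
            history.setdefault(row["id"], {})[project] = row
    return history

# Made-up extracts from two hypothetical calling projects.
extracts = {
    "fall2009": [
        {"id": "A1", "attempts": 3, "last_result": "No Pledge"},
        {"id": "B2", "attempts": 8, "last_result": "Answering Machine"},
    ],
    "fall2010": [
        {"id": "A1", "attempts": 1, "last_result": "Specified Pledge"},
        {"id": "C3", "attempts": 5, "last_result": "No Pledge"},
    ],
}

history = combine_projects(extracts)
total_records = sum(len(rows) for rows in extracts.values())
print(total_records, "records,", len(history), "unique IDs")
```

Keeping the project name alongside each row preserves the information about which projects a prospect was and was not called in, which matters for the variables built later.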

Where I go from here will be discussed in future posts. I need to put some thought into the variables I will create. For example, will I simply add up all call attempts into a single variable called “Attempts”, or should I calculate an average number of attempts, keeping in mind that some prospects were called in some projects and not others?
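To make the choice concrete, here is a small Python sketch that computes both candidates side by side: total attempts across all projects, and average attempts per project in which the prospect was actually called. The IDs, project names and counts are invented for illustration.

```python
def attempt_variables(attempts_by_project):
    """attempts_by_project: {prospect_id: {project_name: attempts}}.

    Returns total attempts, number of projects called in, and mean attempts per project.
    """
    variables = {}
    for prospect_id, per_project in attempts_by_project.items():
        if not per_project:
            continue  # never called in any project, so nothing to summarize
        total = sum(per_project.values())
        projects = len(per_project)
        variables[prospect_id] = {
            "total_attempts": total,
            "projects_called": projects,
            "mean_attempts": total / projects,
        }
    return variables

# Made-up attempt counts per prospect per project.
attempts_by_project = {
    "A1": {"fall2009": 3, "fall2010": 1},
    "B2": {"fall2009": 8},
    "D4": {"fall2007": 2, "fall2008": 2, "fall2009": 2, "fall2010": 2},
}

variables = attempt_variables(attempts_by_project)
for prospect_id, v in variables.items():
    print(prospect_id, v)
```

In this toy data B2 and D4 have the same total of 8 attempts, but B2 needed all 8 in a single project while D4 averaged 2 per project over four projects. A simple sum hides that difference and the average exposes it.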

Until I figure these things out, here’s a final thought for today. If your job is handling data, then it’s also your job to understand where that data comes from and how it is gathered. Stick your nose into other people’s business from time to time, and get involved in the establishment of new processes that will pay off in good data down the road. Go to the person who runs your Phonathon and ask him or her if refusal reasons are being tracked. (In an automated system, it’s not that hard.) If you ARE the person running the Phonathon, make sure your callers are trained to select the right code for the right result.

In other words, it all starts with that young person bundled against the cold: The point at which data is collected. What happens here determines whether the data is good, usable, reliable. Without this person and her clicker, not much else is possible.

P.S. If you’re interested in analyzing your call centre data, have a read of this white paper by Peter Wylie: What Makes a Call Successful.
