CoolData blog

9 October 2015

Ready for a sobering look at your last five years of alumni giving?

Guest post by Peter B. Wylie and John Sammis


Download this discussion paper here: Sobering Look at last 5 fiscal years of alumni giving


My good friends Wylie and Sammis are at it again, digging into the data to ask some hard questions.


This time, their analysis shines a light on a concerning fact about higher education fundraising: A small group of donors from the past are responsible for the lion’s share of recent giving.


My first reaction on reading this paper was, well, that looks about right. A school’s best current donors have probably been donors for quite some time, and alumni participation is in decline all over North America. So?


The “so” is that we talk about new donor acquisition but are we really investing in it? Do we have any clue who’s going to replace those donors from the past and address the fact that our fundraising programs are leaky boats taking on water? Is there a future in focusing nearly exclusively on current loyal donors? (Answer: Sure, if loyal donors are immortal.)


A good start would be for you to get a handle on the situation at your institution by looking at your data as Wylie and Sammis have done for the schools in their discussion paper. Download it here: Sobering Look at last 5 fiscal years of alumni giving.


9 September 2015

Prospect Research, ten years from now

Guest post by Peter B. Wylie


(This opinion piece was originally posted to the PRSPT-L Listserv.)


As many of you know, Dave Robertson decided to talk about the future of prospect research in New Orleans via a folk song. It was great. The guy’s got good pipes and plays both the harmonica and the guitar really well.


Anyway, here’s what I wrote him back in March. It goes on a bit. So if you get bored, just dump it.


It was stiflingly hot on that morning in early September of 1967 when my buddy Bunzy sat down in the amphitheater of Cornell’s med school in Manhattan. He and the rest of his first year classmates were chattering away when an older gentleman scuffled through a side door out to a podium. The room fell silent as the guy peered over his reading glasses out onto a sea of mostly male faces: “I’ll be straight with you, folks. Fifty percent of what we teach you over the next four years will be wrong. The problem is, we don’t know which fifty percent.”


I’ve often thought about the wisdom embedded in those words. The old doc was right. It is very hard for any of us to predict what anything will be like twenty years hence. Nate Silver in both his book “The Signal and the Noise” and on his immensely popular website underlines how bad highly regarded experts in most fields are at making even short range predictions.


So when Dave Robertson asked me to jot down some ideas about how prospect research will look a decade or more from now, I wanted to say, “Dave, I’ll be happy to give it a shot. But I’ll probably be as off the mark as the futurists in the early eighties. Remember those dudes? They gave us no hint whatsoever of how soon something called the internet would arrive and vastly transform our lives.”


With that caveat, I’d like to talk about two topics. The first has to do with something I’m pretty sure will happen. The second has to do with something sprinkled more with hope than certainty.


On to the first. I am increasingly convinced prospect researchers a decade or more from now will have far more information about prospects than they currently have. Frankly, I’m not enthusiastic about that possibility. Why? Privacy? Take my situation as I write down my thoughts for Dave. I’m on the island of Vieques, Puerto Rico with Linda to celebrate our fortieth wedding anniversary. We’ve been here almost two weeks. Any doggedly persistent law enforcement investigator could find out the following:


  • What flights we took to get here
  • What we paid for the tickets
  • The cost of each meal we paid for with a credit card
  • What ebooks I purchased while here
  • What shows I watched on Netflix
  • How many miles we’ve driven and where with our rental jeep
  • How happy we seemed with each other while in the field of the many security cameras, even in this rustic setting


You get the idea. Right now, I’m gonna assume that the vast majority of prospect researchers have no access to such information. More importantly, I assume their ethical compasses would steer them far away from even wanting to acquire such information.


But that’s today. March 2, 2015. How about ten years from now? Or 15 years from now, assuming I’m still able to make fog on a mirror? As it becomes easier and easier to amalgamate data about old Pete, I think all that info will be much easier to access by people willing to purchase it. That includes people who do prospect research. And if those researchers do get access to such data, it will help them enormously in finding out if I’m really the right fit for the mission of their fundraising institution. I guess that’s okay. But at my core, I don’t like the fact that they’ll be able to peek so closely into who I am and what I want in the days I have left on this wacky planet. I just don’t.


On to the second thing. Anybody who’s worked in prospect research even a little knows that the vast majority of the money raised by their organization comes from a small group of donors. If you look at university alumni databases, it’s not at all unusual to find that one tenth of one percent of the alums have given almost a half of the total current lifetime dollars. I think that situation needs to change. I think these institutions must find ways to get more involvement from the many folks who really like them and who have the wherewithal to give them big gifts.


So … how will the prospect researchers of the future play a key role in helping fundraising organizations (be they universities or general nonprofits) do a far better job of identifying and cultivating donors who have the resources and inclination to pitch in on the major giving front? I think/hope it’s gonna be in the way campaigns are run.


Right now, here’s what seems to happen. A campaign is launched with the help of a campaign consultant. A strategy is worked out whereby both the consultants and major gift officers spread out and talk to past major givers and basically say, “Hey, you all were really nice and generous to us in the last campaign. We’re enormously grateful for that. We truly are. But this time around we could use even more of your generosity. So … What can we put you down for?”


This is a gross oversimplification of what happens in campaigns. And it’s coming from a guy who doesn’t do campaign consulting. Still, I don’t think I’m too far off the mark. To change this pattern I think prospect researchers will have to be more assertive with the captains of these campaigns: The consultants, the VPs, the executives, all of whom talk so authoritatively about how things should be done and who can simultaneously be as full of crap as a Christmas goose.


These prospect researchers are going to have to put their feet down on the accelerator of data driven decision-making. In effect, they’ll need to say:


“We now have pretty damn accurate info on how wealthy a whole bunch of our younger donors are. And we have good analytics in place to ascertain which of them are most likely to step it up soon … IF we strategize how to nurture them over the long run. Right now, we’re going after the low hanging fruit that is comprised of tried and true donors. We gotta stop just doing that. Otherwise, we’re leaving way too much money on the table.”


All that I’ve been saying in this second part is not new. Not at all. Perhaps what may be a little new is what I have hinted at but not come right out and proclaimed. In the future we’ll need more prospect researchers to stand up and be outspoken to the campaign movers and shakers. To tell these big shots, politely and respectfully, that they need to start paying attention to the data. And do it in such a way that they get listened to.


That’s asking a lot of folks whose nature is often quiet, shy, and introverted. I get that. But some of them are not. Perhaps they are not as brazen as James Carvel is/was. But we need more folks like them who will stand up and say, “It’s the data, stupid!” Without yelling and without saying “stupid.”

3 September 2015

Testing associations between two categorical variables


Over the coming months I will be adapting various bits from a work in progress, a new book on building predictive models using regression, aimed at people working in nonprofit and higher education advancement.


(If you’re interested, I suggest subscribing via email — see the box to the right — to have the inside track on this project.)


In my previous post, I wrote about “Exploring associations between variables.” Today I go deeper to describe exactly how to test for an association or relationship between two variables that are categorical.


These directions are specific to the statistics software Data Desk, but apply equally to any stats package. I make frequent reference to contingency tables, which you may know better as cross-tabs or R x C tables (row by column tables).


As well, I mention a data file from “Data University.” This is a file of anonymized sample data provided to me by Peter Wylie and John Sammis, which readers of my book will be able to download and use to learn the concepts.


In your data file for Data University, you’ve got a variable for “Business phone present” and “Job title present”. You’d like to know if these two factors are related to each other. If they are completely unrelated, they are said to be independent of each other. We can analyze this in Data Desk using contingency tables. (If you don’t know why we care about dependence or independence among variables, read my previous post.)


Click once on the icon for ‘BusPhone Present’ to select it as the first variable, then Shift-click on ‘JobTitle Present’ to select it as the second variable. Under the Calc menu at the top of the screen, select Contingency Tables. The displayed result will depend on what default settings are in effect, but it might look like the table below. (Just as with frequency tables, you can play with display options from a menu available by clicking on the little triangle in the top left corner of the window.)




A contingency table gives us counts of how many cases fall into the categories that result from the intersection of two categorical variables. In this example, we have two binary variables, therefore there are four possibilities, arranged in a two-by-two matrix. The two “levels” of ‘BusPhone Present’ are labeled vertically (down the rows), and the two ‘levels’ of ‘Job Title Present’ are labeled horizontally (across the columns).


The labeling of rows and columns in contingency tables can be a bit confusing when the category names are just ones and zeroes, so here’s the same table with more descriptive labels.




If these variables are properly coded, with no missing values, each and every case in our data falls into one of the four cells of the matrix. We see for example that 2,240 cases are ‘0’ for ‘BusPhone Present’ and ‘0’ for ‘JobTitle Present’ – a large number of cases have neither piece of information coded in the database.


Looking at counts of cases in a contingency table can sometimes show cells with unusually many or unusually few cases. In this example, it seems to be fairly common that whenever Data University has a business phone number, it tends to have a job title as well, and it’s also common for both to be missing. It’s much less common for one to be present and the other not. This suggests there’s a relationship.


We’ve made a judgment call, just looking at the numbers. Fortunately the field of statistics provides some tests for making decisions more consistently and easily. If you have some training in stats you will readily grasp how we can apply these tests. For our purposes, we will rely on rules of thumb and simplified explanations rather than delve into a deep understanding of the tests themselves.


Click on the options menu for the contingency table, and add “Expected value” to the table, then click OK. This adds a new set of numbers to the table. The section at the bottom of the table, “table contents,” lists these elements in the order they appear in each cell. The first element is the count of cases that we’ve already seen. The other element is Expected Values.




“Expected values” are the counts of cases that would be expected to result if ‘BusPhone Present’ and ‘JobTltle Present’ were statistically independent of each other. In other words, if the two variables were completely unrelated, the probability that any one case would end up in a particular cell depends only on the probability that the case falls in a specified row and the probability that it falls in a specified column. In other words, we would expect the number of people with a business phone number AND who have a job title in the database to depend solely on the number of each in the sample – one doesn’t depend on the other. By this logic, the probability of a person having a business phone in the database is about the same whether or not they have a job title in the database, if it is true that one is not related to the other.


Have a quick look to compare each pair of numbers. You’ll notice that two actual counts are higher than the expected value, and two counts are lower than the expected value. If the variables were not associated with each other (independent), the actual counts would be a lot closer to the expected counts. It appears that if one variable is a zero or one, the other variable is more likely than not to have the same value – that is, we are likely to have either both or neither. Therefore, the variables do seem to be related.


Being able to compare the expected values to the observed values is helpful in identifying a pattern or relationship. But again, we are still just eyeballing the numbers and making a guess. Let’s take another step forward.


An assertion that two variables are not related is based on a concept in statistics called the null hypothesis. The null hypothesis in this case would state, “There is no relationship between Business Phone Present and Job Title Present.” In a test for independence between Business Phone and Job Title, if there is no relationship, then the null hypothesis is true. The alternative hypothesis is that the variables are in fact related – they are dependent, rather than independent.


The test for determining whether the null hypothesis is true in this case is called the chi-square test for independence. For our purposes, it is not necessary to formally state any kind of hypothesis, nor is it necessary to understand how chi-square is calculated. (Chi is pronounced “kai”.) Data Desk takes care of the calculations, and a rule of thumb for interpreting the result will suffice. (1)


Go back to the options for the contingency table. Deselect “Expected value”, and choose “Chi-square value” instead. Data Desk will then calculate the chi-square statistic. If it’s large, that is enough to reject the null hypothesis. This number is not interpreted on its own. Rather, the important statistic to watch for appears directly underneath it: p. The p-value in the table below is less than or equal to 0.0001 – a very small value.




The lowercase p stands for “probability.” A p-value of 0.0001 is the same as a 0.01% chance of something happening. We ask this question: “If there were no relationship between ‘BusPhone Present’ and “JobTitle Present’, what is the probability that you would get a result for chi-square at least as high as 1,141?” That probability is your p-value.


By convention, you need a p-value of 0.05 (i.e. 5 percent) or less to consider the probability low enough to conclude that there must, in fact, be a relationship between ‘BusPhone Present’ and ‘JobTitle Present’. (In other words, you can reject the null hypothesis.) The probability of obtaining a value of chi-square this large if the variables were independent is extremely low: less than 0.01%. The p-value in this case is very small, therefore the variables are strongly related.


We will encounter p-values again when we do regression. (2)


Given the foregoing discussion, you might be thinking that exploring relationships among variables is a complicated and subtle business. Not really. You don’t have to study expected values or formulate a null hypothesis – I introduced those things to help you understand where chi-square comes from. You only need these steps to test for presence of a relationship:


  1. Select the icons of the two variables.
  2. Create a contingency table from the Calc menu.
  3. In the table options menu, select “Chi-square value.”
  4. Regardless of the chi-square value, if the p-value is less than 0.05, the variables are not independent – they are related somehow.


You can create a new contingency table whenever you want to test two new variables, or you can simply drag a new category variable into an existing table. Just drag a new categorical variable icon on top of the name of the variable you want to replace. These names are in bold face text near the top of the table window, and will highlight when touched by the cursor during the drag. You can replace both factors at once by selecting two new variables and dragging them both into the centre of the table.


Before we move on, try dragging other binary variables into the contingency table to quickly see which combinations yield some interesting associations.


A more relevant example


Dragging variables one by one into a contingency table is a useful exploration step while evaluating potential predictors for use in a model. This implies that one of the variables must be the target (outcome) variable.


Let’s say you plan to build a predictive model to identify which alumni are more likely to make a donation at a higher level than usual, and you want to make a preliminary assessment of potential predictors so you can toss out any that don’t look promising, while keeping the others for use in the regression analysis. You can create a target variable using a lifetime giving variable that will allow you to do this analysis using contingency tables.


Working with the Data University file, create a new derived variable called ‘Big donor’ with the expression:


‘Lifetime HC’ > 999


This will create a binary variable that evaluates to ‘1’ for alumni with lifetime hard-credit giving greater than $999, and ‘0’ for everyone else.


You can use whatever dollar value you like, but $1,000 and up will work here. According to a frequency table of ‘Big donor’, 602 people have lifetime giving of $1,000 or more.


Click on the icon for ‘Big donor’ to select it and create a contingency table from the Calc menu, and change the table options to include chi-square. Now the window is ready for testing variables one by one, by dragging variables into the space that is initially labeled “Drag Variable Here.”


Chi-square is great for indicating that two variables are related, but a little more information will help you understand the nature of the relationship. When you drag in “BusPhone Present,” for example, it seems apparent that the relationship is significant, to judge from the chi-square value. One tweak will tell you something more concrete: Go back to the table options, deselect “Count” and select “Percent of column total.”




This replaces cell counts with percent values – the percent breakdown for each column. The circled values in the table above are the ones to pay attention to – they are the cases that have a ‘1’ for ‘Big donor’. Here is what we can read from this:


  • The first column of figures includes all cases for which there is no business phone in the database (0). Only 8.39% of people with no business phone are also top donors.
  • The second column includes all cases with a business phone (1) – 18.4% of people with a business phone are top donors.


People with a business phone are more than twice as likely to be top donors, and the chi-square value indicates that the relationship is significant. It might be hasty to conclude that ‘BusPhone Present’ is truly a “predictor” of high-level giving, but the association is clearly there in the data.


(These percentages total on the columns. If ‘Big donor’ were your second variable instead of your first, the matrix would be ordered in the other direction, and you would choose “Percent of row total” from the table options instead.)


Try dragging other binary variables from the Data University set into the table, replacing ‘BusPhone Present’, and observe how the percent breakdowns differ from group to group, and whether that difference is significant.


In my book-in-progress, I go on to talk about exploring categorical variables with multiple categories – not just binary variables – but that’s all for today. Again, if you’re interested in knowing more as this project progress, please subscribe to this blog via email from the box in the right sidebar.


End notes


1) If you must know, the value of chi-square is the sum of the squared standardized residuals across all cells in the table. The standardized residual describes the extent to which the observed count differs from the expected count. You can display this statistic in Data Desk along with the others, but there is no need. Another abbreviation in the table is “df”, which stands for degrees of freedom. Probability values of chi-square depend upon the degrees of freedom, which in turn is related to the number of rows and columns in the table.


2) Use and misuse of p-values is a hot topic in statistics, especially in connection with academic publishing, where putting too much stock in p-values and the rather arbitrary 0.05 threshold causes a great deal of angst. As predictive modelers we need to keep these controversies in perspective: We are not testing the effects of new medical treatments nor are we seeking to publish in a scientific journal. We are making use of “good-enough” tools that will produce a useful result. That said, applying the 0.05 threshold without judgment and common sense means that we will detect “relationships” that are not real – potentially about 5% of the time!



26 August 2015

Exploring associations between variables

Filed under: Book, CoolData, Predictor variables — Tags: , , , — kevinmacdonell @ 6:57 pm


CoolData has been quiet over the summer, mainly because I’ve been busy writing another book. (Fine weather has a bit to do with it, too.) The book will be for nonprofit and higher education advancement professionals interested in learning how to use multiple regression to build predictive models. Over the next few months, I will adapt various bits from the work-in-progress as individual posts here on CoolData.


I’ll have more to say about the book later, so if you’re interested, I suggest subscribing via email (see the box to the right) to have the inside track on this project. (And if you aren’t familiar with the previous book, co-written with Peter Wylie, then have a look here.)


A generous chunk of the book is about the specifics of getting your hands dirty with cleaning up your messy data, transforming it to make it suitable for regression analysis, and exploring it for interesting patterns that can strengthen a predictive model.


When you import a data set into Data Desk or other statistics package, you are looking at more than just a jumble of variables. All these variables are in a relation; they are linked by patterns. Some variables are strongly associated with each other, others have weaker associations, and some are hardly related to each other at all.


What is meant by “association”? A classic example is a data set of children’s weights, heights, and ages. Older children tend to weigh more and be taller than younger children. Heavier children tend to be older and taller than younger children. We say that there is an association between age and weight, between height and weight, and between age and height.


Another example: Alumni who are bigger donors tend to attend more reunion events than alumni who give more modestly or don’t give at all. Or put the other way, alumni who attend more events tend to give more than alumni who attend fewer or no events. There is an association between giving and attending events.


This sounds simple enough — even obvious. The powerful consequence of these truths is that if we know the value of one variable, we can make a guess at the value of another, as long as the association is valid. So if we know a child’s weight and height, we can make a good guess of his or her age. If we know a child’s height, we can guess weight. If we know how many reunions an alumna has attended, we can make a guess about her level of giving. If we know how much she has given, we can guess whether she’s attended more or fewer reunions than other alumni.


We are guessing an unknown value (say, giving) based on a known value (number of events attended). But note that “giving” is not really an unknown. We’ve got everyone’s giving recorded in the database. What is really unknown is an alum’s or a donor’s potential for future giving. With predictive modeling, we are making a guess at what the value of a variable will be in the (near) future, based on the current value of other variables, and the type and degree of association they have had historically.


These guesses will be far from perfect. We aren’t going to be bang-on in our guesses of children’s ages based on weight and height, and we certainly aren’t going to be very accurate with our estimates of giving based on event attendance. Even trickier, projecting into the future — estimating potential — is going to be very approximate.


Still, our guesses will be informed guesses, as long as the associations we detect are real and not due to random variation in our data. Can we predict exactly how much each donor is going to give over this coming year? No, that would be putting too much confidence in our powers. But we can expect to have plenty of success in ranking our constituents in order by how likely they are to engage in whatever behaviour we are interested in, and that knowledge will be of great value to the business.


Looking for potentially useful associations is part of data exploration, which is best done in full hands-on mode! In a future post I will talk about specific techniques for exploring different types of variables.


28 June 2015

Data mining in the archives

Filed under: Data, Predictor variables — Tags: , , , — kevinmacdonell @ 6:24 pm


When I was a student, I worked in a university archives to earn a little money. I spent many hours penciling consecutive index numbers onto acid-free paper folders, on the ultra-quiet top floor of the library. It was as dull a job as one can imagine.


Today’s post is not about that kind of archive. I’m talking about database archive views, also called snapshots. They’re useful for reporting and business intelligence, but they can also play a role in predictive modelling.


What is an archive view?


Think of a basic stat such as “number of living alumni”. This number changes constantly as new alumni join the fold and others are identified as deceased. A straightforward query will tell you how many living alumni there are, but that number will be out of date tomorrow. What if someone asks you how many living alumni you had a year ago? Then it’s necessary to take grad dates and death dates into account in order to generate an estimate. Or, you look the number up in previously-reported statistics.


A database archive view makes such reporting relatively easy by preserving the exact status of a record at regular points in time. The ideal archive is a materialized view in a data warehouse. On a given schedule (yearly, quarterly, or even monthly), an automated process adds fresh rows to an archive table that keeps getting longer and longer. You’re likely reliant on central IT services to set it up.


“Number of living alumni” is an important denominator for such key ratios as the percentage of alumni for whom you have contact information (mail, phone, email) and participation rates (the proportion of alumni who give). Every gift is entered as an individual transaction record with a specific date, which enables reporting on historical giving activity. This tends not to be true of contact information. Even though mailing addresses may be added one after another, without overwriting older addresses, the key piece of information is whether the address is coded ‘valid’ or ‘invalid’. This changes all the time, and your database may not preserve a history of those changes. Contact information records may have “To” and “From” dates associated with them, but your query will need to do a lot of relative-date calculations to determine if someone was both alive and had a valid address for any given point in time in the past.


An archive table obviates the need for this complex logic, and ours looks like the example below. There’s the unique ID of each individual, the archive date, and a series of binary indicator variables — ‘1’ for “yes, this data is present” or ‘0’ for “this data is absent”.




Here we see three individuals and how their data has changed over three months in 2015. This is sorted by ID rather than by the order in which the records were added to the archive, so that you can see the journey each person has taken in that time:


  • A00001 had no valid email in the database in February and March, but we obtained it in time for the April 1 snapshot.
  • A00002 had no contact information at all until just before March, when a phone append supplied us with a new number. The number proved to be invalid, however, and when we coded it as such in the database, the indicator reverted to zero.
  • A00003 appeared in our data in February and March, but that person was coded deceased in the database before April 1, and was excluded from the April snapshot.


That last bullet point is important. Once someone has died, continuing to include them as a row in the archive every month would be a waste of resources. In your reporting software, a simple count of records by archive date will give you the number of living alumni. A simple count of ‘Address Indicator’ will give you the number of alumni with valid addresses. Dividing the number of valid addresses by the number of living alumni (and multiplying by 100) will give you the percentage of living alumni that are addressable for that month. (Reporting software such as Tableau will make very quick work of all this.)


Because an archive view preserves changing statuses over time at the level of the individual constituent, it can be used for reporting trends along any slice you choose (age bracket, geography, school, etc.), and can play a role in staff activity/performance reporting and alumni engagement scoring.


But enough about archive views themselves. Let’s talk about using them for predictive modelling.


In the archive example above, you see a bunch of 0/1 indicator variables. Indicator variables are common in predictive modelling. For example, “Mailing address present” can have one of two states: Present or not present. It’s binary. A frequency breakdown of my data at this point in time looks like this (in Data Desk):




About 78% of living alumni have a valid address in the database today — the records with an address indicator of ‘1’. As you might expect, alumni with a good address are more likely to have given than alumni without, and they have much higher lifetime giving on average. In the models I build to predict likelihood to give (and give at higher levels), I almost always make use of this association between contact information and giving.


But what about using the archive view data instead? The ‘Address Indicator’ variable breakdown above shows me the current situation, but the archive view adds depth by going back in time. Our own archive has been taking monthly snapshots since December of last year — seven distinct points in time. Summing on “Address Indicator” for each ID shows that large numbers of alumni have either never had a valid address during that time (0 out of 7 months), or always did (7 out of 7). The rest had a change of status during the period, and therefore fall between 0 and 7:




A few hundred alumni (387) had a valid address in one out of seven months, 143 had a valid address in two out of seven — and so on. Our archive is still very young; only about 1% of alumni have a count that is not 0 or 7. A year from now, we can expect to see far more constituents populating the middle ground.


What is most interesting to me is an apparent relationship between “number of months with valid address” (x-axis) and average lifetime giving (y-axis), even with the relative scarcity of data:




My real question, of course, is whether these summed, continuous indicators really make much of a difference in a model over simply using the more familiar binary variables. The answer is “not yet — but someday.” As I noted earlier, only about 1% of living alumni have changed status in the past seven months, so even though this relationship seems linear, the numbers aren’t there to influence the strength of correlation. The Pearson correlation for “Address Indicator” (0/1) and “Lifetime Giving” is 0.186, which is identical to the Pearson correlation for “Address Count” (0 to 7) and “Lifetime Giving.” For all other variables except one, the archive counts have only very slightly higher correlations with Lifetime Giving than the straight indicator variables. (Email is slightly lower.)


It’s early days yet. All I can say is that there is potential. Have a look at this pair of regression analyses, both using Lifetime Giving (log-transformed) as the dependent variable. (Click on image for larger view.) In the window on the left, all the independent variables are the regular binary indicator variables. On the right, the independent variables are counts from our archive view. The difference in R-squared from one model to the other is very slight, but headed in the right direction: From 12.7% to 13.0%.




Looking back on my student days, I cannot deny that I enjoyed even the quiet, dull hours spent in the university archives. Fortunately, though, and due in no small part to cool data like this, my work since then has been a lot more interesting. Stay tuned for more from our archives.


11 May 2015

A new way to look at alumni web survey data

Filed under: Alumni, Surveying, Vendors — Tags: , , , , — kevinmacdonell @ 7:38 pm

Guest post by Peter B. Wylie, with John Sammis


Click to download the PDF file of this discussion paper: A New Way to Look at Survey Data


Web-based surveys of alumni are useful for all sorts of reasons. If you go to the extra trouble of doing some analysis — or push your survey vendor to supply it — you can derive useful insights that could add huge value to your investment in surveying.


This discussion paper by Peter B. Wylie and John Sammis demonstrates a few of the insights that emerge by matching up survey data with some of the plentiful data you have on alums who respond to your survey, as well as those who don’t.


Neither alumni survey vendors nor their higher education clients are doing much work in this area. But as Peter writes, “None of us in advancement can do too much of this kind of analysis.”


Download: A New Way to Look at Survey Data



Older Posts »

The Silver is the New Black Theme. Create a free website or blog at


Get every new post delivered to your Inbox.

Join 1,198 other followers