CoolData blog

10 July 2017

Analytics as an organizing principle

Filed under: Analytics, Business Intelligence — kevinmacdonell @ 7:51 am

 

I’ve been thinking a lot lately about how an organization gets good at making decisions informed by data. Or, in other words, how to build business intelligence and analytics teams. This preoccupation started with a talk I gave a couple of months ago to a gathering of Advancement leaders from across Canada. I was asked to talk about analytics in general and how our department in particular got to where we are today. Since then, I’ve also spoken to folks from other universities on the same topic.

 

All this talking has been helpful for me in organizing my thoughts, and I’ve come to realize a number of things in retrospect, ways in which we might have evolved more quickly. One of these is a realization about what it means to make data and analytics an “organizing principle.”

 

For my talk in May I was asked to begin with an overview of analytics, so I’ll devote this post to that topic. In a future post, I will share what we learned on our journey.

 

Because analytics is an ever-evolving field, I avoid dictionary-like definitions for analytics. I find it more helpful to talk about what analytics “looks like” in terms of the types of work it consists of, the skill sets of the people doing the work, and the organizational structure of the team (if it’s a team).

 

In my mind, these concepts have resolved into a “triad of threes” … The work itself fits into three tiers, the ideal analytics practitioner is a “triple threat”, and the team is made up of three distinct teams or functions. (If what I’m presenting here is an oversimplification, at least it’s a structurally satisfying one.) What I’m talking about is fairly conventional — I’m not inventing anything — but it’s supported by my own experience.

 

First, the work itself. Analytics practice today works at three distinct levels: Descriptive, predictive, and prescriptive.

 

Descriptive analytics serves the business with information, specifically information about the past, which helps us understand current performance in relation to the past. It attempts to answer the questions, “How have we done?” and “How are we doing now?” This is the realm of reporting and a lot of what is referred to as Business Intelligence. Although this is a starting point for any analytics program, that doesn’t mean it’s easy or that it doesn’t have aspects that are advanced. KPI development, support for performance management, and ad hoc data analyses to answer specific business questions might be included in this tier.

 

Predictive analytics is about predicting the future. Not “the future” in general, but the behaviour of individuals. Predictive modelling is a set of techniques for ranking individuals by their likelihood to engage in some behaviour of interest (making a bequest, becoming a donor, attending an event, etc.). The business goal might be prospect identification, or focusing limited resources to save time or money.

 

And finally, prescriptive analytics provides advice on what action to take to influence a behaviour of interest. While predictive analytics gives us an idea who’s more likely to, say, sign up for a high-end credit card from a financial institution, prescriptive analytics suggests what types of interventions (targeted advertisements, for example) would inspire a customer to actually do it.

 

Prescriptive analytics is the newest type of analytics and the most advanced — I don’t think it’s the same as A/B testing found in direct marketing — and still rare in the nonprofit and advancement sector. I’m using an example from the financial services industry for a reason: my team is just beginning to explore this type of work, and I’m not aware of anyone else in our sector doing it. (If you’re reading this a year or two from now, the situation might be different.)

 

If your organization is doing a good job on reporting, business intelligence, predictive modelling, and maybe some forecasting as well — then you’re most likely doing very well in comparison with your peer institutions in terms of function.

 

So much for the work. What about the people?

 

There is a popular notion of what the ideal analytics practitioner looks like in terms of education, work experience, and skills. That person, who might be styled a Data Scientist, is what I have called a “triple threat” — he or she has extensive domain expertise (fundraising, engagement, and/or marketing), a background in computer science (adept at writing scripts in SQL, R, Python or other language to extract and transform data for analysis and advanced modelling), and mathematics (with an array of advanced statistical methods in his or her toolbox).

 

The problem is, such professionals are both rare and in high demand. You won’t find many of these folks working in our sector — at least not for very long. Their natural habitat is more likely to feature Big Data, not the “little data” we’ve got, and machine learning, rather than our old standbys such as multiple linear regression. I have already elaborated on these points in the blog post I link to above, Mind the data science gap. Suffice to say, we do not currently aspire to hire data scientists.

 

That doesn’t mean the ideal isn’t a useful model, however. When we hire, it makes sense to single out candidates with skills in one of the three areas, and who seem to have some aptitude for picking up skills in complementary areas. The strategy here is not to hire a data scientist, but to grow a reasonable facsimile of one. If you’ve got an employee who has some subject-matter knowledge, has a penchant for self-learning technical skills (on her own time perhaps), is curious about things and diving into the data, and who is a good communicator — such a person will add a lot of value in a BI role.

 

You can have the right people doing the right work, but they need to work in an organizational structure that promotes data-informed decision making. So, the third and final aspect: The organizational structure. There is no one perfect structure, but keeping with the theme of “three,” I think that a three-tier setup makes sense. In a large organization, each tier might be a team. In a smaller organization, each tier might be one person. (If one person is responsible for everything, this “structure” can be thought of as a way to organize or compartmentalize one’s own work.)

 

The first and foundational tier is the Technical Team, consisting of Advancement staff who might be responsible for building and/or maintaining a data warehouse dedicated to Advancement needs, building and maintaining materialized views and data models for use in BI software, developing complex reports and dashboards, integrating internal and external systems and platforms so that data from disparate systems can be merged or federated, and liaising with central IT.

 

This tier sounds very “IT”, but it’s important to recognize that it is distinct from the institution’s centralized IT department, which is responsible for maintaining hardware, servers, and the core database software itself, as well as managing the network and security.

 

So you’re not trying to replicate an IT shop, but you are building a team with specific technical skills. For any higher ed institution in which departments are not supported equally by central IT, having in-house expertise to integrate systems and develop data models tailored to business needs is definitely a key to success. Someone has to supply and support the data infrastructure if central IT is too overtaxed to provide it.

 

The next team is the Analysis Team, the people who build predictive models, define KPIs, do ad hoc analyses, and so on. This team (or person) benefits directly from the work of the Technical Team, freed from having to always extract and transform their own data. While analysis often implies exploration of the raw, unaggregated data, there’s a huge payoff in having a lot of the standard transformations (tedious and repetitive) pushed to the data warehouse level. Analysts add the most value when they’re interacting with clients to define business questions and present results, not struggling yet again with raw, transactional data that could be processed more efficiently and accurately with an ETL tool.

 

In my own workplace, the distinction between these two teams is something of an oversimplification, but it’s roughly analogous.

 

The third team is harder to define, as it may take various forms, depending on the organization. I’ve seen it referred to as the Executive Team, but a better name might be the Analytic Strategy Team or the BI Decision Team. We don’t have a name for it in my workplace, because our department doesn’t have such a group — yet. In fact, this is less a “team” than a solid business process. In any case, I’ve come to think it’s essential for data-informed decision making, and at the heart of analytics as an organizing principle.

 

The Analytic Strategy Team would be a cross-functional team made up of business sponsors (directors and managers of programs and units) and analysts from both the Technical and Analysis teams. In a data-driven organization, this team meets regularly to rank and prioritize analysis projects that have been submitted to the team as requests, called for by department leadership, or generated by the team members themselves. Projects rank higher for being supportive of current strategy, having a high perceived impact, having executive sponsorship, and so on.

 

Prioritizing is not the team’s most important role, however. As the hub of a framework for Advancement decision-making, the Analytic Strategy Team is there to ensure that when a business question is answered through analysis, there will be follow-through. The Team nails down the “why” and “how” of every analysis project: Zeroing in on the real business question that needs to be answered, drafting the general approach to answering the question, and (most critically) determining what actions will be taken if the answer is x, y, or z. Results and recommendations are channeled to a decision maker, who has agreed in advance to the definition of the business question.

 

Ideally, the department’s leadership team approves the ongoing analytics agenda. Having leadership sign off on the list of priorities fosters an integrated approach to making decisions as a whole department.

 

This team is important for focus — analysts do their best work if they can focus — but it’s even more important for driving decisions. Your team can be kept endlessly busy generating analyses, but it’s when it comes to the consequences of analyses that BI programs risk falling flat. Without the accountability implied by an agreed-on process of question, answer, and follow-through, analysts end up floating from one fishing expedition to another, generating “findings” that never get acted on, or fulfilling requests to support program managers’ foregone conclusions with “evidence.”

 

Of course we want to do some purely exploratory analyses without a defined outcome — but that’s not how data-informed decisions get made. As Thomas Davenport has written, “In the traditional analytics world, analysts may have lacked the ability to work closely with decision-makers to frame decisions appropriately, engage stakeholders, and structure decision processes and actions. Decision analysts in a business analytics environment need to move from back-office decision support to front-office decision consultants.”

 

Again I say, these observations about the “third team” are not drawn from my first-hand experience. These are things I’ve come to understand only recently. My naiveté is evident in “Score!” the book I co-authored with Peter Wylie and which was published just two years ago. What we wrote seemed to imply that all it takes is a supportive leader driving change from the top and engaged staff people with an aptitude for data work driving change from the bottom. They would somehow meet in the middle, and magic would happen. Well, we do need both of those forces, but nowadays I don’t see organizational change happening in the absence of a well-functioning business process that guides decision-making.

 

I’ve talked about the people, the types of work they do, and the structure of the team — all from a general perspective. In my next post, I will talk about the journey our own shop has taken towards building a BI/analytics program. Not surprisingly, the real-world program doesn’t arrive as neatly packaged as this general overview would suggest.

 

23 February 2017

Proceeds from sales of “Score!” to be donated to ACLU

Filed under: Book, Peter Wylie, Score! — kevinmacdonell @ 9:09 pm

 

Peter Wylie and I are pleased to tell you that all our current and future royalties from sales of the book “Score!: Data-Driven Success for Your Advancement Team” will be donated to the American Civil Liberties Union.

 

A good seller since it came out a couple of years ago, “Score!” is available for order online, in both print and e-book versions. (Click here to enter the CASE book store.)

 

Each year around late August, I am delighted to see that cheque in my mail from the Council for Advancement and Support of Education. (Peter of course gets his cheque at the same time, only he spells it “check”.) The next cheque (or check) we receive will be our third. We never know how sales have gone for the year until we get paid; since “Score!” continues to be featured prominently in the CASE catalogue, and people continue to click through this blog to the CASE bookstore every day, we have reason to think sales are still healthy.

 

A good opportunity, then, to extend our little book’s modest influence in a positive direction in these strange times. The ACLU works to defend and preserve the individual rights and liberties guaranteed by the Constitution and laws of the United States. As you may know, I live in Canada, but I recognize that holding the current administration to account is in everyone’s interest.

 

If you’ve been meaning to get a copy and just needed that extra reason to act, click here to order online. Or, even better, consider making a contribution directly to the ACLU or whatever organization you feel is best positioned to undo the poison of xenophobia in your community, region, or country.

 

31 January 2017

Are we missing too many alumni with web surveys? (Part 2)

Filed under: Alumni, John Sammis, Peter Wylie, Surveying — kevinmacdonell @ 6:22 am

Guest post by Peter B. Wylie, with John Sammis

 

Download a printable PDF version of this paper: Are We Missing Too Many Alumni P2.

 

It seems everyone we know, no matter how young or old, has an email address or uses Facebook. So we might assume that nowadays online surveys will reliably deliver a representative sampling of a school’s alumni population.

 
 

In this guest post, Peter Wylie and John Sammis demonstrate that alumni available and willing to be polled online differ from non-online constituents in potentially significant ways. Although current practice tends towards online-only surveying, the evidence suggests this probably skews the conclusions we can draw about our constituencies, with key differences that go well beyond just age.

 
 

(This is “part 2” of an earlier piece. To download the first paper, click here: Are We Missing Too Many Alumni With Web Surveys?)

 
 

Again, the link for Part 2:  Are We Missing Too Many Alumni P2.

 
 

5 December 2016

Amazing things with matching strings

Filed under: Coolness, Data integrity, SQL — kevinmacdonell @ 7:44 am

 

I had an occasion recently when it would have been really helpful to know that a new address added to the database was a duplicate of an older, inactivated address. The addition wasn’t identified as a duplicate because it wasn’t a perfect match — a difference similar to that between 13 Anywhere Road and 13 Anywhere Drive. 

 

After the fact, I did a Google search and discovered some easy-to-use functionality in Oracle SQL that might have saved us some trouble. Today I want to talk about how to use UTL_MATCH and suggest some cool applications for it in Advancement data work.

 

“Fuzzy matching” is the term used for identifying pairs of character strings that may not be exactly the same, but are so close that they could be. For example, “Washignton” is one small typo away from “Washington,” but the equivalence is very difficult to detect by any means other than an alert pair of human eyes scanning a sorted list. When the variation occurs at the beginning of a string — “Unit 3, 13 Elm St.” instead of “Apmt 3, 13 Elm St.” — then even a sorted list is of no use.

 

According to this page, the UTL_MATCH package was introduced in Oracle 10g Release 2, but first documented and supported in Oracle 11g Release 2. The package includes two functions for testing the level of similarity or difference between strings.

 

The first function is called EDIT_DISTANCE, which is a count of the number of single-character “edits” (insertions, deletions, or substitutions) needed to get from one string to a second string. For example, the edit distance from “Kevin” to “Kelvin” is 1, from “New York” to “new york” is 2, and from “Hello” to “Hello” is 0. (A related function, EDIT_DISTANCE_SIMILARITY, expresses the distance as a normalized value between 0 and 100, with 100 being a perfect match.)

 

The second method, the one I’ve been experimenting with, is called JARO_WINKLER, named for an algorithm that measures the degree of similarity between two strings. The result ranges from 0 (no similarity) to 1 (perfect similarity). It was designed specifically for detecting duplicate records, and its formula seems aimed at the kind of character transpositions you’d expect to encounter in data entry errors. (More info here: Jaro-Winkler distance.)

 

Like EDIT_DISTANCE, it has a related function called JARO_WINKLER_SIMILARITY. Again, this ranges from 0 (no match) to 100 (perfect match). This is the function I will refer to for the rest of this post.
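A quick way to get a feel for these four functions is to run them on literal strings before pointing them at real data. The sketch below uses Oracle's one-row DUAL table and the string pairs mentioned above; the column aliases are made up for illustration. Only the three edit distances are annotated with values, since those are the ones stated in this post.

```sql
-- Try the UTL_MATCH functions on literal strings.
-- Column aliases are illustrative only.
SELECT
  UTL_MATCH.edit_distance('Kevin', 'Kelvin')             AS ed_kevin,     -- 1: one inserted letter
  UTL_MATCH.edit_distance('New York', 'new york')        AS ed_new_york,  -- 2: two changes of case
  UTL_MATCH.edit_distance('Hello', 'Hello')              AS ed_hello,     -- 0: identical strings
  UTL_MATCH.edit_distance_similarity('Kevin', 'Kelvin')  AS eds_kevin,    -- normalized, 0 to 100
  UTL_MATCH.jaro_winkler('Kevin', 'Kelvin')              AS jw_kevin,     -- 0 to 1
  UTL_MATCH.jaro_winkler_similarity('Kevin', 'Kelvin')   AS jws_kevin     -- 0 to 100
FROM dual
```

As the “New York” example shows, the comparison is case-sensitive. If differences in capitalization shouldn't count against a match, wrapping both arguments in UPPER() is one option.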

 

Here is a simple example of UTL_MATCH in action. The following SQL scores constituents in your database according to how similar their first name is to their last name, with the results sorted in descending order by degree of similarity. (Obviously, you’ll need to replace “schema”, “persons,” and field names with the proper references from your own database.)

 

SELECT
  t1.ID,
  t1.first_name,
  t1.last_name,
  UTL_MATCH.jaro_winkler_similarity(t1.first_name, t1.last_name) AS jw
FROM schema.persons t1
ORDER BY jw DESC

 

Someone named “Donald MacDonald” would get a fairly high value for JW, while “Kevin MacDonell” would score much lower. “Thomas Thomas” would score a perfect 100.

 

Let’s turn to a more useful case: Finding potential duplicate persons in your database. This entails comparing a person’s full name with the full name of everyone else in the database. To do that, you’ll need a self-join.

 

In the example below, I join the “persons” table to itself. I concatenate first_name and last_name to make a single string for the purpose of matching. In the join conditions, I exclude records that have the same ID, and select records that are a close or perfect match (according to Jaro-Winkler). To do this, I set the match level at some arbitrary high level, in this case greater than or equal to 98.

 

SELECT
  t1.ID,
  t1.first_name,
  t1.last_name,
  t2.ID,
  t2.first_name,
  t2.last_name,
  UTL_MATCH.jaro_winkler_similarity(
    t1.first_name || ' ' || t1.last_name,
    t2.first_name || ' ' || t2.last_name
  ) AS jw
FROM schema.persons t1
INNER JOIN schema.persons t2
  ON t1.ID != t2.ID
  AND UTL_MATCH.jaro_winkler_similarity(
    t1.first_name || ' ' || t1.last_name,
    t2.first_name || ' ' || t2.last_name
  ) >= 98
ORDER BY jw DESC

 

I would suggest reading this entire post before trying to implement the example above! UTL_MATCH presents some practical issues which limit what you can do. But before I share the bad news, here are some exciting possible Advancement-related applications:

 

  • Detecting duplicate records via address matching.
  • Matching external name lists against your database. (Which would require the external data be loaded into a temporary table in your data warehouse, perhaps.)
  • Screening current and incoming students against prospect, donor, and alumni records for likely matches (on address primarily, then perhaps also last name).
  • Data integrity audits. An example: If the postal code or ZIP is the same, but the city name is similar (but not perfectly similar), then there may be an error in the spelling or capitalization of the city name.
  • Searches on a particular name. If the user isn’t sure about spelling, this might be one way to get suggestions back that are similar to the guessed spelling.
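To make one of these ideas concrete, here is a minimal sketch of the data integrity audit on city names. It assumes a hypothetical “addresses” table with “postal_code” and “city” columns, and the threshold of 90 is an arbitrary starting point, not a tested recommendation.

```sql
-- List pairs of city spellings that share a postal code and are
-- similar but not identical. Table and column names are hypothetical.
SELECT DISTINCT
  a1.postal_code,
  a1.city AS city_1,
  a2.city AS city_2,
  UTL_MATCH.jaro_winkler_similarity(a1.city, a2.city) AS jw
FROM schema.addresses a1
INNER JOIN schema.addresses a2
  ON  a1.postal_code = a2.postal_code   -- only compare within the same postal code
  AND a1.city < a2.city                 -- skip identical spellings and mirrored pairs
WHERE UTL_MATCH.jaro_winkler_similarity(a1.city, a2.city) >= 90
ORDER BY jw DESC
```

Because the join is on equal postal codes, each address is compared only with its few neighbours rather than with the whole table, which should keep this audit far cheaper than a name-against-name search.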

 

Now back to reality … When you run the two code examples above, you will probably find that the first executes relatively quickly, while the second takes a very long time or fails to execute at all. That is due to the fact that you’re evaluating each record in the database against every other record. This is what’s known as a cross-join or Cartesian product — a very costly join which is rarely used. If you try to search for matches across 100,000 records, that’s 10 billion evaluations! The length of the strings themselves contributes to the complexity, and therefore the runtime, of each evaluation — but the real issue is the 10,000,000,000 operations.
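Before running a fuzzy self-join, it may be worth asking the database how many comparisons the join implies. The sketch below does that for the placeholder “persons” table, assuming ID is unique. It also shows the count for a variation not used in the example above: joining on t1.ID < t2.ID instead of t1.ID != t2.ID, which would evaluate each pair once rather than twice.

```sql
-- Estimate the size of the job before attempting it.
SELECT
  COUNT(*)                       AS n_records,
  COUNT(*) * (COUNT(*) - 1)      AS ordered_pairs,    -- pairs evaluated with t1.ID != t2.ID
  COUNT(*) * (COUNT(*) - 1) / 2  AS unordered_pairs   -- pairs evaluated with t1.ID < t2.ID
FROM schema.persons
```

For 100,000 records, the first figure works out to 9,999,900,000, which is the “10 billion” above. Halving it helps, but not enough to change the overall picture.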

 

As intriguing as UTL_MATCH is, then, its usage will cause performance issues. I am still in the early days of playing with this, but here are a few things I’ve learned about avoiding problems while using UTL_MATCH.

 

Limit matching records. Trying to compare the entire database with itself is going to get you in trouble. Limit the number of records retrieved for comparison. A query searching for duplicates might focus solely on the records that have been added or modified in the past day or two, for example. Even so, those few records have to be checked against all existing records, so it’s still a big job — consider not checking against records that are marked deceased, that are non-person entities, and so on. Anything to cut down on the number of evaluations the database has to perform.
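As a rough sketch of this tip, the query below compares only recently modified records against living individuals. The columns “date_modified”, “deceased_flag”, and “entity_type”, along with their codes, are hypothetical stand-ins for whatever your own database uses.

```sql
-- Compare a small set of recent records against a trimmed-down
-- set of existing records. Column names and codes are hypothetical.
WITH recent AS (
  SELECT ID, first_name || ' ' || last_name AS full_name
  FROM schema.persons
  WHERE date_modified >= TRUNC(SYSDATE) - 2    -- touched in the past two days
),
existing AS (
  SELECT ID, first_name || ' ' || last_name AS full_name
  FROM schema.persons
  WHERE deceased_flag = 'N'                    -- skip deceased records
    AND entity_type = 'PERSON'                 -- skip non-person entities
)
SELECT
  r.ID AS recent_id,
  e.ID AS existing_id,
  UTL_MATCH.jaro_winkler_similarity(r.full_name, e.full_name) AS jw
FROM recent r
INNER JOIN existing e
  ON r.ID != e.ID
WHERE UTL_MATCH.jaro_winkler_similarity(r.full_name, e.full_name) >= 98
ORDER BY jw DESC
```

With 50 recent records and 100,000 existing ones, that is about 5 million evaluations instead of 10 billion.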

 

Keep strings short. Matching works best when working with short strings. Give some thought to what you really want to match on. When comparing address records, it might make sense to limit the comparison to Street Line 1 only, not an entire address string which could be quite lengthy.

 

Pre-screen for perfect matches. A Jaro-Winkler similarity of 100 means that two strings are exactly equal. I haven’t tested this, but I’m guessing that checking for A = B is a lot faster than calculating the JW similarity between A and B. It might make sense to have one query to audit for perfect matches (without the use of UTL_MATCH) and exclude those records from a second query that audits for JW similarities that are high but less than a perfect 100.
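The audit for perfect matches needs no fuzzy logic at all. One possible form, using the same placeholder “persons” table, simply groups on the name fields and keeps the groups with more than one record:

```sql
-- Exact-duplicate audit: names that appear on more than one record.
SELECT
  first_name,
  last_name,
  COUNT(*) AS n_records,
  MIN(ID)  AS lowest_id,
  MAX(ID)  AS highest_id
FROM schema.persons
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
ORDER BY n_records DESC
```

The fuzzy query could then ask only for scores that are high but below 100, for example a similarity of 95 or more and less than 100.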

 

Pre-screen for impossible matches. If a given ID_1 has a street address that is 60 characters long and a given ID_2 has a street address that is only 20 characters long, there is no possibility of a high Jaro-Winkler score and therefore no need to calculate it. Find a way to limit the data set to match before invoking UTL_MATCH, possibly through the use of a WITH clause that limits potential matching pairs by excluding any that differ in length by more than, say, five characters. (Another “pre-match” to use would check if the initial letter in a name is the same; if it isn’t, there’s a good chance it isn’t going to be a match.)
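Here is a sketch of how both pre-screens might look when matching on last names. The five-character limit and the threshold of 95 are arbitrary. Note the assumption built into this approach: Oracle's optimizer is free to rearrange the work, so whether the cheap checks really run before the expensive one is something to verify with an execution plan, not take for granted.

```sql
-- Cheap pre-screens first, Jaro-Winkler only on the surviving pairs.
WITH candidate_pairs AS (
  SELECT
    t1.ID        AS id_1,
    t2.ID        AS id_2,
    t1.last_name AS last_name_1,
    t2.last_name AS last_name_2
  FROM schema.persons t1
  INNER JOIN schema.persons t2
    ON  SUBSTR(t1.last_name, 1, 1) = SUBSTR(t2.last_name, 1, 1)  -- same initial letter
    AND t1.ID < t2.ID                                            -- each pair once
  WHERE ABS(LENGTH(t1.last_name) - LENGTH(t2.last_name)) <= 5    -- similar length
)
SELECT
  id_1,
  id_2,
  last_name_1,
  last_name_2,
  UTL_MATCH.jaro_winkler_similarity(last_name_1, last_name_2) AS jw
FROM candidate_pairs
WHERE UTL_MATCH.jaro_winkler_similarity(last_name_1, last_name_2) >= 95
ORDER BY jw DESC
```

The initial-letter test is written as an equality so that the database has the option of treating it as an ordinary join condition. It would be a poor choice for addresses, where a variation at the start of the string (“Unit 3” versus “Apmt 3”) is exactly what you want to catch.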

 

Keep match queries simple. Don’t ask for fields other than ID and the fields you’re trying to match on. Yes, it does make sense to bring down birthdate and additional address information so that the user can decide if a probable match is a true duplicate or not, but keep that part of the query separate from the match itself. You can do this by putting the match in a WITH clause, and then left-joining additional data to the results of that clause.
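A minimal sketch of that separation follows. The WITH clause carries only IDs and the score; the descriptive fields are joined on afterward. The “birth_date” column is a hypothetical example of the extra information a reviewer might want.

```sql
-- Step 1: the match itself, kept as lean as possible.
WITH matches AS (
  SELECT
    t1.ID AS id_1,
    t2.ID AS id_2,
    UTL_MATCH.jaro_winkler_similarity(
      t1.first_name || ' ' || t1.last_name,
      t2.first_name || ' ' || t2.last_name
    ) AS jw
  FROM schema.persons t1
  INNER JOIN schema.persons t2
    ON t1.ID != t2.ID
  WHERE UTL_MATCH.jaro_winkler_similarity(
      t1.first_name || ' ' || t1.last_name,
      t2.first_name || ' ' || t2.last_name
    ) >= 98
)
-- Step 2: decorate the few matching pairs with review fields.
SELECT
  m.id_1,
  m.id_2,
  m.jw,
  p1.first_name AS first_name_1,
  p1.last_name  AS last_name_1,
  p1.birth_date AS birth_date_1,
  p2.first_name AS first_name_2,
  p2.last_name  AS last_name_2,
  p2.birth_date AS birth_date_2
FROM matches m
LEFT JOIN schema.persons p1 ON p1.ID = m.id_1
LEFT JOIN schema.persons p2 ON p2.ID = m.id_2
ORDER BY m.jw DESC
```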

 

Truth be told, I have not yet written a query that does something useful while still executing in a reasonable amount of time, simply due to the sheer number of comparisons being made. I haven’t given up on SQL, but it could be that duplicate detection is better accomplished via a purpose-built script running on a standalone computer that is not making demands on an overburdened database or warehouse (aside from the initial pull of raw data for analysis).

 

The best I’ve done so far is a query that selects address records that were recently modified and matches them against other records in the database. Before it applies Jaro-Winkler, the query severely limits the data by pairing up IDs that have name strings and address strings that are nearly the same number of characters long. The query has generated a few records to investigate and, if necessary, de-dupe — but it takes more than an hour and a half to run.

 

Have any additional tips for making use of UTL_MATCH? I’d love to hear and share. Email me at kevin.macdonell@gmail.com.

 

13 November 2016

Where we go from here

Filed under: Off on a tangent — kevinmacdonell @ 6:17 pm

 

Disbelief, anger, helplessness, anxiety. Does that describe your week just past? It certainly describes mine.

 

Given the nature of this blog, you might expect me to be dismayed at how poorly the number-crunchers fared in forecasting the outcome of this presidential election. But no, I don’t care about that.

 

While Tuesday night’s events were still unfolding on television, and long before any protestors took to the streets, voices of reason were already reminding us not to despair. I held onto three examples of these calm voices, because I figured I would need them. I would like to share them with you.

 

The first came around midnight, when it was starting to dawn on me that things were going to end badly:

 

“When voices of intolerance are loudest don’t be despondent — be emboldened, and even more committed to values of diversity and inclusion.”

 

That was a tweet from Richard Florizone (@DalPres), president of Dalhousie University, where I work. His words seemed too oblique when I first read them, somehow falling short of the righteous outrage called for by the occasion. But with the distance of a few days, when my head was cooler, I appreciated that this message was just right.

 

The second helpful piece of advice was a quote by French philosopher and political activist Simone Weil (1909-1943):

 

“Never react to an evil in such a way as to augment it.”

 

Such a succinct antidote to our instinct for knee-jerk retaliation! This quote came to me from the perennially wonderful Maria Popova (@brainpicker), a Bulgarian writer, blogger, and critic living in Brooklyn, New York. Her blog, BrainPickings.org, features her writing on culture, books, and eclectic subjects.

 

And finally, a simply-worded tweet from fundraising professional Lindsay Brown (@DonorScience) in Boston completed this circle of advice with a call to action:

 

“Now more than ever, it’s apparent to me that the work we do in the nonprofit sector is massively important. Let’s keep up the good work.”

 

This is only a sampling of the many calm and wise words spoken in recent days, but they will suffice. What do these three sentiments, taken together, advise us to do?

 

First, we are reminded that the Trump victory has not nullified the values of diversity and inclusion, nor impeded our ability to promote them. We need to understand why he was elected, and by whom (including millions of former Obama supporters who failed to vote), and to address root causes of political extremism. We need to understand, not denigrate, in order to clarify what we need to do.

 

Second, whatever we do we should avoid making problems worse. Don’t move to Canada! As much as I’d love to have you here (in the unlikely event that Canada enables such immigration), please know that your country needs you now more than ever. For those outside the U.S. who feel like disengaging from that country via a boycott (which was my own initial response), please reflect on the consequences of feeding isolationism. And rioting in the streets against the outcome of a free and fair election can have no legitimate result. During the campaign, President Obama repeated the refrain, “Don’t boo — Vote!” Today we can say, “Don’t boo — Act!”

 

Third and finally: Never doubt that our sector is a vital player in creating a better world, despite not being directly “political”. Higher education and a host of nonprofits can build up and defend what Trumpism wants to tear down, and can help create diverse societies to combat the irrational fear of the Other that helps elect leaders like Trump in the first place.

 

The bad news is perfectly clear: that a radicalized faction of white extremism has just elected a dangerous, unpredictable leader animated by ethnic nationalism and xenophobia; that a nation that could have made history by electing its first woman president instead chose a man who abused and denigrated women and boasted about it; that a nostalgia for a bygone decade before civil rights has accompanied an irrational belief that advancement of ethnic minorities threatens the white, working-class status quo; that a country with international commitments to fight climate change has just elected a leader who doesn’t even believe climate change is a real thing.

 

This sudden clarity — this stunning proof that we have not made nearly as much progress as we thought — should be strong motivation not to despair but to get right to work.

 

I don’t have a prescription for what anyone needs to do. It depends on where you are, what tools you have to work with.

 

Do we have work to do at home? I’m willing to bet your daughters are prepared to take on a sexist world, but what are you telling your sons in order that they will help to create a new world?

 

What can we do in our neighbourhoods? Can diverse communities be brought together to interact? Can we replace mere proximity to the Other, which leads to tension and irrational suspicion, with familiarity and interdependence?

 

What causes and projects can we support with our dollars, our time, and our expertise to increase the ability for marginalized people to participate in the economy, to protect the environment, to support reputable journalism, to extend access to education, to promote people’s rights, to fight cynicism about politics and government?

 

There is so much — no one can do it all. I am still thinking about my own “what now?” list, and I know I have to choose wisely. But like voting itself, it is the accumulation of millions of individual actions that leads to dramatic overall results. Let’s agree that it is no longer enough to hold certain opinions, no longer enough to share the right memes on Facebook, no longer enough even to believe that our duty stops with voting and paying taxes.

 

As Hillary Clinton said the day after the election, “… our Constitutional democracy demands our participation. Not just every four years, but all the time. So let’s do all we can to keep advancing the causes and values we all hold dear. Making our economy work for everyone — not just those at the top. Protecting our country and protecting our planet. And breaking down all the barriers that hold any American back from achieving their dreams.”

 

These words can apply just as well to citizens of the United Kingdom, where far-right xenophobia prevailed in the Brexit vote, and to citizens of Canada, where extremist politicians are already talking about emulating Trump, and to people anywhere else in the world who are free to speak and act.

 

Disbelief, anger, helplessness, anxiety. Yes, there’s a time for all of those things. But let’s not subside into resignation, division, hopelessness, and cynicism. Instead let’s each of us look at our immediate surroundings and figure out what we can do. And then, roll up our sleeves and get to work.

 

3 October 2016

Grad class size: predictive of giving, but a reality check, too

 

The idea came up in a conversation recently: Certain decades, it seems, produced graduates who have reduced levels of alumni engagement and lower participation rates in the Annual Fund. Can we hope they will start giving when they get older, like alumni who have gone before? Or is this depressed engagement a product of their student experience, a more or less permanent condition that will keep them from ever volunteering or giving?

 

The answer is not perfectly clear, but what I have found with a bit of analysis can only add to the concern we all have about the end of “business as usual.”

 

For almost all universities, enrolments have risen dramatically over the decades since the end of the Second World War. As undergraduate class sizes ballooned, metrics such as the student-professor ratio emerged as important indicators of quality of education. It occurred to me to calculate the size of each grad-year cohort and include it as a variable in predictive models. For a student who graduated in 1930, that figure could be 500. For someone who graduated in 1995, it might be 3,000. (If you do this, remember to include now-deceased alumni in your count.) This is a rough generalization about the conditions under which a person received their degree, to be sure, but it was easy to query the database for this, and easy to test.
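A minimal sketch of that cohort count in Python, using only the standard library. The records and field layout are invented for illustration; in practice the count would come from a query against the alumni database. The point it shows is that deceased alumni stay in the count even though only living alumni go on to be modelled.

```python
from collections import Counter

# Hypothetical records: (alum_id, grad_year, is_deceased). Invented for illustration.
alumni = [
    ("A1", 1930, True),
    ("A2", 1930, False),
    ("A3", 1995, False),
    ("A4", 1995, False),
    ("A5", 1995, True),
]


def cohort_sizes(records):
    """Count everyone who graduated in each year, living or deceased."""
    return Counter(grad_year for _, grad_year, _ in records)


sizes = cohort_sizes(alumni)

# Attach the cohort size to each living alum as a candidate predictor.
living_with_size = [
    (alum_id, grad_year, sizes[grad_year])
    for alum_id, grad_year, deceased in alumni
    if not deceased
]
```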

 

I pulled lifetime giving for 130,000 living alumni and log-transformed it before checking for a correlation with the size of graduating class. (The transformation is the log of “lifetime giving plus 1.”) It turned out that lifetime giving has a strong inverse correlation with the size of an alum’s grad class, for that alum’s most recent degree (r = -0.338).
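Here is a rough sketch of the transformation and the correlation check, again in plain Python. The six data points are made up and have nothing to do with the real figures; the `pearson` helper is written out by hand only to keep the example self-contained, and any statistics package would do the same job.

```python
import math

# Invented values for illustration only; not the data behind the r quoted above.
lifetime_giving = [0.0, 0.0, 25.0, 100.0, 1500.0, 40.0]
grad_class_size = [3000, 2800, 2500, 1200, 500, 900]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# log(lifetime giving + 1): non-donors map to 0 instead of an undefined log(0).
log_ltg = [math.log1p(amount) for amount in lifetime_giving]

r = pearson(log_ltg, grad_class_size)
```

Adding 1 before taking the log is what lets alumni with no giving stay in the analysis.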

 

This is not surprising. The larger the graduating class, the younger the alum. Nothing is as strongly correlated with lifetime giving as age; therefore, much of the effect I was seeing was probably due to age. (The Pearson correlation of LTG and age was 0.395.)

 

Indeed, in a multiple linear regression of lifetime giving (log-transformed) on age, adding “grad-class size” as a second predictor variable does not improve model fit. The two predictors are not independent of each other: For age and grad-class size, r = -0.828!

 

I wasn’t ready to give up on the idea, though. I considered my own graduation from university, and all the convocations I had attended in the past as an Advancement employee or a family member of a graduate. The room (or arena, as the case may be) was full of grads from a whole host of degree programs, most of whom had never met each other or attended any class in common. Enrolment growth has been far from even across faculties (or colleges or schools); the student experience in terms of class size and one-on-one access to professors probably differs greatly from program to program. At most universities, Arts or Science faculties have exploded in size, while Medicine or Law have probably not.

 

With that in mind, I calculated grad-class size differently, counting the size of each alum’s graduating cohort at the faculty (college) level. The correlation of this more granular count of grads with lifetime giving was not as negative (r = -0.283), but at the same time, it was less tied to age.
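The faculty-level count can be sketched the same way as a count by year, just with a two-part key. The records below are invented, and the sketch assumes one row per alum, already limited to that alum's most recent degree.

```python
from collections import Counter

# Hypothetical records: (alum_id, grad_year, faculty), one row per alum
# for the most recent degree. Invented for illustration.
degrees = [
    ("A1", 1995, "Arts"),
    ("A2", 1995, "Arts"),
    ("A3", 1995, "Arts"),
    ("A4", 1995, "Law"),
    ("A5", 1960, "Arts"),
]

# Cohort size keyed by (year, faculty) rather than by year alone.
faculty_cohort = Counter((year, faculty) for _, year, faculty in degrees)

size_by_alum = {
    alum_id: faculty_cohort[(year, faculty)]
    for alum_id, year, faculty in degrees
}
```

Two alumni from the same year can now carry very different values, which is why this version is less tightly bound to age.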

 

This time, when I regressed lifetime giving on age and then added grad-class size at the faculty level, both predictors were significant. Grad-class size gave a good boost to adjusted R squared.
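For readers who want to try the comparison, here is a hedged sketch of fitting the two nested models and comparing adjusted R squared. It is not the software used for the analysis above (the post does not say), the eight data points are invented, and the hand-rolled least-squares solver is there only so the example runs without any packages. A real analysis would use a statistics package, which would also report the significance of each predictor.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(y, predictors):
    """Fit ordinary least squares with an intercept; return adjusted R squared.

    predictors is a list of columns, each the same length as y.
    """
    n, p = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = p + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Penalize the fit for the number of predictors used.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Invented data for illustration only.
age = [25.0, 32.0, 41.0, 48.0, 55.0, 63.0, 70.0, 78.0]
faculty_class_size = [900.0, 300.0, 850.0, 200.0, 700.0, 150.0, 400.0, 100.0]
log_ltg = [0.8, 2.5, 2.5, 4.3, 4.2, 6.0, 6.1, 7.7]

base = adjusted_r2(log_ltg, [age])
extended = adjusted_r2(log_ltg, [age, faculty_class_size])
```

If `extended` is clearly higher than `base`, the added variable is explaining variation that age alone does not. Adjusted R squared is the right yardstick here because plain R squared never goes down when a predictor is added.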

 

I seemed to be on to something, so I pushed it farther. Knowing that an undergrad’s experience is very different from that of a graduate student, I added “Number of Degrees” as a variable after age, and before grad-class size. All three predictors were significant and all led to improvements in model fit.

 

Still on the trail of how class size might affect student experience, and alumni affinity and giving thereafter, I got more specific in my query, counting the number of graduates in each alum’s year of graduation and degree program. This variable was even less conflated with age, but despite that, it failed to provide any additional explanation for the variation in lifetime giving. There may be other forms of counts that are more predictive, but the best I found was size of grad class at the faculty/college level.

 

If I were asked to speculate about the underlying cause, the narrative I’d come up with is that enrolments grew dramatically not only because there were more young people, but because universities in North America were attracting students who increasingly felt that a university degree was a rite of passage required for success in the job market. The relationship of student to university was changing, from that of a close-knit club of scholars, many of whom felt immensely grateful for the opportunity, to a much larger, less cohesive population with a more transactional view of their relationship with alma mater.

 

That attitude (“I paid x dollars for my piece of paper and so our business here is done”), and not so much the increasing numbers of students they shared the lecture halls with, could account for drops in philanthropic support. What that means for Annual Fund is that we can’t bank on the likelihood that a majority of alumni will become nostalgic when they reach the magic age of 50 or 60 and open their wallets as a consequence. Everything’s different now.

 

I don’t imagine this is news to anyone who’s been paying attention. But it’s interesting to see how this reality is reflected in the data. And it’s in the data that we will be able to find the alumni for whom university was not just a transaction. Our task today is not just to identify that valuable minority, but to understand them, communicate with them intelligently, connect with their interests and passions, and engage them in meaningful interactions with the institution.

 
