CoolData blog

20 January 2018

Download my free handbook on predictive modeling


I like to keep things simple, so here’s the gist: I wrote another book. It’s free. Download it here.

 

The title says it all: “Cool Data: A how-to guide for predictive modeling for higher education advancement and nonprofits using multiple linear regression in Data Desk.” It’s a 190-page “cookbook,” a guide for folks who aren’t looking for deep understanding of stats, regression, or even predictive modelling, but just enough knowledge — a recipe, really — to mine the value in their organizations’ databases. It’s the kind of book I would have loved to have when I was starting out.

 

Take a look, dive in if it’s your thing, share it with someone who might be interested.

 

I remember talking about the idea as long ago as 2010. I wanted to write something not too technical, yet valid, practical, and actionable. On getting into it I quickly realized that I couldn’t talk about multiple linear regression without talking about how to clean, transform, and prepare data for modelling. And I couldn’t talk about data prep without talking about querying a database. As a result, a large portion of the book is an introduction to SQL; again, not a deep dive into writing queries, but just enough for a motivated person to learn how to build an analysis-ready file.
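To make that concrete, here is a minimal sketch of the kind of aggregation query the book builds toward, run against a toy in-memory database. The table and column names here are invented for illustration; a real advancement database will differ, but the shape of the result — one analysis-ready row per constituent — is the point.

```python
import sqlite3

# Hypothetical "gift" table with one row per gift transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gift (constituent_id INTEGER, gift_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO gift VALUES (?, ?, ?)",
    [(1, "2015-03-01", 50.0), (1, "2016-03-05", 75.0), (2, "2016-11-20", 500.0)],
)

# Collapse transactions to one row per constituent: the shape a modelling file needs.
rows = conn.execute(
    """
    SELECT constituent_id,
           COUNT(*)       AS gift_count,
           SUM(amount)    AS lifetime_giving,
           MAX(gift_date) AS last_gift_date
    FROM gift
    GROUP BY constituent_id
    ORDER BY constituent_id
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row can then be joined to biographic and engagement data to form the flat file that regression works on.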

 

I don’t have to sell you on it, though, because it’s free — download it and do whatever you want with it. If it looks interesting to you, buy the Data Desk software and work through the book using the sample data and your own data. (Be sure to check back for updates to the book, which may be necessary as the Data Desk software continues to evolve.) And, of course, consider getting training, preferably one-on-one.

 

Unlike this handbook, Data Desk and training are not free, but they’re investments that will pay themselves back countless times over — if you stick with it.

 

 


2 August 2016

Data Down Under, and the real reason we measure alumni engagement

Filed under: Alumni, Dalhousie University, engagement, Training / Professional Development — kevinmacdonell @ 4:00 pm

 

I’ve given presentations here and there around Canada and the U.S., but I’ve never travelled THIS far. On Aug. 24, I will present a workshop in Sydney, Australia — a one-day master class for CASE Asia-Pacific on using data to measure alumni engagement. My wife and I will be taking some time to see some of that beautiful country, leaving in just a few days.

 

The workshop attendees will be alumni relations professionals from institutions large and small. In the interest of keeping the audience’s needs in mind, I hope to convince them that measuring engagement is worth doing by talking about what’s in it for them.

 

This will be the easy part. Figuring out how to quantify engagement will allow them to demonstrate the value of their teams’ activity to the university, using language their senior leadership understands. Scoring can also help alumni teams better target segments based on varying levels of engagement, evaluate current alumni programming, and focus on activities that yield the greatest boost in engagement.
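As a purely hypothetical illustration of the quantifying step — the activities, weights, and tier thresholds below are invented, not a recommended rubric — a basic additive engagement score might look like this:

```python
# Invented weights for illustration only; a real rubric is institution-specific.
WEIGHTS = {"event_attendance": 2, "email_open": 1, "volunteer": 5, "mentor": 5}

def engagement_score(activities):
    """Sum weighted counts of recorded activities into a single score."""
    return sum(WEIGHTS[activity] * count for activity, count in activities.items())

alumni = {
    "A": {"event_attendance": 3, "email_open": 10},  # 3*2 + 10*1 = 16
    "B": {"volunteer": 1, "mentor": 1},              # 5 + 5 = 10
    "C": {"email_open": 2},                          # 2
}
scores = {name: engagement_score(acts) for name, acts in alumni.items()}

# Segment by score for differentiated programming; cutoffs here are arbitrary.
tiers = {name: ("high" if s >= 10 else "mid" if s >= 5 else "low")
         for name, s in scores.items()}
```

Even a crude score like this lets a team report engagement in numbers and target each tier with different programming.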

 

There is a related but larger context for this discussion, however. I am not certain that everyone will be keen to hear about it.

 

Here’s the situation. Everything in alumni relations is changing. Alumni populations are growing, the number of donors is decreasing, and traditional engagement methods are less effective. Friend-raising and “one size fits all” approaches to engagement are increasingly seen as unsustainable wastes of resources. (A Washington, DC-based consultancy, the Education Advisory Board, makes this point very well in this excerpt of a report, which you can download here: The Strategic Alumni Relations Enterprise.)

 

I don’t know so much about the Asia-Pacific region, but in North America university leaders are questioning the very purpose and value of typical alumni relations activities. In this scenario, engagement measurement is intended for more than producing a merely informational report or having something to brag about: Engagement measurement is really a tool that enables alumni relations to better align itself with the Advancement mission.

 

In place of “one size fits all,” alumni relations teams are under pressure to understand how to interact with alumni at different levels of engagement. Alumni who are somewhat engaged should be targeted with relevant programs and messages to bring them to the next level, while alumni who are at the lowest levels of engagement should not have significant resources directed at them.

 

Alumni at high levels of engagement, however, require special and customized treatment. They’re looking for deeper and more fulfilling experiences that involve furthering the mission of the institution itself. Think of guest lecturing, student recruitment, advisory board roles, and mentorship, career development and networking for students and new grads. Low-impact activities such as pub nights and other social events waste the potential of this group and will do little to keep them contributing their time and money.

 

Think of what providing these quality experiences will entail. For one, alumni relations staff will have to collaborate with their colleagues in development, as well as in other offices across campus — enrolment management, career services, and academic offices. This will be a new thing, and perhaps not an easy thing, for alumni relations teams stuck in traditional friend-raising mode and working in isolation.

 

But it’s exactly through these strategic partnerships that alumni relations can prove its value to the whole institution and attract additional resources even in an environment where leaders are demanding to know the ROI of everything.

 

Along with better integration, a key element of this evolution will be robust engagement scoring. According to research conducted by the Education Advisory Board, alumni relations does the poorest job of any office on campus in providing hard data on its real contribution to the university’s mission. Too many of us are still stuck on tracking our activities instead of the results of those activities.

 

It doesn’t have to be that way, if the alumni team can effectively partner with other units in Advancement. For those of us on the data, reporting, and analysis side of the house, get ready: The alumni team is coming.

 

3 January 2016

CoolData (the book) beta testers needed

 

UPDATE (Jan 5): 16 people have responded to my call for volunteers, so I am going to close this off now. I have been in touch with each person who has emailed me, and I will be making a final selection within a few days. Thank you to everyone who considered taking a crack at it.

 

Interested in being a guinea pig for my new handbook on predictive modelling? I’m looking for someone (two or three people, max) to read and work through the draft of “CoolData” (the book), to help me make it better.

 

What’s it about? This long subtitle says it all: “A how-to guide for predictive modelling for higher education advancement and nonprofits using multiple linear regression in Data Desk.”

 

The ideal beta tester is someone who:

 

  • has read or heard about predictive modelling and understands what it’s for, but has never done it and is keen to learn. (Statistical concepts are introduced only when and if they are needed – no prior stats knowledge is required. I’m looking for beginners, but beginners who aren’t afraid of a challenge.);
  • tends to learn independently, particularly using books and manuals to work through examples, either in addition to training or completely on one’s own;
  • does not have an IT background but has some IT support at his or her organization, and would not be afraid to learn a little SQL in order to query a database him- or herself, and
  • has a copy of Data Desk, or intends to purchase Data Desk. (Available for PC or Mac).

 

It’s not terribly important that you work in the higher ed or nonprofit world — any type of data will do — but the book is strictly about multiple linear regression and the stats software Data Desk. The methods outlined in the book can be extended to any software package (multiple linear regression is the same everywhere), but because the prescribed steps refer specifically to Data Desk, I need someone to actually go through the motions in that specific package.

 

Think of a cookbook full of recipes, and how each must be tested in real kitchens before the book can go to press. Are all the needed ingredients listed? Has the method been clearly described? Are there steps that don’t make sense? I want to know where a reader is likely to get lost so that I can fix those sections. In other words, this is about more than just zapping typos.

 

I might be asking a lot. You or your organization will be expected to invest some money (for the software, sales of which I do not benefit from, by the way) and your time (in working through some 200 pages).

 

As a return on your investment, however, you should expect to learn how to build a predictive model. You will receive a printed copy of the current draft (electronic versions are not available yet), along with a sample data file to work through the exercises. You will also receive a free copy of the final published version, with an acknowledgement of your work.

 

One unusual aspect of the book is that a large chunk of it is devoted to learning how to extract data from a database (using SQL), as well as cleaning and preparing the data for analysis. This is in recognition of the fact that data preparation accounts for the majority of time spent on any analysis project. It is not mandatory that you learn to write queries in SQL yourself, but simply knowing which aspects of data preparation can be dealt with at the database query level can speed your work considerably. I’ve tried to keep the sections about data extraction as non-technical as possible, and to augment them with clear examples.
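As one small example of what handling data prep "at the query level" can mean — the table and column names below are invented — trimming, case-folding, de-duplicating, and filling missing values can all happen in the SQL itself, before the data ever reaches the stats package:

```python
import sqlite3

# Toy constituent table with the usual integrity problems: a duplicate
# record and missing values. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE constituent (id INTEGER, email TEXT, pref_name TEXT)")
conn.executemany(
    "INSERT INTO constituent VALUES (?, ?, ?)",
    [
        (1, " Pat@Example.COM ", "Pat"),
        (1, "pat@example.com", None),  # duplicate record, missing name
        (2, None, "Sam"),              # missing email
    ],
)

# Normalize and de-duplicate in the query: one clean row per id.
rows = conn.execute(
    """
    SELECT id,
           MAX(LOWER(TRIM(email))) AS email,  -- normalized before comparison
           MAX(pref_name)          AS pref_name
    FROM constituent
    GROUP BY id
    ORDER BY id
    """
).fetchall()
```

The duplicate collapses to one row, and the cleaned file is what gets exported for analysis.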

 

For a sense of the flavour of the book, I suggest you read these excerpts carefully: Exploring associations between variables and Testing associations between two categorical variables.

 

Contact me at kevin.macdonell@gmail.com and tell me why you’re interested in taking part.

 

 

 

1 April 2015

Mind the data science gap

Filed under: Training / Professional Development — kevinmacdonell @ 8:10 pm

 

Being a forward-thinking lot, the data-obsessed among us are always pondering the best next step to take in professional development. There are more options every day, from a Data Science track on Coursera to new master’s degree programs in predictive analytics. I hear a lot of talk about acquiring skills in R, machine learning, and advanced modelling techniques.

 

All to the good, in general. What university or large non-profit wouldn’t benefit from having a highly trained, triple-threat chameleon with statistics, programming, and data analytics skills? I think it’s great that people are investing serious time and brain cells pursuing their passion for data analysis.

 

And yet, one has to wonder, are these advanced courses and tools helping drive bottom-line results across the sector? Are they helping people at nonprofits and university advancement offices do a better job of analyzing their data toward some useful end?

 

I have a few doubts. The institutions and causes that employ these enterprising learners may be fortunate to have them, but I would worry about retention. Wouldn’t these rock stars eventually feel constrained in the nonprofit or higher ed world? It’s a great place to apply one’s creativity, but aren’t the problems and applications one can address with data in our field relatively straightforward in comparison with other fields? (Tailoring medical treatment to an individual’s DNA, preventing terrorism or bank fraud, getting an American president elected?) And then there’s the pay.

 

Maybe I’m wrong to think so. Clearly there are talented people working in our sector who are here because they have found the perfect combination of passions. They want to be here.

 

Anyway — rock star retention is not my biggest concern.

 

I’m more concerned about the rest of us: people who want to make better use of data, but aren’t planning to learn way more than we need or are capable of. I’m concerned for a couple of reasons.

 

First, many of the professional development options available are pitched at a level too advanced to be practical for organizations that haven’t hired a full-time predictive analytics specialist. The majority of professionals working in the non-profit and higher-ed sectors are mainly interested in getting better at their jobs, whether that’s increasing dollars raised or boosting engagement among their communities. They don’t need to learn to code. They do need some basic, solid training options. I’m not sure these are easy to spot among all the competing offerings and (let’s be honest) the Big Data hype.

 

These people need support and appropriate training. There’s a place for scripting and machine learning, but let’s ensure we are already up to speed on means/medians, bar charts, basic scoring, correlation, and regression. Sexy? No. But useful, powerful, necessary. Relatively simple and manual techniques that are accessible to a range of advancement professionals — not just the highly technical — offer a high return on investment. It would be a shame if the majority were cowed into thinking that data analysis isn’t for them just because they don’t see what neural networks have to do with their day-to-day work.
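Those fundamentals really are within reach of anyone. As a sketch (with made-up numbers), correlation and simple regression require nothing fancier than arithmetic:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simple_regression(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Invented toy data: years since graduation vs. annual giving.
years = [1, 2, 3, 4, 5]
giving = [20, 40, 55, 80, 105]
r = pearson(years, giving)
slope, intercept = simple_regression(years, giving)
```

None of this is sexy, but it is the machinery behind most useful scoring work, and it is fully inspectable — no black box.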

 

My second concern is that some of the advanced tools of data science are deceptively easy to use. I read an article recently that stated that when it’s done really well, data science looks easy. That’s a problem. A machine-learning algorithm will spit out answers, but are they worth anything? (Maybe.) Does an analyst learn anything about their data by tweaking the knobs on a black box? (Probably not.) Is skipping over the inconvenience of manual data exploration detrimental to gaining valuable insights? (Yes!)

 

Don’t get me wrong — I think R, Python, and other tools are extremely useful for predictive modelling, although not for doing the modelling itself (not in my hands, at least). I use SQL and Python to automate the assembly of large data files to feed into Data Desk — it’s so nice to push a button and have the script merge together data from the database, from our phonathon database, from our broadcast email platform and other sources, as well as automatically create certain indicator variables, pivoting all kinds of categorical variables and handling missing data elegantly. Preparing this file using more manual methods would take days.
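As a toy sketch of the indicator-variable step that such a script automates — the field name and categories below are invented — pivoting one categorical field into 0/1 columns, with missing data handled explicitly, can be a few lines of plain Python:

```python
def pivot_categories(records, field, categories=None):
    """Turn one categorical field into 0/1 indicator columns.

    Missing values get their own indicator rather than being dropped.
    """
    if categories is None:
        categories = sorted({r.get(field) for r in records if r.get(field) is not None})
    out = []
    for r in records:
        row = dict(r)
        value = row.pop(field, None)
        for c in categories:
            row[f"{field}_{c}"] = 1 if value == c else 0
        row[f"{field}_missing"] = 1 if value is None else 0
        out.append(row)
    return out

# Hypothetical records; the third has no faculty recorded.
records = [
    {"id": 1, "faculty": "Arts"},
    {"id": 2, "faculty": "Science"},
    {"id": 3},
]
flat = pivot_categories(records, "faculty")
```

Run over every categorical field in the extract, this is exactly the kind of drudgery worth scripting once and pushing a button for ever after.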

 

But this doesn’t automate exploration of the data, it doesn’t remove the need to be careful about preparing data to answer the business question, and it does absolutely nothing to help define that business question. Rather than let a script grind unsupervised through the data to spit out a result seconds later without any subject-matter expertise being applied, the real work of building a model is still done manually, in Data Desk, and right now I doubt there is a better way.

 

When it comes to professional development, then, all I can say is, “to each their own.” There is no one best route. The important thing is to ensure that motivated professionals are matched to training that is a good fit with their aptitudes and with the real needs of the organization.

 

18 January 2015

Why blog? Six reasons and six cautions

Filed under: CoolData, Off on a tangent, Training / Professional Development — kevinmacdonell @ 4:12 pm

THE two work-related but extracurricular activities I have found the most rewarding, personally and professionally, are giving conference presentations and writing for CoolData. I’ve already written about the benefits of presenting at conferences, explaining why the pain is totally worth it. Today: six reasons why you might want to try blogging, followed by six (optional) pieces of advice.

I’ve been blogging for just over five years, and I can say that the best way to start, and stay started, is to seek out motives that are selfish. The type of motivation I’m thinking of is intrinsic, such as personal satisfaction, as opposed to extrinsic, such as aiming to have a ton of followers and making money. It’s a good selfish.

Three early reasons for getting started with a blog are:

1. Documenting your work: One of my initial reasons for starting was to have a place to keep snippets of knowledge in some searchable place. Specific techniques for manipulating data in Excel, for example. I have found myself referring to older published pieces to remind me how I carried out an analysis or when I need a block of SQL. A blog has the added benefit of being shareable, but if your purpose is personal documentation, it doesn’t matter if you have any audience at all.

2. Developing your thoughts: Few activities bring focus and clarity to your thoughts like writing about them. Some of my ideas on more abstract issues have been shaped and developed this way. Sometimes the office is not the best environment for this sort of reflective work. A blog can be a space for clarity. Again — no need for an audience.

3. Solidifying your learning: One of the best ways to learn something new is by teaching it to someone else. I may have had an uncertain grasp of multiple linear regression, for example, when I launched CoolData, but the exercise of trying to explain data mining concepts and techniques was a great way to get it all straight in my head. If I were to go back today and re-read some of my early posts on the subject, which I rarely do, I would find things I probably would disagree with. But the likelihood of being wrong is not a good enough reason to avoid putting your thoughts out there. Being naive and wrong about things is a stage of learning.

Let’s say that, motivated by these or other reasons, you’ve published a few posts. Suddenly you’ve got something to share with the world. Data analysis lends itself perfectly to discussion via blogs. Not only analysts and data miners, but programmers, prospect researchers, business analysts, and just about anyone engaged in knowledge work can benefit personally while enriching their profession by sharing their thoughts with their peers online.

As you slowly begin to pick up readers, new reasons for blogging will emerge. Three more reasons for blogging are:

4. Making professional connections: As a direct result of writing the blog I have met all kinds of interesting people in the university advancement, non-profit, and data analysis worlds. Many I’ve met only virtually, others I’ve been fortunate to meet in person. It wasn’t very long after I started blogging that people would approach me at conferences to say they had seen one of my posts. Some of them learned a bit from me, or more likely I learned from them. A few have even found time to contribute a guest post.

5. Sharing knowledge: This is the obvious one, so no need to say much more. Many advancement professionals share online already, via various listservs and discussion forums. The fact this sharing goes on all the time makes me wonder why more people don’t try to make their contributions go even farther by taking the extra step of developing them into blog posts that can be referred to anytime.

6. Building toward larger projects: If you keep at it, slowly but surely you will build up a considerable body of work. Blogging can feed into conference presentations, discussion papers, published articles, even a book.

Let me return to the distinction I made earlier between intrinsic and extrinsic motivators — the internal, more personal rewards of blogging versus the external, often monetary, goals some people have. As it happens, the personal reasons for blogging are realistic, with a high probability of success, while the loftier goals are likely to lead to premature disillusionment. A new blog with no audience is a fragile thing; best not burden it with goals you cannot hope to realize in the first few years.

I consider CoolData a success, but not by any external measure. I simply don’t know how many followers a blog about data analysis for higher education advancement ought to have, and I don’t worry about it. I don’t have goals for number of visitors or subscribers, or even number of books sold. (Get your copy of “Score!” here. … OK — couldn’t resist.)

The blog does what I want it to do.

That’s mostly what I have to say, really. I have a few bits of advice, but my strongest advice is to ignore what everybody else thinks you should do, including me. Most expert opinion on posting frequency, optimum length for posts, ideal days and times for publishing, click-bait headlines, search engine optimization and the like is a lot of hot air.

If you’re still with me, here are a few cautions and pieces of advice, take it or leave it:

1. On covering your butt: Some employers take a dim view of their employees publishing blogs and discussing work-related issues on social media. You might want to clear your activity with your supervisor first. When I changed jobs, I disclosed that I intended to keep up my blog. I explained that connecting with counterparts at other universities was a big part of my professional development. There’s never been an issue. Be clear that you’re writing for a small readership of professionals who share your interests, an activity not unlike giving a conference presentation. Any enlightened organization should embrace someone who takes the initiative. (You could blog secretly and anonymously, but what’s the point?)

2. On “permission”: Beyond ensuring that you are not jeopardizing your day job, you do not require anyone’s permission. You don’t have to be an expert; you simply have to be interested in your subject and enthusiastic about sharing your new knowledge with others. Beginners have an advantage over experts when it comes to blogging; an expert will often struggle to relate to beginners, and assume too much about what they know or don’t know. So what if that post from two years ago embarrasses you now? You can always just delete it. If you’re reticent about speaking up, remember that blogging is not about claiming to be an authority on anything. It’s about exploring and sharing. It’s about promoting helpful ideas and approaches. You can’t prevent small minds from interpreting your activity as self-promotion, so just keep writing. In the long run, it’s the people who never take the risk of putting themselves out there who pay the higher price.

3. On writing: The interwebs ooze with advice for writers so I won’t add to the noise. I’ll just say that, although writing well can help, you don’t need to be an exceptional stylist. I read a lot of informative yet sub-par prose every day. The misspellings, mangled English, and infelicities that would be show-stoppers if I were reading a novel just aren’t that important when I’m reading for information that will help me do my job.

4. On email: In the early days of email I thought it rude not to respond. Today things are different: It’s just too easy to bombard people. Don’t get me wrong: I have received many interesting questions from readers (some of which have led to new posts, which I love), as well as great opportunities to connect, participate in projects, and so on. But just because you make yourself available for interaction doesn’t mean you need to answer every email. You can lay out the ground rules on an “About” page. If someone can’t be bothered to consider your guidelines for contact, then an exchange with that person is not going to be worth the trouble. On my “About this Blog” page I make it clear that I don’t review books or software, yet the emails offering me free stuff for review keep coming. I have no problem deleting those emails unanswered. … Then there are emails that I fully intend to respond to, but don’t get the chance. Before long they are buried in my inbox and forgotten. I do regret that a little, but I don’t beat myself up over it. (However — I do hereby apologize.)

5. On protecting your time: Regardless of how large or small your audience, eventually people will ask you to do things. Sometimes this can lead to interesting partnerships that advance the interests of both parties, but choose wisely and say no often. Be especially wary of quid pro quo arrangements that involve free stuff. I rarely read newspaper travel writing because I know so much of it is bought and paid for by tour companies, hotels, restaurants and so on, without disclosure. However, I’m less concerned about high-minded integrity than I am about taking on extra burdens. I’m a busy guy, and also a lazy guy who jealously guards his free time, so I’m careful about being obliged to anyone, either contractually or morally. Make sure your agenda is set exclusively by whatever has your full enthusiasm. You want your blogging to be a free activity, where no one but you calls the shots.

6. On the peanut gallery: Keeping up a positive conversation with people who are receptive to your message is productive. Trying to convince skeptics and critics who are never going to agree with you is not. When you’re pushing back, you’re not pushing forward. Keep writing for yourself and the people who want to hear what you’ve got to say, and ignore the rest. This has nothing to do with being nice or avoiding conflict. I don’t care if you’re nice. It’s about applying your energies in a direction where they are likely to produce results. Focus on being positive and enabling others with solutions and knowledge, not on indulging in opinions, fruitless debates, and pointless persiflage among the trolls in the comments section. I haven’t always followed my own advice, but I try.

Some say “know your audience.” Actually, it would be better to know yourself. Readers respond to your personality and they can only get to know you if you are consistent. You can only be consistent if you are genuine. There are 7.125 billion people in the world and almost half of them have an internet connection (and access to Google Translate). Some of those will become your readers — be true to them by being true to yourself. There is no need to waste your time chasing the crowd.

Your overarching goals are not to convince or convert or market, but to 1) fuel your own growth, and 2) connect with like-minded people. Growth and connection: That’s more than enough payoff for me.

6 October 2014

Don’t worry, just do it

People trying to learn how to do predictive modelling on the job often need only one thing to get them to the next stage: Some reassurance that what they are doing is valid.

Peter Wylie and I are each just back home, having presented at the fall conference of the Illinois chapter of the Association of Professional Researchers for Advancement (APRA-IL), hosted at Loyola University Chicago. Following an entertaining and fascinating look at the current and future state of predictive analytics presented by Josh Birkholz of Bentz Whaley Flessner, Peter and I gave a live demo of working with real data in Data Desk, with the assistance of Rush University Medical Center. We also drew names to give away a few copies of our book, Score! Data-Driven Success for Your Advancement Team.

We were impressed by the variety and quality of questions from attendees, in particular those having to do with stumbling blocks and barriers to progress. It was nice to be able to reassure people that when it comes to predictive modelling, some things aren’t worth worrying about.

Messy data, for example. Some databases, particularly those maintained by non-higher-ed nonprofits, have data integrity issues such as duplicate records. It would be a shame, we said, if data analysis were pushed to the back burner just because of a lack of purity in the data. Yes, work on improving data integrity — but don’t assume that you cannot derive valuable insights right now from your messy data.

And then the practice of predictive modelling itself … Oh, there is so much advice out there on the net, some of it highly technical and involving a hundred different advanced techniques. Anyone trying to learn on their own can get stymied, endlessly questioning whether what they’re doing is okay.

For them, our advice was this: In our field, you create value by ranking constituents according to their likelihood to engage in a behaviour of interest (giving, usually), which guides the spending of scarce resources where they will do the most good. You can accomplish this without the use of complex algorithms or arcane math. In fact, simpler models are often better models.

The workhorse tool for this task is multiple linear regression. A very good stand-in for regression is building a simple score using the techniques outlined in Peter’s book, Data Mining for Fundraisers. Sticking to the basics will work very well. Fussing with technical issues or striving for a high degree of accuracy are distractions that the beginner need not be overly concerned with.
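In that spirit, here is a deliberately crude additive score — the traits and point values are invented for illustration, not taken from Peter’s book — used to rank prospects so that scarce effort flows to the top of the list:

```python
# Invented traits; a real simple score uses whatever your data supports.
def simple_score(record):
    """One point per trait historically associated with the behaviour of interest."""
    points = 0
    points += 1 if record.get("has_email") else 0
    points += 1 if record.get("attended_event") else 0
    points += 1 if record.get("has_business_phone") else 0
    points += 1 if record.get("years_of_giving", 0) >= 3 else 0
    return points

prospects = {
    "P1": {"has_email": True, "attended_event": True, "years_of_giving": 5},
    "P2": {"has_email": True},
    "P3": {"has_business_phone": True, "attended_event": True},
}
# Rank descending; outreach starts at the top of the list.
ranked = sorted(prospects, key=lambda p: simple_score(prospects[p]), reverse=True)
```

No arcane math, yet the ranking already beats throwing darts — which is the bar that matters.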

If your shop’s current practice is to pick prospects or other targets by throwing darts, then even the crudest model will be an improvement. In many situations, simply performing better than random will be enough to create value. The bottom line: Just do it. Worry about perfection some other day.

If the decisions are high-stakes, if the model will be relied on to guide the deployment of scarce resources, then insert another step in the process. Go ahead and build the model, but don’t use it. Allow enough time of “business as usual” to elapse. Then, gather fresh examples of people who converted to donors, agreed to a bequest, or made a large gift — whatever the behaviour is you’ve tried to predict — and check their scores:

  • If the chart shows these new stars clustered toward the high end of scores, wonderful. You can go ahead and start using the model.
  • If the result is mixed and sort of random-looking, then examine where it failed. Reexamine each predictor you used in the model. Is the historical data in the predictor correlated with the new behaviour? If it isn’t, then the correlation you observed while building the model may have been spurious and led you astray, and should be excluded. As well, think hard about whether the outcome variable in your model is properly defined: That is, are you targeting for the right behaviour? If you are trying to find good prospects for Planned Giving, for example, your outcome variable should focus on that, and not lifetime giving.
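That after-the-fact check can be sketched in a few lines. All names and numbers below are invented: bucket everyone by score decile, then see whether the fresh converts concentrate at the top.

```python
def conversion_rate_by_decile(scores, converted):
    """Conversion rate per score decile; decile 0 is lowest scores, 9 is highest."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    rates = []
    for d in range(10):
        bucket = ranked[d * n // 10:(d + 1) * n // 10]
        rates.append(sum(1 for c in bucket if c in converted) / len(bucket))
    return rates

# Toy data: 100 constituents scored 0-99; most new donors sit near the top.
scores = {i: i for i in range(100)}
converted = {95, 96, 97, 98, 99, 10}
rates = conversion_rate_by_decile(scores, converted)
# A working model shows rates climbing toward decile 9.
```

If the top deciles don’t stand out, that is the signal to reexamine the predictors and the outcome definition as described above.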

“Don’t worry, just do it” sounds like motivational advice, but it’s more than that. The fact is, there is only so much model validation you can do at the time you create the model. Sure, you can hold out a generous number of cases as a validation sample to test your scores with. But experience will show you that your scores will always pass the validation test just fine — and yet the model may still be worthless.

A holdout sample of data that is contemporaneous with that used to train the model is not the same as real results in the future. A better way to go might be to just use all your data to train the model (no holdout sample), which will result in a better model anyway, especially if you’re trying to predict something relatively uncommon like Planned Giving potential. Then, sit tight and observe how it does in production, or how it would have done in production if it had been deployed.

  1. Observe, learn, tweak, and repeat. Errors are hard to avoid, but they can be discovered.
  2. Trust the process, but verify the results. What you’re doing is probably fine. If it isn’t, you’ll get a chance to find out.
  3. Don’t sweat the small stuff. Make a difference now by sticking to basics and thinking of the big picture. You can continue to delve and explore technical refinements and new methods, if that’s where your interest and aptitude take you. Data analysis and predictive modelling are huge subjects — start where you are, where you can make a difference.

* A heartfelt thank you to APRA-IL and all who made our visit such a pleasure, especially Sabine Schuller (The Rotary Foundation), Katie Ingrao and Viviana Ramirez (Rush University Medical Center), Leigh Peterson Visaya (Loyola University Chicago), Beth Witherspoon (Elmhurst College), and Rodney P. Young, Jr. (DePaul University), who took the photos. (See also: APRA IL Fall Conference Datapalooza.)


