CoolData blog

3 January 2016

CoolData (the book) beta testers needed


UPDATE (Jan 5): 16 people have responded to my call for volunteers, so I am going to close this off now. I have been in touch with each person who has emailed me, and I will be making a final selection within a few days. Thank you to everyone who considered taking a crack at it.


Interested in being a guinea pig for my new handbook on predictive modelling? I’m looking for someone (two or three people, max) to read and work through the draft of “CoolData” (the book), to help me make it better.


What’s it about? This long subtitle says it all: “A how-to guide for predictive modelling for higher education advancement and nonprofits using multiple linear regression in Data Desk.”


The ideal beta tester is someone who:


  • has read or heard about predictive modelling and understands what it’s for, but has never done it and is keen to learn (statistical concepts are introduced only when and if they are needed, so no prior stats knowledge is required; I’m looking for beginners, but beginners who aren’t afraid of a challenge);
  • tends to learn independently, particularly by using books and manuals to work through examples, either in addition to training or completely on his or her own;
  • does not have an IT background but has some IT support at his or her organization, and would not be afraid to learn a little SQL in order to query a database him- or herself; and
  • has a copy of Data Desk, or intends to purchase one (it is available for PC or Mac).


It’s not terribly important that you work in the higher ed or nonprofit world — any type of data will do — but the book is strictly about multiple linear regression and the stats software Data Desk. The methods outlined in the book can be extended to any software package (multiple linear regression is the same everywhere), but because the prescribed steps refer specifically to Data Desk, I need someone to actually go through the motions in that specific package.


Think of a cookbook full of recipes, and how each must be tested in real kitchens before the book can go to press. Are all the needed ingredients listed? Has the method been clearly described? Are there steps that don’t make sense? I want to know where a reader is likely to get lost so that I can fix those sections. In other words, this is about more than just zapping typos.


I might be asking a lot. You or your organization will be expected to invest some money (for the software, sales of which I do not benefit from, by the way) and your time (in working through some 200 pages).


As a return on your investment, however, you should expect to learn how to build a predictive model. You will receive a printed copy of the current draft (electronic versions are not available yet), along with a sample data file to work through the exercises. You will also receive a free copy of the final published version, with an acknowledgement of your work.


One unusual aspect of the book is that a large chunk of it is devoted to learning how to extract data from a database (using SQL), and then how to clean and prepare that data for analysis. This is in recognition of the fact that data preparation accounts for the majority of time spent on any analysis project. It is not mandatory that you learn to write queries in SQL yourself, but simply knowing which aspects of data preparation can be dealt with at the database query level can speed your work considerably. I’ve tried to keep the sections about data extraction as non-technical as possible, and I’ve augmented them with clear examples.
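To give a rough idea of what handling data preparation "at the database query level" can mean, here is a minimal sketch. It is not taken from the book. The table and column names (`gifts`, `donor_id`, `amount`) and the function name are invented for illustration, and it uses Python's built-in `sqlite3` module as a stand-in for whatever database your organization actually runs.

```python
import sqlite3


def summarize_gifts(rows):
    """Load hypothetical (donor_id, amount) rows into an in-memory
    database and return one summary row per donor."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gifts (donor_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO gifts VALUES (?, ?)", rows)

    # The query itself does the preparation: it filters out non-positive
    # amounts and collapses many gift rows into one row per donor.
    query = """
        SELECT donor_id,
               COUNT(*)    AS gift_count,
               SUM(amount) AS total_giving
        FROM gifts
        WHERE amount > 0
        GROUP BY donor_id
        ORDER BY donor_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Made-up sample data: donor 1 gave twice, donor 3 has only an adjustment.
sample = [(1, 50.0), (1, 25.0), (2, 100.0), (3, -10.0)]
print(summarize_gifts(sample))
```

The point of the sketch is only that the filtering and the roll-up to one row per person happen before the data ever reaches the stats software, which is usually faster than doing the same clean-up by hand afterwards.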


For a sense of the flavour of the book, I suggest you read these excerpts carefully: “Exploring associations between variables” and “Testing associations between two categorical variables.”


Contact me and tell me why you’re interested in taking part.




