In January, the White House released a review of the incident in which a would-be bomber nearly destroyed a passenger jet in flight on Christmas Day. Why did anti-terrorism officials fail to identify and counter this threat? According to the report, part of the problem was in the databases, and in the data-mining software: “Information technology within the CT [counterterrorism] community did not sufficiently enable the correlation of data that would have enabled analysts to highlight the relevant threat information.”
I’ve just finished reading Stephen Baker’s book, The Numerati, published in 2008. In a chapter called simply “Terrorist”, he observes that it’s nearly impossible to build a predictive model of “rare or unprecedented events,” citing the few cataclysmic examples that we all know about. “This is because math-based predictions rely on patterns of past behaviour,” he writes.
Known and suspected terrorists are presumably the needle in a huge haystack that includes you, me, and everyone else in the world. Terrorists are practically invisible in such a sea of identities, they work hard at avoiding detection, and they trigger events that may never have happened before.
Not to trivialize the subject, but while reading this it struck me that some of the models we build in the more prosaic world of fundraising are in the related business of modeling for rare events. I’m thinking primarily of Major Gifts and Planned Giving. As tricky as this sort of prediction is, we can be thankful for three things: The events we are trying to predict are rare but not unprecedented, the data set has precise limits, and the stakes are not nearly as high.
Here is a basic tip for improving the power of a Planned Giving model. My first attempt at a PG model included the full data set of alumni, from the oldest alum right up to the Class of 2009. We had a limited number of people in the database identified as existing PG commitments, and they were swimming in that ocean of data. I took a number of steps to improve the model, but the most obvious was to exclude all the younger alumni. They would not normally be considered PG prospects, and eliminating them boosted the ratio of PG commitments to the general population.
Look at your existing commitments, identify who the youngest is (by class year, probably), and exclude all the alumni who are younger than that. (Use a selector variable in Data Desk right in your regression table, if that’s what you’re doing.) If your youngest is an outlier, then pick the second-youngest as your cutoff – but don’t eliminate the outlier individual, because you need all the historical data you can get!
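If you work outside Data Desk, the same selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`class_year`, `pg_commitment`), the function names, and the nine toy records are all invented for the example and do not come from a real database.

```python
def choose_cutoff_year(commitment_years, skip_outlier=False):
    """Return the class year to use as the cutoff.

    The "youngest" commitment is the one with the latest class year.
    With skip_outlier=True, the second-youngest commitment is used instead.
    """
    if not commitment_years:
        raise ValueError("need at least one existing commitment")
    years = sorted(commitment_years, reverse=True)  # latest class year first
    if skip_outlier and len(years) > 1:
        return years[1]
    return years[0]


def select_modeling_set(alumni, skip_outlier=False):
    """Drop alumni younger than the cutoff, but keep every existing commitment."""
    committed_years = [a["class_year"] for a in alumni if a["pg_commitment"]]
    cutoff = choose_cutoff_year(committed_years, skip_outlier)
    return [
        a for a in alumni
        # Keeping all commitments means an outlier is never thrown away.
        if a["class_year"] <= cutoff or a["pg_commitment"]
    ]


def commitment_rate(rows):
    """Share of rows that are existing PG commitments."""
    return sum(1 for r in rows if r["pg_commitment"]) / len(rows)


# Hypothetical toy data: id 7 plays the part of an unusually young commitment.
alumni = [
    {"id": 1, "class_year": 1955, "pg_commitment": True},
    {"id": 2, "class_year": 1960, "pg_commitment": False},
    {"id": 3, "class_year": 1968, "pg_commitment": True},
    {"id": 4, "class_year": 1975, "pg_commitment": False},
    {"id": 5, "class_year": 1982, "pg_commitment": True},
    {"id": 6, "class_year": 1990, "pg_commitment": False},
    {"id": 7, "class_year": 2001, "pg_commitment": True},
    {"id": 8, "class_year": 2005, "pg_commitment": False},
    {"id": 9, "class_year": 2009, "pg_commitment": False},
]

trimmed = select_modeling_set(alumni, skip_outlier=True)
print("rate in full data set:", commitment_rate(alumni))
print("rate after exclusion: ", commitment_rate(trimmed))
```

The filter keeps anyone who is either at or before the cutoff year or already a commitment, so using the second-youngest as the cutoff never costs you the outlier's historical data.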