15 December 2009

Why you should use deciles and percentiles for scores

The predictive modeling method I use (multiple regression) results in a “raw score” that is great for very fine ranking, because it will probably produce almost as many score levels as there are individuals in your sample. But it doesn’t work at all for other purposes.

For example, you can’t use the raw score to track how a person’s or a group’s propensity to give changes from year to year. Your model changes over time, and so does its output. What does it mean if Joe’s raw score goes from 6349 to 9032? Not much. The value of the score itself has no practical meaning.

Because the values are not easy to explain to end-users, and because they change so much from year to year, you need to provide a more intuitive scoring system.

If you rank everyone in the sample by raw score and divide them into groups of roughly equal size, you produce a much more useful ranking.

Equal quarters are quartiles, equal fifths are quintiles, and so on. For our needs, equal tenths (deciles) and equal hundredths (percentiles) are the most useful.
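
To make the idea concrete, here is a minimal Python sketch of splitting a ranked sample into equal groups. It is only an illustration, not the Data Desk procedure: the function name assign_ntiles and the ten raw scores are made up, and I am assuming that group 1 holds the lowest scores and the highest-numbered group holds the best prospects.

```python
def assign_ntiles(raw_scores, n_groups):
    """Rank people by raw score and split them into n_groups bins of roughly equal size.

    Returns one group number per person, in the same order as raw_scores.
    Group 1 holds the lowest scores and group n_groups the highest (an assumption).
    """
    n = len(raw_scores)
    # Positions of the people, ordered from lowest raw score to highest.
    order = sorted(range(n), key=lambda i: raw_scores[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        # rank runs from 0 to n-1; integer arithmetic maps it onto 1..n_groups.
        groups[idx] = rank * n_groups // n + 1
    return groups


# Made-up raw scores for ten people, purely for illustration.
raw_scores = [6.349, 0.512, 3.207, 9.032, 1.118, 4.450, 7.781, 2.064, 5.596, 8.243]

quintiles = assign_ntiles(raw_scores, 5)   # equal fifths
deciles = assign_ntiles(raw_scores, 10)    # equal tenths
print(quintiles)
print(deciles)
```

With a real sample of thousands, calling the same function with 100 groups would give percentiles. Note that this sketch splits purely by rank, so people with identical raw scores could land on either side of a group boundary; a production version would need a rule for ties.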

For example, if Joe goes from the 60th percentile last year to the 93rd percentile this year, that’s a meaningful change.

But usually we’re not as interested in the score of a single individual as we are in getting a handle on a whole segment of a population. If your annual giving coordinator knows that the 9th and 10th deciles are always where the money is, regardless of how your model changes in any given year, you’ve taken a big step towards clarity. If your results aren’t clear, no one will embrace them.

Which type of predictive score you would use, deciles or percentiles, depends on how selective you need to be:

  • An Annual Giving manager trying to prioritize groups for the Telethon campaign might focus on the top one, two or three deciles. That represents thousands of alumni whose raw propensity-to-give scores place them in the top 10% to 30% of the population.
  • A Planned Giving Officer trying to zero in on the best prospects might focus on no more than the top 1% to 5% of the population. For that person, percentiles will provide a much sharper knife.
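
The difference in selectivity between the two cases above can be sketched in a few lines of Python. Everything here is hypothetical: the IDs, the scores, and the helper names are invented for illustration, and I am assuming deciles are numbered 1 to 10 and percentiles 1 to 100, with the highest number marking the best prospects.

```python
# Hypothetical scored records: (ID, decile, percentile). All values are made up.
scored = [
    ("A001", 10, 99),
    ("A002", 10, 93),
    ("A003", 9, 88),
    ("A004", 8, 71),
    ("A005", 6, 60),
    ("A006", 3, 27),
]


def top_by_decile(records, lowest_decile):
    """Return the IDs of everyone at or above the given decile."""
    return [rid for rid, decile, _ in records if decile >= lowest_decile]


def top_by_percentile(records, lowest_percentile):
    """Return the IDs of everyone at or above the given percentile."""
    return [rid for rid, _, pct in records if pct >= lowest_percentile]


# Broad cut for a phone campaign: the top three deciles (8, 9 and 10).
telethon_pool = top_by_decile(scored, 8)

# Narrow cut for Planned Giving: the top 5%, i.e. percentiles 96 to 100.
planned_giving_pool = top_by_percentile(scored, 96)

print(telethon_pool)
print(planned_giving_pool)
```

The decile filter can only move in steps of 10% of the population, while the percentile filter can be tightened one percent at a time, which is why the narrower job calls for percentiles.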

When I produce a set of scores, I usually provide all three types (raw, decile, and percentile), because one can’t always anticipate needs. The screen in Banner that holds the scores (APAEXRS) is able to accommodate three ‘flavours’ of scores, so I upload all three for each model.

Unfortunately, the output scores of a regression model are messy! They have to be worked on a bit in order to whip them into shape before you upload them to the database. Here’s what they often look like in their ‘unprocessed’ state:


The Banner field I upload scores to is able to contain four digits. So for ‘raw score’, I create a derived variable in Data Desk that multiplies these values by a thousand and rounds to the nearest whole number.
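
For readers who don’t use Data Desk, the same derived variable can be sketched in Python. The function name is made up, and the range check is my own assumption about how to guard a four-digit field; nothing here describes what Banner itself does with out-of-range values.

```python
def to_four_digit_score(model_output):
    """Multiply a regression output by a thousand and round to a whole number.

    Raises ValueError if the result would not fit in a four-digit field
    (that check is an assumption added for this sketch).
    """
    # Python's round() sends exact halves to the nearest even integer,
    # which may differ slightly from other tools' rounding at the boundary.
    scaled = round(model_output * 1000)
    if not 0 <= scaled <= 9999:
        raise ValueError(f"{scaled} does not fit in a four-digit field")
    return scaled


# Made-up model outputs, purely for illustration.
print(to_four_digit_score(6.349))
print(to_four_digit_score(9.032))
```

Values that fail the check would need to be handled before upload; how to do that depends on the model and is not covered by this sketch.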

Deciles and percentiles are not exactly available at the push of a button, either. In future posts I will describe the methods I’ve been taught to produce a nice, clean set of scores for upload.


