CoolData blog

10 October 2012

Logistic vs. multiple regression: Our response to comments

Guest post by John Sammis and Peter B. Wylie

Thanks to all of you who read and commented on our recent paper comparing logistic regression with multiple regression. We were not sure how popular this topic would be, but Kevin told us that interest was high, and there were a number of comments and questions. There were several general themes in the comments; Kevin has done an excellent job responding, but we thought we should throw in our two cents.

Why not just use logistic?

The point of our paper was not to suggest that logistic regression should not be used — our point was that multiple regression can achieve prediction results quite similar to logistic regression. Based on our experience working with and training fundraising professionals getting introduced to analytics, logistic regression can be intimidating. Our goal is always to get these folks to use analytics to help with their fundraising initiatives. We find many of them catch on with multiple regression, and much less so with logistic regression.

Predicted values vs. probabilities

We understand that the predicted values generated by multiple regression are different from the probabilities generated by logistic regression. Regardless of the statistic modeling technique we use, we always bin the raw prediction or probability values into equal-sized score levels. We have found that score level bins are easier to use than raw values. And using equal-sized score levels allows for easier evaluation of the scoring model.

“I cannot agree”

Some commenters, knowledgeable about statistics, said they would not use multiple regression when the inputs called for logistic. According to the rules, if the target variable is binary, then linear modelling doesn’t make sense — and the rules must be obeyed. In our view, this rigid approach to method selection is inappropriate for predictive modelling. The use of multiple linear regression in place of logistic regression may not always make theoretical sense, but predictive modellers are concerned with whether or not a model produces an output that is useful in practical terms. The worth of a model is testable against new, real-world data, therefore a model has only one criterion for determining “appropriate” use: Whether it really predicts what the modeler claims it will predict. The truth is revealed during evaluation.

A modest proposal

No one reading this should simply take our word that these two dissimilar methods yield similar results. Neither should anyone dismiss it out of hand without providing a critique based on real data. We would encourage anyone to try doing something on your own with data using both techniques and show us what you find. In particular, graduate students looking for a thesis or dissertation topic might consider producing something under this title: “Comparing Logistic Regression and Multiple Regression as Techniques for Predicting Major Giving.”

Heck! Peter says that if anyone were interested in doing a study like this for a thesis or dissertation, he would be willing to offer advice on how to:

  1. Do a thorough literature review
  2. Formulate specific research questions
  3. Come up with a study design
  4. Prepare a proposal that would satisfy a thesis or dissertation committee.

That’s quite an offer. How about it?

2 Comments »

  1. I actually use a whole lot of methods on my way to building a model. It’s only when I’m trying to figure out ask amounts is it important for me to use linear regression. When the dependent variable is binary and some of the independent variables are categorical, I use logistic. Once I get into more than 2 dependent variable categories, I’m heading over to decision trees. When do you use cluster analysis?

    Comment by Marianne Pelletier — 10 October 2012 @ 9:38 am

  2. Hi Kevin,

    Would you be willing to offer any suggestions on courses to take to learn about predictive modeling in fundraising (or generally) either locally (here in Halifax) or elsewhere? My background is in data reporting, but all basically summarizing past data. I have no background in statistics, so perhaps I’d even need a whole course in basic statistics.

    Any insight is appreciated!

    Thank you!

    Best regards,
    Matt

    Comment by Matt — 15 October 2012 @ 8:01 am


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: