CoolData blog

29 May 2014

Nate Silver on age-guessing from first names

Filed under: CoolData. Posted by kevinmacdonell @ 3:22 pm

Friend and colleague Greg Pemberton (@GregPemberton) pointed me to this interesting post on the FiveThirtyEight blog: How to Tell Someone’s Age When All You Know Is Her Name. Wow, I thought … that rings a bell! I wrote a blog post on exactly that topic: How to infer age, when all you have is a name. That was nearly four years ago, and I’ve written a couple more posts on the subject since then.

I’m not suggesting that there’s any borrowing going on. The idea is hardly rocket science and has undoubtedly occurred to many people independently long before I got my noggin around it. So why am I posting this?

Ah.

I am a fan of Nate Silver and his blog. I devoured his book, “The Signal and the Noise,” shortly after it came out. And last year I dragged my butt out of bed in the early morning after an awesome conference reception with multiple open bars to hear him deliver a keynote address. So I was very interested to read his post, co-authored with Allison McCann.

And yes, I may also have been interested in posting a comment in response, with links to CoolData. I am a blogger, after all. So I carefully prepared my comment, and hit ‘Go’. What happened then? A Facebook fail!


Really, Nate? I need a Facebook account to post a comment? I shut down my Facebook account years ago, for all sorts of reasons, and I don’t plan to go back. (Maybe I shouldn’t criticize. People can’t leave comments on CoolData at all. But Facebook??)

My comment needs a home. Why not right here? Thank you for reading.

I use these age/name/sex patterns to infer likely age in our university database work. We already know the name, gender and age for most people in our database, so we can calculate mean and median ages for every combination of first name and sex, and apply those to any new records that lack age data (such as prospective donors). This is helpful, as ‘age’ is strongly correlated with the likelihood of making a donation and with the size of the gift. Gender can be an important factor: a number of first names have “flipped gender”, so they belong either to a relatively old man or to a relatively young woman. Examples I know of include Ainslie, Isadore, Sydney, Shelly, and Brooke.
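For readers who want to try the name-and-sex approach, here is a minimal Python sketch, assuming known records can be exported as (first name, sex, age) tuples. The function names, the "M"/"F" sex codes, and the `min_n` threshold for skipping rare name/sex combinations are hypothetical choices for illustration, not part of any particular database system, and the sample ages are made up.

```python
from collections import defaultdict
from statistics import mean, median


def build_age_lookup(records):
    """Group known ages by (first name, sex) and summarize each group.

    records: iterable of (first_name, sex, age) tuples; age is None if unknown.
    Returns {(name, sex): {"mean": ..., "median": ..., "n": ...}}.
    """
    ages = defaultdict(list)
    for first_name, sex, age in records:
        if age is not None:
            # Normalize case and spacing so "SYDNEY" and "Sydney" share a group
            ages[(first_name.strip().lower(), sex)].append(age)
    return {
        key: {"mean": mean(vals), "median": median(vals), "n": len(vals)}
        for key, vals in ages.items()
    }


def impute_age(first_name, sex, lookup, min_n=5, stat="median"):
    """Return an inferred age for a record lacking one.

    Returns None when the name/sex combination is unseen or has fewer than
    min_n known ages (min_n is a hypothetical reliability cutoff).
    """
    summary = lookup.get((first_name.strip().lower(), sex))
    if summary is None or summary["n"] < min_n:
        return None
    return summary[stat]


# Toy data (invented) echoing the "flipped gender" pattern for Sydney
known = [
    ("Sydney", "M", 71), ("Sydney", "M", 68),
    ("Sydney", "F", 24), ("Sydney", "F", 29),
    ("Brooke", "F", 31),
]
lookup = build_age_lookup(known)
print(impute_age("Sydney", "M", lookup, min_n=2))  # 69.5
print(impute_age("sydney", "F", lookup, min_n=2))  # 26.5
print(impute_age("Brooke", "F", lookup, min_n=2))  # None (only one known age)
```

The median is the default here because it is less affected by a handful of unusually old or young people with a given name; passing `stat="mean"` returns the other figure instead.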

I have written about this a few times:

How to infer age, when all you have is a name

New twists on inferring age from first name

Putting an age-guessing trick to the test

 

POSTSCRIPT

On re-reading the FiveThirtyEight post, I was struck by this passage, which I didn’t notice earlier: “There are quite a lot of websites devoted to tracking the popularity of American baby names over time. … But we haven’t seen anyone ask the age of living Americans with a given name.”

Oh really. … Let me Google that for you.

 

POST-POSTSCRIPT

When I first posted the “Let me Google that for you” link, CoolData was at the top of the results. It has since been crowded out by FiveThirtyEight and others. Such are the benefits of a large web presence and the resources to optimize search results.

 

POST-POST-POSTSCRIPT

One thing was puzzling me … In my stats, I have seen a lot of people clicking on the link to FiveThirtyEight from my blog, but I also noticed that almost 200 people (to date) have come to CoolData from FiveThirtyEight. I couldn’t figure out how — there was no link to CoolData that I could find. Well, I’ve found it. CoolData is referenced in the first footnote below the FiveThirtyEight post on age-guessing from names. One has to click the plus sign in the circle (before the comments) to see the footnotes. So — thanks, Nate!


23 December 2013

New from CASE Books: Score!

Filed under: Book, CoolData, Peter Wylie. Posted by kevinmacdonell @ 9:39 am

As the year draws to a close, I’m pleased to announce that the book I’ve co-written with Peter Wylie will be available in January. ‘Score!’ joins a host of fine publications in CASE’s new catalog. I’m looking forward to having a look through this catalog for new books for the office. (‘Score’ is featured on page 12.)

So what is this new book about? The full title is Score!: Data-Driven Success for Your Advancement Team, and as a recent issue of BriefCASE notes: “Kevin MacDonell and Peter Wylie walk readers through compelling arguments for why an organization should adopt data-driven decision-making as well as explanations of basic issues such as identifying and mining the pertinent data and what operations to perform once that data is in hand.”

You can read the rest of that article here: Ready to Score!?

27 December 2012

Holiday indulgence

Filed under: Book, CoolData, Off on a tangent. Posted by kevinmacdonell @ 4:35 pm

I’ve always tried to stay on-topic with CoolData content: If you subscribe, you know what you’re getting, and if you lose interest and unsubscribe, you know what you’re missing. But I’m on holiday, so I’m inclined to let content rules slip a bit. My wife and I are spending time with family on Cape Breton Island and in the Annapolis Valley in Nova Scotia. I’m less vigilant than usual about what I eat (more turkey, more sweets, more wine) and what I do (nothing, essentially). It is in this state of desuetude that I write this last blog post of the year.

Allow me to indulge by writing not about predictive analytics, but about CoolData itself, which has just turned three years old. That’s middle age for a blog, I figure. First I’ll go through some numbers, and then I’ll tell you about some things coming in the new year.

CoolData by the numbers

As of yesterday, CoolData has had 177,915 page views since it was launched. The number of visitors continues to grow gradually; 6,000 page views a month is the current average. These are page views, not unique visitors: WordPress has been informing me about unique visits only since early December. So far, each unique visitor averages 1.4 page views.

Visits have come from almost every country in the world, but of course most are from the United States. It is not unusual for my own country, Canada, to be edged out of second place by the UK, India or Australia on any given day. The top 20 or so countries since February 2012 are included in the WordPress-created graphic below.

[Graphic: visits to CoolData by country, top 20 or so countries since February 2012]

These visitor numbers are not small, but I’m not pretending they’re impressive, either. My subject is rather niche. As well, many visitors aren’t really looking for CoolData. Half of my traffic comes from people stumbling in from Google and other search engines, and they’re looking for simple (or simplistic) explanations of statistical concepts. The most popular post by far is How high, R squared? — published in April 2010, it is still heavily visited every day by confused and desperate grad students from all corners of the globe. I don’t consider these people part of the CoolData “tribe”, if I can call it that.

The tribe — the readers I care most about — are typically the ones who have subscribed to receive updates. (There are also a lot of RSS subscribers — I don’t have as good a handle on those numbers.*) As of today, there are 680 subscribers — 48 subscribers via WordPress accounts, and 632 via email. This number has been growing very gradually over the past three years. I realize many people sign up for things they never return to (I do it all the time), but when an update goes out, I estimate that about half of my subscribers click through to the new post, which I find encouraging. They are far more likely to click through than my followers on Twitter (@kevinmacdonell).

Most readers visit during the work week (readership drops off dramatically on weekends), so not surprisingly most subscribers use their real work address rather than a free Gmail, Hotmail, or Yahoo account. From my own research, I know providing a work email is associated with higher levels of engagement, and “.edu” addresses alone (US-affiliated higher ed institutions) account for 293 subscribers. Another 101 addresses have the less restrictive top-level domain of “.org”. Among country-specific top-level domains, the top ones are Canada (.ca) with 46 and the United Kingdom (.uk) with 29. There are 142 “.com” addresses, and roughly half of them are Gmail, Yahoo or Hotmail. There are 443 unique domains in all, the top ones being uw.edu (University of Washington) and ubc.ca (University of British Columbia).
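Tallies like these are straightforward to reproduce from a subscriber export. A minimal Python sketch follows, assuming a plain list of email addresses. The `FREE_WEBMAIL` set and the `summarize_domains` function are hypothetical; the set holds only the three providers named above, and a real list would need more entries (regional Yahoo or Hotmail domains, for example).

```python
from collections import Counter

# Hypothetical set: only the free providers named in the post
FREE_WEBMAIL = {"gmail.com", "hotmail.com", "yahoo.com"}


def summarize_domains(emails):
    """Count addresses by full domain and by top-level domain,
    and count how many use one of the free webmail providers above."""
    domains = Counter()
    tlds = Counter()
    free = 0
    for address in emails:
        # Everything after the last "@", lowercased
        domain = address.rsplit("@", 1)[-1].strip().lower()
        domains[domain] += 1
        # Only the final label is treated as the top-level domain
        tlds[domain.rsplit(".", 1)[-1]] += 1
        if domain in FREE_WEBMAIL:
            free += 1
    return domains, tlds, free


sample = ["a@uw.edu", "b@ubc.ca", "c@gmail.com", "d@example.org", "e@uw.edu"]
domains, tlds, free = summarize_domains(sample)
print(domains.most_common(1))  # [('uw.edu', 2)]
print(tlds["edu"], free)       # 2 1
```

Because only the last label counts as the top-level domain, an address such as name@dept.ox.ac.uk is tallied under "uk", which matches the country-level grouping used above.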

Start writing!

Up to now I’ve been coy about answering questions about my stats, for no real reason. I figure I might as well come clean. I have long felt that there is room for more writing on this topic, so if knowing more about my readership encourages you to start your own blog, make 2013 your year to step up. All it takes is a few minutes to sign up on WordPress or a similar free service, and start writing.

If you’re not up for creating your own blog, then consider writing a guest post for CoolData. Up to this point, guest posting has been by invitation only, but starting today I am open to receiving post ideas from anyone interested in writing on the topic of predictive analytics for nonprofit fundraising or higher education advancement (including alumni engagement). I plan to limit submitted guest posts to one per month. Multiple submissions are welcome, but submissions that are completely off-topic will not get a response. Email me at kevin.macdonell@gmail.com to suggest/discuss your idea before you start writing.

No more comments

As I begin a new year, naturally I think of changes I’d like to make. For one, I will be taking a new approach to comments on posts. Only 514 comments have been contributed since December 2009, and 140 of those are mine. This is not a disappointment (I had no designs one way or the other), but the time has come to recognize the fact that CoolData has never been effective as a discussion forum. There have been a few good questions and observations made by commenters, but unfortunately too many comments are of the “drive-by” variety: brief one-off criticisms that require rebuttal but never advance the discussion or add enlightenment for beginning predictive modelers. The best questions, the most honest comments, and the most well-reasoned objections tend to come to me via private email.

For that reason, I am shutting off the ability to respond with public comments. There have been no nasty personal attacks, nor abusive language, nor anything I’ve felt forced to delete (aside from spam). I simply feel that, after three years of writing and editing this blog, I no longer need to provide a platform for people whose main interest is something other than being part of a shared endeavour to learn, to grow, and to bring our institutions and organizations into the age of data. Responses, questions, and critiques are always welcome via private email, and I may choose to gather the best responses for use in follow-up blog posts. Keep in mind, too, that the best forums for discussion are still the listservs (Prospect-dmm is the best example), and new conversations crop up every week in the many groups of interest you can find on social networking sites such as LinkedIn.

SCORE!

On a more positive note, 2013 will be the year that a new book, Score!, which I have co-written with Peter Wylie, will be published. I’ve said very little about it to date, in part because I won’t actually believe it until it’s in my hands. It’s a project with a long gestation … writing a book has nearly nothing in common with knocking off a blog post. However, I’m confident we’ll see it out sometime during the first half of the year.

That’s all for 2012. Best of luck in your data-related work in 2013!

{}{}{}{}{}{}{}{}

* A regular reader who subscribes via RSS reminded me that I have given short shrift to the RSS crowd — I just don’t know how many subscribe via RSS. It is quite possible, then, that I am overestimating the number of email subscribers who click through to the post.
