I rarely post on a Friday, let alone a Friday in the middle of summer, but today’s cool idea is somewhat half-baked. Its very flakiness suits the day and the weather. Actually, I think it has potential, but I’m interested to know what others think.
For those of us in higher-ed fundraising, ‘age’ or ‘class year’ is a key predictor variable. Not everyone has this information in their databases, however. What if you could sort of impute a “best guess” age, based on a piece of data that you do have: First name?
Names go in and out of fashion. You may have played around with this cool tool for visualizing baby-name trends. My own first name, Kevin, peaked in popularity in the 1970s and has been on a downward slide ever since (chart here). I was born in 1969, so that’s pretty close. My father’s name, Leo, has not been popular since the 1920s (he was born in 1930), but is having a slight comeback in recent years (chart here).
As for female names, my mother’s name, Yvonne, never ranked in the top 1,000 in any time period covered by this visualization tool, so I’ll use my niece’s name: Katelyn. She was born in 2005. This chart shows that two common spellings of her name peaked around that year. (The axis labeling is a bit wonky — you’ll have to hover your cursor over the display to get a good read on the timing of the peak.)
You can’t look up every first name one by one, obviously, so you’ll need a data set from another source that relates relative frequencies of names with age data. That sort of thing might be available in census data. But knowing somebody with access to a higher-ed database might be the easiest way.
I’ve performed a query on our database, pulling on just three fields: ID (to ensure I have unique records), First Name, and Age — for more than 87,000 alumni. (Other databases will have only Class Year — we’re fortunate in that we’ve got birth dates for nearly every living alum.) Here are a few sample rows, with ID number blanked out:
From here, it’s a pretty simple matter to copy the data into stats software (or Excel) to compute counts and median ages for each first name. Amazingly, just six first names account for 10% of all living, contactable alumni! (In order: John, David, Michael, Robert, James, and Jennifer.)
On the other hand, a lot of first names are unique in the database, or nearly so. To simplify things a bit, I calculated median ages only for names represented five or more times in the database. These 1,380 first names capture the vast majority of alumni.
The ten “oldest” names in the database are listed in the chart below, in descending order by median age. Have a look at these venerable handles. Of these, only Max has staged a rebound in recent years (according to the Baby Names visualizer).
And here are the ten “youngest names,” in ascending order by median age. It’s an interesting coincidence that the very youngest name is Katelyn — my five-year-old niece. One or two (such as Jake) were popular many years ago, and at least one has flipped gender from male to female (Whitney). Most of the others are new on the scene.
The real test is, do these median ages actually provide reasonable estimates of age for people who aren’t in the database?
I’m not in the database (as an alum). There are 371 Kevins in the database, and their median age is 43. I turned 41 in May, so that’s very good.
My father is also not an alum. The 26 Leos in the database have a median age of 50, which is unfortunately 30 years too young. Let’s call that one a ‘miss’.
My mother’s ‘predicted’ age is off by half that — 15 years — that’s not too bad.
Here’s how my three siblings’ names fare: Angela (predicted 36, actual 39 — very good), Paul (predicted 48, actual 38 — fair), and Francis (predicted 60, actual 36 — poor). Clearly there’s an issue with Francis, which according to the Baby Names chart tool was popular many decades ago but not when my brother was named. In other words, results for individuals may vary!
So let’s say you’re a non-profit without access to age data for your database constituents. How does this help you? Well it doesn’t — not directly. You will need to find a data partner at a university who will prepare a file for you, just as I’ve done above. When you import the data into your model, you can match up records by first name and voila, you’ve got a variable that gives you a rough estimate of age. (Sometimes very rough — but it’s better than nothing.)
This is only an idea of mine. I don’t know if anyone has actually done this, so I’d be interested to hear from others. Here are a few additional thoughts:
- There shouldn’t be any privacy concern — all you want is a list of first names and their median ages, NOT IDs or last names — but be sure to get all necessary approvals.
- To anticipate your question, no, I won’t provide you my own file. I think you’d be much better off getting a names file from a university in your own city or region, which will provide a more accurate reflection of the ethnic flavour of your constituency.
- I used “First name”, but of course universities collect first, middle and last names, and the formal first name might not be the preferred moniker. If the university database has a “Preferred first name” field that is fully populated, that might be a better option for matching with your first-name field.
- Again, there might be more accessible sources of name- or age-related data out there. This idea just sounded fun to try!