CoolData blog

17 December 2010

Chart frequency of words and terms from books with Google

Filed under: Coolness, Free stuff, Fun. Posted by kevinmacdonell @ 11:46 am

Here’s another interesting bauble from the nerds at Google. The Books Ngram Viewer allows you to plot the frequency of words and phrases that appeared in books published in the past few hundred years. Google estimates they’ve scanned and OCR’d more than 10 percent of all the books ever published, and this plotter is based on a sample of that data.

This “most excellent time-wasting tool” was blogged about by Alexis Madrigal, a senior editor for TheAtlantic.com, in his post, The Decline of Man (as a Word), in which he shows how the word “man” has fared against “woman”. (Not well.) As Madrigal observes, this may not serve a legitimate research purpose, but it sure is fun.

Here’s a sample. I searched for the term “database” and set the date range to 1950 through 2008. The y-axis shows, for each year, what percentage of all the terms in Google’s sample of English-language books were “database”. As you can see, the word didn’t emerge in published sources before the early 1970s.
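If you’re curious what that y-axis number actually is, it’s a relative frequency: the count of the term in a given year’s books divided by the total number of terms in that year’s sample, times 100. Below is a minimal Python sketch of that calculation. It assumes you already have per-year counts in hand. The function name `relative_frequency` and the counts are made-up toy numbers for illustration, not real Google Books data, and Google’s own normalization may differ in its details.

```python
def relative_frequency(term_counts, total_counts):
    """Return {year: percentage of that year's terms that are the search term}.

    term_counts:  {year: occurrences of the search term}  (hypothetical input)
    total_counts: {year: total terms in that year's sample} (hypothetical input)
    Years with a zero total are skipped to avoid dividing by zero.
    """
    return {
        year: 100.0 * term_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


# Toy counts for illustration only (NOT real Google Books figures).
term_counts = {1970: 5, 1980: 120, 1990: 450}
total_counts = {1970: 1_000_000, 1980: 1_200_000, 1990: 1_500_000}

freqs = relative_frequency(term_counts, total_counts)
for year in sorted(freqs):
    print(year, f"{freqs[year]:.6f}%")
```

Plotting one term against another, like “database” vs. “data base”, just means running the same calculation for each term over the same years and drawing both lines on one chart.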

The tool also allows you to plot the progress of one term against another. If you plot “database” against “data base”, you’ll see that the two-word term enjoyed a short life before the single word took over. I’ve been interested in the use of the word “gift” instead of “donation,” but the plot of those two words isn’t very informative due, I guess, to the many connotations of the word “gift.” Instead I plotted “charitable gift” and “charitable donation” to put the words in context, and came up with this chart. The concept of giving seems to have had quite a heyday up until around 1835, and “donation” was firmly in the lead. By 1880, though, it was all about the gift.

That got me thinking about how well “philanthropy” has done through the years. Mentions before 1750 are rare, so I plotted from then to the present, and once again the first half of the 19th century seems to have been relatively more preoccupied with the idea than later periods. (Although, of course, who knows what data this is really based on. As I said, it’s fun, but I wouldn’t want to base a thesis on it without knowing more about the underlying data.)

Hmm, this IS fun. What if we plot poverty vs. religion vs. education? This doesn’t tell us what people were giving to, but it does give a glimpse into what they were writing about. “Poverty” has stayed relatively constant since 1750, but look at how “religion” has declined as “education” has risen. The two lines cross right at 1909. Also interesting is that this trend started reversing direction about 10 years ago, around 2000.

And finally, this chart plots “data mining” and two variations of “fundraising”. Data mining takes off as a published term in the early 1990s, and the term “fund raising” has merged into the single word, “fundraising.”

All sorts of fun. Try some for yourself! I’d be interested in hearing about any cool combos you come up with that relate to analytics and/or fundraising.


1 Comment

  1. [...] that appeared in books published in the past few hundred years. (I wrote about it in the post Chart frequency of words and terms from books with Google, 17 Dec 2010.) You can click on the image to go directly to the chart in Google and play with the [...]

    Pingback by “Data” isn’t plural (anymore) « CoolData blog — 20 April 2011 @ 11:42 am

