500 billion words: visual stats give us cultural insights

By Murray Bourne, 22 Dec 2010

Google has scanned over 15 million books, and they've released a subset containing 500 billion words from 5.2 million books. The best thing is that they have made the whole database freely available.

You can search the Google Books Ngram Viewer to see how words have changed in popularity through the years.
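The Ngram Viewer plots each word as a share of all the words in that year's books, not as a raw count. That is why lines from different decades can be compared even though far more books were published later. Here is a minimal Python sketch of that normalisation. It assumes you already have per-year counts for a word and the corpus's total word count for each year. The function name `relative_frequency` is hypothetical, and the numbers are made up for illustration, not taken from Google's data.

```python
from collections import defaultdict


def relative_frequency(rows, totals):
    """Return {year: share of all words} for one word.

    rows:   iterable of (year, match_count) pairs for the word
    totals: {year: total number of words in the corpus that year}
    """
    counts = defaultdict(int)
    for year, match_count in rows:
        counts[year] += match_count  # sum in case a year appears more than once
    # Divide by that year's total so years with more books aren't favoured;
    # years with a missing or zero total are skipped.
    return {
        year: counts[year] / totals[year]
        for year in sorted(counts)
        if totals.get(year)
    }


# Made-up numbers, for illustration only.
totals = {1970: 1_000_000, 1985: 2_000_000}
men = [(1970, 500), (1985, 600)]
women = [(1970, 300), (1985, 800)]

print(relative_frequency(men, totals))
print(relative_frequency(women, totals))
```

In this toy data the raw count for "men" rises from 1970 to 1985, but its share falls, because the corpus doubled in size.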

Cultural intelligence

[Graph: "men" vs "women" in books over time]

The graph shows how the word "men" dominated books up until the feminist 1970s, when "women" started to take over. By 1985, "women" was on top. [Image source]

However, when you compare "man" and "woman", the fairer sex still has a long way to go before dominating mentions in books.

A tale of 3 empires

Here is a comparison of the words "America", "England" and "China" as used in English-language books throughout the 20th century. It shows how England's influence waned, America's rose (especially during the war years), and China's has remained fairly constant since the 1940s.

[Graph: "America", "England" and "China" in English-language books]

Note that searches on the Ngram Viewer are case-sensitive. I originally (mistakenly) compared "china", "america" and "england" (all lower-case), and of course "china" came out on top by a significant margin, since the lower-case word refers to pottery.
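To see why case matters, here is a small Python sketch that counts words with and without case folding. The sample sentence is invented, and `count_words` is a hypothetical helper, not part of any Google tool. With case folding, the country and the porcelain end up in the same bucket.

```python
import re
from collections import Counter


def count_words(text, case_sensitive=True):
    """Count alphabetic words in text, optionally folding case."""
    words = re.findall(r"[A-Za-z]+", text)
    if not case_sensitive:
        words = [w.lower() for w in words]  # "China" and "china" merge
    return Counter(words)


# An invented sentence mixing the country and the porcelain.
sample = "China exported fine china. The china cabinet held china from China."

exact = count_words(sample)
folded = count_words(sample, case_sensitive=False)
print(exact["China"], exact["china"])  # 2 3
print(folded["china"])                 # 5
```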

Some caveats

While these searches are really interesting, we need to ask:

  • Which books are included, and which are not included yet?
  • Which books are not included due to copyright issues?
  • The data doesn't appear very useful for the noughties (2000 to 2010). Many terms seem to drop in importance, or rise inexplicably.

Others to try
