{"id":5519,"date":"2010-12-22T16:22:40","date_gmt":"2010-12-22T08:22:40","guid":{"rendered":"http:\/\/www.intmath.com\/blog\/?p=5519"},"modified":"2014-11-16T20:35:54","modified_gmt":"2014-11-16T12:35:54","slug":"500-billion-words-visual-stats-give-us-cultural-insights","status":"publish","type":"post","link":"https:\/\/www.intmath.com\/blog\/computers\/500-billion-words-visual-stats-give-us-cultural-insights-5519","title":{"rendered":"500 billion words: visual stats give us cultural insights"},"content":{"rendered":"<p>Google has scanned over 15 million books, and they've released a subset containing 500 billion words from 5.2 million books. The best thing is they have made the whole database freely available.<\/p>\n<p>You can search the <a href=\"https:\/\/books.google.com\/ngrams\/\">Google Books Ngram Viewer<\/a> to see how words have changed in popularity through the years.<\/p>\n<h2>Cultural intelligence<\/h2>\n<div class=\"imgRt\" style=\"width:175px\"><img loading=\"lazy\" src=\"\/blog\/wp-content\/images\/2010\/12\/men-women-comparison.png\" alt=\"men-women-comparison\" title=\"men-women-comparison\" width=\"173\" height=\"184\"  \/><\/div>\n<p>The graph shows how the word \"men\" dominated books up until the feminist 1970s, when the word \"women\" started to take over. By 1985, you can see \"women\" were on top. 
[Image <a href=\"http:\/\/www.nytimes.com\/2010\/12\/17\/books\/17words.html\">source<\/a>]<\/p>\n<p>However, when you <a href=\"https:\/\/books.google.com\/ngrams\/graph?content=man,woman&year_start=1800&year_end=2000&corpus=0&smoothing=3\">compare \"man\" and \"woman\"<\/a>, the fairer sex still has a long way to go before dominating mentions in books.<\/p>\n<h2>A tale of 3 empires<\/h2>\n<p>Here is a <a href=\"https:\/\/books.google.com\/ngrams\/graph?content=America,England,China&year_start=1900&year_end=2000&corpus=0&smoothing=3\">comparison of the words \"America\", \"England\" and \"China\"<\/a> as used in English-language books throughout the 20th century. It shows how England's influence waned, America's ascended (especially during the war years) and how China's has remained fairly constant since the 1940s.<\/p>\n<p><img loading=\"lazy\" src=\"\/blog\/wp-content\/images\/2010\/12\/america-england-china-sm2.png\" alt=\"America England China comparison\" title=\"America England China comparison\" width=\"490\" height=\"300\"  \/><\/p>\n<p>Notice that searches on the Ngram Viewer are <b>case sensitive<\/b>. I originally (mistakenly) compared <a href=\"https:\/\/books.google.com\/ngrams\/graph?content=china,england,america&year_start=1800&year_end=2000&corpus=0&smoothing=3\">\"china\", \"america\" and \"england\"<\/a> (all lower-case) and of course, \"china\" was at the top by a significant margin, since this refers to the pottery.<\/p>\n<h2>Some caveats<\/h2>\n<p>While these searches are really interesting, we need to ask:<\/p>\n<ul>\n<li>Which books are included, and which are not included yet?<\/li>\n<li>Which books are not included due to copyright issues?<\/li>\n<li>The data doesn't appear that useful for the naughties (2000 to 2010). 
Many terms seem to decrease in importance, or increase inexplicably.<\/li>\n<\/ul>\n<h2>Others to try<\/h2>\n<ul>\n<li><a href=\"https:\/\/books.google.com\/ngrams\/graph?content=environment&year_start=1900&year_end=2000&corpus=0&smoothing=3\">environment<\/a><\/li>\n<li><a href=\"https:\/\/books.google.com\/ngrams\/graph?content=global+warming&year_start=1900&year_end=2000&corpus=0&smoothing=3\">global warming<\/a><\/li>\n<li><a href=\"https:\/\/books.google.com\/ngrams\/graph?content=man,woman&year_start=1900&year_end=2000&corpus=0&smoothing=3\">man,woman<\/a> (\"woman\" has still got a way to go to catch up, but note how dramatically \"man\" has dropped)<\/li>\n<li><a href=\"https:\/\/books.google.com\/ngrams\/graph?content=television,tv,movies,books,internet,Web&year_start=1900&year_end=2005&corpus=0&smoothing=3\">television,tv,movies,books,internet,Web<\/a><\/li>\n<\/ul>\n<p class=\"alt\"><a href=\"#respond\" id=\"comms\">Be the first to comment<\/a> below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><a href=\"https:\/\/www.intmath.com\/blog\/computers\/500-billion-words-visual-stats-give-us-cultural-insights-5519\"><img loading=\"lazy\" src=\"\/blog\/wp-content\/images\/2010\/12\/men-women-comparison_th.png\" alt=\"men-women-comparison\" title=\"men-women-comparison\" width=\"128\" height=\"100\" class=\"imgRt\" \/><\/a>Here's some interesting \"real-life\" visual statistics - especially good for those who love text more than 
math.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mo_disable_npp":""},"categories":[1],"tags":[125,131],"_links":{"self":[{"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/posts\/5519"}],"collection":[{"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/comments?post=5519"}],"version-history":[{"count":0,"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/posts\/5519\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/media?parent=5519"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/categories?post=5519"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.intmath.com\/blog\/wp-json\/wp\/v2\/tags?post=5519"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}