# Google search, math and latent semantic analysis

By Murray Bourne, 13 Jul 2008

Google has become the dominant search engine because of its relevance and efficiency. Relevance is achieved through its proprietary PageRank algorithm, which determines which pages are most likely to satisfy your search query. Efficiency is achieved by using thousands of PCs, rather than big servers, to hold all the indexing, document and media information.

I wrote about this a while back in Math that made Google rich.

Now let's move on to an aspect of matrices that search engines use, called **latent semantic analysis**.

Here's what Wikipedia has to say on the subject:

> Latent semantic analysis (LSA) is a technique in natural language processing, in particular in vectorial semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA can use a term-document matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms (typically stemmed words that appear in the documents) and whose columns correspond to documents.
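To make the term-document matrix concrete, here is a minimal sketch in Python. The three "documents" and all variable names are made up for illustration; each row of the matrix is a term, each column a document, and each entry counts how often that term appears in that document:

```python
from collections import Counter

# A toy corpus: each "document" is just a short string (hypothetical examples).
docs = [
    "gold silver truck",
    "shipment of gold damaged in a fire",
    "delivery of silver arrived in a silver truck",
]

# Vocabulary = the set of all terms (rows); one column per document.
terms = sorted({word for doc in docs for word in doc.split()})
counts = [Counter(doc.split()) for doc in docs]
matrix = [[c[t] for c in counts] for t in terms]

# Print each term alongside its row of per-document counts.
for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

In practice the vocabulary is large and most entries are zero, which is why the article calls it a sparse matrix; real systems also stem words and weight the counts (e.g. tf-idf) before any analysis.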

Let's put this in everyday language. Simply put, latent semantic indexing is something the search engines do when they analyze the content of a Web site in order to figure out what the site is about.

It's actually what we humans do every day of our lives — try to figure out the meaning in what we see, hear and feel.

That Wikipedia article delves into the matrix operations that are involved in latent semantic analysis.
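The central operation those articles describe is a truncated singular value decomposition (SVD) of the term-document matrix: keeping only the largest few singular values projects terms and documents into a small "concept" space. A minimal NumPy sketch, using a made-up 5-term by 3-document count matrix:

```python
import numpy as np

# Hypothetical term-document matrix: rows = terms, columns = documents.
A = np.array([
    [1, 1, 0],
    [1, 0, 2],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
], dtype=float)

# SVD factors A into U * diag(s) * Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values -> a rank-k "concept" approximation.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each document becomes a k-dimensional vector in the reduced space,
# and documents can be compared there with cosine similarity.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim01 = cos(doc_vecs[0], doc_vecs[1])
print(f"similarity of doc 0 and doc 1: {sim01:.3f}")
```

This is only a sketch of the idea, not the exact pipeline any search engine uses; production systems work with far larger sparse matrices and weighted counts.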

(If you are a bit rusty, see an Introduction to Matrices.)

See the 1 Comment below.

21 Sep 2014 at 5:26 am [Comment permalink]

Many thanks to you.

I am a researcher interested in measuring text similarity, currently using LSA, and I hope to benefit from your experience.