Google search, math and latent semantic analysis

By Murray Bourne, 13 Jul 2008

Google has become the dominant search engine because of its relevance and efficiency. Relevance is achieved through its proprietary PageRank algorithm, which determines which pages are the most likely to satisfy your search query. Efficiency is achieved by using thousands of PCs rather than big servers to hold all the indexing, document and media information.

I wrote about this a while back in Math that made Google rich.

Now let's move on to an aspect of matrices that search engines use, called latent semantic analysis.

Here's what Wikipedia has to say on the subject:

Latent semantic analysis (LSA) is a technique in natural language processing, in particular in vectorial semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA can use a term-document matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms (typically stemmed words that appear in the documents) and whose columns correspond to documents.

Let's put this in everyday language. Latent semantic analysis (also known as latent semantic indexing) is something the search engines do when they analyze the content of a Web site in order to figure out what the site is about.

It's actually what we humans do every day of our lives — try to figure out the meaning in what we see, hear and feel.

That Wikipedia article delves into the matrix operations that are involved in latent semantic analysis.

(If you are a bit rusty, see an Introduction to Matrices.)
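To make the term-document matrix from the Wikipedia passage a little more concrete, here is a minimal sketch in Python. The three documents and the function name `term_document_matrix` are made up for illustration, and the sketch makes no claim about how any search engine actually does this. It only builds the matrix of word counts; full latent semantic analysis would then apply a matrix factorization (singular value decomposition) to it, which is not shown here.

```python
from collections import Counter

# Three made-up documents, purely for illustration.
documents = [
    "google search uses math",
    "matrices are math",
    "search engines use matrices",
]


def term_document_matrix(docs):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    # Count the words in each document (lowercased, split on whitespace).
    counts = [Counter(doc.lower().split()) for doc in docs]
    # One row per distinct term, in alphabetical order.
    terms = sorted(set().union(*counts))
    # One column per document; a missing term counts as 0.
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


terms, matrix = term_document_matrix(documents)
for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

Most of the entries are zero, which is why the matrix is described as sparse. Notice also that "use" and "uses" get separate rows, because this sketch does no stemming. Stemming would merge them into a single term.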


One Comment on “Google search, math and latent semantic analysis”

  1. khudhair says:

    Many thanks for yourselves
    I am a researcher and interested in measuring the similarity of the text and type currently LSA and I hope to take advantage of your experiences
