Alexa web statistics - say what?
By Murray Bourne, 25 Jan 2007
As part of the monitoring that I do on the intmath.com site, I get the Alexa ranking included in the monthly uptime report. The monitoring report explains:
Alexa Ranking: The Alexa information is the collective data of millions of Alexa toolbar users which is used to derive comparative website traffic statistics for ranking, number of visitors and page views. Use this data to measure your websites ranking performance over time.
Rank: a combined measure of page views and visitors (reach).
Reach: Number of visitors.
Page Views: Number of pages viewed during a visit.
Okay, not too bad, but it raises a few questions. How is the "Rank" calculated?
Now, let's see what it says.
Alexa stats for: www.intmath.com
Time Range: 3 month
Rank: 283,846
Rank Change: -10,951
Reach: 2.5
Reach Change: +22%
Page Views: 0.27
Page Views Change: -10%
Rank (283,846) - we still don't know where this comes from. Is this number good or bad? "Rank highly" generally means "good", but in this case, what does the high number mean?
Rank Change (-10,951) - it is not clear whether this is an improvement in readership or a drop.
Reach (2.5) - what? Only 2.5 visitors? In 3 months? Nope, several thousand would be closer to the mark.
Reach Change (+22%) - okay, I guess. Does it mean compared to the previous 3-month block of data?
Page Views (0.27) - what? Each visitor only looked at just over a quarter of a page? What did they do - hit the back button as the page was loading? Actually, the average number of pageviews is around 3.5 per visitor (according to Google Analytics).
Page Views Change (-10%) - fine, I guess.
Seems to me they have mixed up "Reach" and "Page Views".
More Information Required
So I trotted over to the Alexa site to see what they had to say about their statistics. They have:
Alexa computes the reach [the number of users] and number of page views [number of pages viewed by Alexa Toolbar users] for all sites on the Web on a daily basis. The main Alexa traffic rank is based on the geometric mean of these two quantities averaged over time (so that the rank of a site reflects both the number of users who visit that site as well as the number of pages on the site viewed by those users).
The geometric mean is not the same as the average that we all commonly use (take all the data points, add them up, divide by the number of data points). The geometric mean of 2 values is the square root of the product of the 2 values.
So if I have, say, 200 visitors and they look at 3 pages each, the geometric mean is √(200 × 3) = 24.49. So the more visitors I have and the more pages they read, the Alexa Rank should go up, right? But no, it turns out that the lower the Rank number, the more popular the site. (The statement does say it is "based on". It seems that the higher the geometric mean, indicating more readers and more pages, the lower the rank. Fair enough.)
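The arithmetic above can be checked in a few lines of Python. This is only a sketch of the two-value geometric mean as described here, not Alexa's actual ranking calculation (which they do not spell out); the function name geometric_mean_of_two is made up for illustration.

```python
import math


def geometric_mean_of_two(a, b):
    """Square root of the product of two non-negative values."""
    return math.sqrt(a * b)


visitors = 200          # example figure from the text
pages_per_visitor = 3   # example figure from the text

gm = geometric_mean_of_two(visitors, pages_per_visitor)
am = (visitors + pages_per_visitor) / 2  # the everyday "average", for contrast

print(round(gm, 2))  # 24.49
print(am)            # 101.5
```

The ordinary average is dominated by the larger value, while the geometric mean responds to proportional changes in either one.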
Google and Yahoo Alexa Ranks
I used Alexa's Traffic Analysis tool to compare the super-popular Yahoo and Google sites.
Turns out that Yahoo has Rank 1 and Google has (mostly) Rank 3. Fine.
The Daily Reach for both of them is around 280,000 "per million", which means out of each million randomly selected web users, 280,000 of them will be Yahoo and another 280,000 will be Google. That seems to have "used up" over half of all Web users, with just 2 sites. I don't think so, or perhaps I am missing something.
The Daily Page Views are given as around 25,000 for Google and around 60,000 for Yahoo. These are "per million" once again. This doesn't compute. If there are 280,000 people "per million" viewing Google and only 25,000 page views "per million", how is this possible? Elsewhere on the same page, they have "Page views per user" for Google as 6.8. That's fine and that makes sense.
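Here is a rough check of why those two figures look inconsistent to me. It assumes that both "per million" numbers share the same denominator (a million web users), which may well not be what Alexa intends; the variable names are mine.

```python
# Figures for Google as quoted above.
reach_per_million = 280_000
page_views_per_user = 6.8
reported_page_views_per_million = 25_000

# Assumption: "per million" means "per million web users" in both cases.
implied_page_views = reach_per_million * page_views_per_user
print(round(implied_page_views))  # 1904000

# Pages per Google visitor implied by the reported figure on its own.
implied_pages_per_visitor = reported_page_views_per_million / reach_per_million
print(round(implied_pages_per_visitor, 3))  # 0.089
```

Under that reading, 25,000 page views would mean each Google visitor sees less than a tenth of a page, which contradicts the 6.8 pages per user quoted on the same page. So the page-view figure is presumably measured against a different total, but that is a guess on my part, not something the Alexa page states.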
Curious... Any insights from anyone would be welcome.
Footnote 1: Who cares? Such web analytics are crucial for e-commerce sites. They are keen to get lots of users and they want those users to view lots of pages. Sites that get very good rankings for visitors and viewership become attractive as takeover targets. That's how YouTube got to be worth $1.6 billion. Current Rank: 5.
Footnote 2: The Importance of Displaying Data Clearly. I am very comfortable with statistics and graph reading, yet I find the above stats quite confusing. Whenever stats like this are displayed, there should be an indication (maybe colour-coded) of whether "high is best" or "a decrease is good", etc.
Footnote 3: Comparison with intmath.com. The Alexa tool shows very similar rankings and page views for Interactive Mathematics as for this blog. But the number of users is around 10 times greater for the math site. All very strange...
See the 3 Comments below.
29 Jan 2007 at 4:43 pm
[quote]The Daily Reach for both of them is around 280,000 “per million”, which means out of each million randomly selected web users, 280,000 of them will be Yahoo and another 280,000 will be Google. That seems to have “used up” over half of all Web users, with just 2 sites. I don’t think so, or perhaps I am missing something.[/quote]
I believe that they mean 280k/million but not 280k that don’t visit another site. Thus, it could be the same 280k.
That said, and revisiting your log/semi-log curves: the distribution of market share [various measures would work] across various segments looks like your semi-log or log-log [depends] examples.
29 Jan 2007 at 12:48 pm
Obviously, "rank" is where you come in the Web's pecking order. Rank #1 is Yahoo (they get the most visitors to the most pages) and Rank #283,846 means you've got a ways to go before catching up.
As for the geometric mean, I think they really are talking about the time series of values, not the product of the number of visitors times the number of pages, as you have put.
According to Buzzards Bay:
So it sounds like Alexa is finding the geometric mean of several days' worth of stats, which can vary wildly.
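A quick Python sketch of that idea, using made-up daily visitor counts (not Alexa data) with one unusually busy day. It only illustrates how a geometric mean over a time series damps a spike compared with the ordinary average; whether Alexa does exactly this is an assumption. It needs Python 3.8 or later for statistics.geometric_mean.

```python
from statistics import geometric_mean

# Hypothetical daily visitor counts with one spike (not Alexa data).
daily_visitors = [100, 120, 90, 1000, 110]

am = sum(daily_visitors) / len(daily_visitors)  # ordinary average
gm = geometric_mean(daily_visitors)             # fifth root of the product

print(am)         # 284.0
print(round(gm))  # 164
```

The single spike pulls the ordinary average up to 284, while the geometric mean stays much closer to a typical day.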
I agree with you that there are some confusing definitions in there.
30 Jan 2007 at 3:23 am
Thanks for your input, Peter and Moti.
This proves something that I have been thinking about lately. Just the act of writing down the problem helps to solve it, or at least helps to make it clearer.
I find this when I have a programming problem. It has happened a few times that I have started to pose my question in some online forum, and in the process of expressing the problem clearly, it has helped me to see the solution.
I attended a talk once that was on "Problem posing in mathematics". The teacher got the students to pose their own problems (not answer the teacher's problems). The act of thinking through and posing an intelligent problem that "worked" was a valuable learning experience in itself.