Recently, Net Applications updated their methodology to weight their browser share by the Internet population of the countries they monitor. This had the effect of knocking Apple's Safari down quite a bit, presumably due to Apple's weak numbers outside of the U.S., and of lifting Opera up some, presumably due to Opera's stronger showing in Europe and Asia.
As of August 1st, we have implemented retroactive country-level weighting in our reports. This means that we adjust our reports proportionally based on how much traffic we record from a country vs. how many internet users that country has. For example, although we have significant data from China, it is relatively small compared to the number of internet users in China. Therefore, we now weight Chinese traffic proportionally higher in our global reports. This change produces a much more accurate view of worldwide usage share statistics.
That sounds like a change for the better, but it also seems to have had some unfortunate side effects. Net Applications obviously does not have equally good sampling in all of the countries it monitors. Where that sample is weak and likely to produce unrepresentative results, the effect can be either magnified or diminished depending on the Internet population of that country or region.
Before the new methodology, presumably, they just reported raw data -- their weighting was tied directly to the strength (size) of their sample. So if they had a relatively small (not likely to be very representative) sample in a particular country, that less good sample had less of an impact on their overall numbers.
Now, though, crazier and unrepresentative numbers in large Internet population countries have a quite dramatic effect on the overall global share reported by Net Applications.
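To make the arithmetic concrete, here is a minimal sketch in Python comparing a raw (hit-weighted) global share with a population-weighted one. Every number in it is made up for illustration, and the function names `raw_share` and `weighted_share` are hypothetical; this is not Net Applications' actual data or method, only one plausible reading of the weighting they describe.

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
# "hits" are page views recorded per browser in each country;
# "users" is that country's Internet population in millions.
samples = {
    "US": {
        "hits": {"IE": 6500, "Firefox": 2500, "Netscape": 5, "Other": 995},
        "users": 220,
    },
    "China": {
        "hits": {"IE": 80, "Firefox": 5, "Netscape": 15, "Other": 0},
        "users": 340,
    },
}


def raw_share(samples, browser):
    """Global share when every recorded hit counts equally."""
    total_hits = sum(sum(c["hits"].values()) for c in samples.values())
    browser_hits = sum(c["hits"].get(browser, 0) for c in samples.values())
    return browser_hits / total_hits


def weighted_share(samples, browser):
    """Global share when each country's share counts in proportion
    to its Internet population rather than its sample size."""
    total_users = sum(c["users"] for c in samples.values())
    share = 0.0
    for country in samples.values():
        country_hits = sum(country["hits"].values())
        country_share = country["hits"].get(browser, 0) / country_hits
        share += country_share * country["users"] / total_users
    return share


print(f"raw:      {raw_share(samples, 'Netscape'):.2%}")
print(f"weighted: {weighted_share(samples, 'Netscape'):.2%}")
```

With these invented figures, 15 odd hits in a 100-hit Chinese sample barely register in the raw number (about 0.2%) but push the population-weighted number to about 9%, because the tiny sample now stands in for the largest Internet population in the table.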
Here's a good example:
Several weeks ago, I saw an odd spike in the Netscape usage share. Netscape hasn't shipped a new browser in ages and their global share has been pretty steady at around half a percentage point for as long as I can remember. Then, for the week of 07/19, their share shot up to over 1%. A doubling of share seemed a bit odd for a browser that's been out of circulation as long as Netscape but when you're dealing with half a percent or less, it's not unreasonable to imagine that it wasn't growth of Netscape so much as a slow week of usage for all the other browsers. I could sort of picture a situation where modern browser users as a cohort all did a bit less browsing for a few days while ancient browser users were unaffected. I didn't think too much about it but it did catch my attention.
Well, this week's numbers just came out and Netscape is showing a global share of almost 4% !!!
Here's what the trends look like for the last couple of months of global share.

My first thought was "this can't be right" so I looked at the U.S. share (subscribers only) and it looked stable and steady with Netscape well under 0.05% for years. So I turned next to share by continent where Asia showed a big Netscape spike. I drilled down a bit further and looked at just China browser share.
Here's what the trends look like for the last couple of months of China share.

So what's going on here? Well, it could be one of a couple of things. First, it could be some kind of spider that identifies itself as "Netscape 6.0" that's crawling the Chinese Web for search engine indexing or something like that. That's something Net Applications could dig into and if it is a spider, just add it to their list of not-counted hits. All competent stats packages can exclude that kind of traffic from their metrics.
Second, and potentially more problematic, Net Applications' sample in China, those Websites that have deployed the Net Applications site analytics package, could be just so few that it only takes a trivial number of site visitors switching browsers to have a very large impact on their measurements.
Either way, this kind of error now has a much larger impact when it happens in a country like China which happens to have the largest Internet using population of any country in the world.
I don't think there's really any good solution to dealing with small sample sizes and any commercial analytics package is going to suffer from that problem. Perhaps a second weighting based on sample size would help in reflecting more accurately the actual data, but that doesn't help us understand Internet populations any better.
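One possible reading of a "second weighting based on sample size" is to pull each country's observed share toward the pooled share, trusting the country's own number more as its sample grows. The sketch below is an assumption on my part, not something Net Applications does: the function `shrunk_share` and the constant `k` (how many hits a sample needs before it is trusted about half-way) are both hypothetical.

```python
def shrunk_share(country_hits, browser_hits, pooled_share, k=1000):
    """Blend a country's observed browser share with the pooled share.

    country_hits: total hits recorded in the country
    browser_hits: hits for the browser of interest in that country
    pooled_share: the browser's share across all recorded hits
    k: hypothetical constant; a sample of k hits gets a weight of 0.5
    """
    if country_hits == 0:
        # No data at all: fall back entirely on the pooled share.
        return pooled_share
    trust = country_hits / (country_hits + k)
    observed = browser_hits / country_hits
    return trust * observed + (1 - trust) * pooled_share


# Hypothetical inputs: a 100-hit sample showing 15% Netscape,
# against a pooled share of 0.2%.
small_sample = shrunk_share(100, 15, 0.002)
# The same 15% observed in a 100,000-hit sample is mostly believed.
large_sample = shrunk_share(100_000, 15_000, 0.002)
print(f"{small_sample:.2%} vs {large_sample:.2%}")
```

The shrunk per-country shares could then be fed into the population weighting in place of the observed ones. This damps spikes from thin samples, but it does so by assuming poorly sampled countries look like the well-sampled ones, which is exactly why it doesn't tell us anything new about those populations.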
I blogged a few months ago about similarly disturbing spikes in IE 6 usage in the metrics reported by StatCounter. I can only conclude that these providers simply don't have a good enough sample to describe global internet populations.

What we really need are measurements from organizations that have much more representative usage, and there are only a few that I can think of. That could come in three forms, as I see it. One, we could find the top measures for every locale and build ourselves a global picture from the bottom up. Two, we could look to a few heavyweights for large regions (Google would obviously be really good for much of the planet, and combined with local powerhouses like Yandex in Russia and Baidu in China, we could probably get a pretty good global measure). Or three, we could find one source that had solid global representation.
Ben Chuang, in my previous post on this topic, had this to say:
I would also suggest that we don't need data from the absolute largest site in the universe, we just need data from a very-large-site-that-is-very representative. What about a more "open" site, like Wikipedia?
I think that's actually a really good suggestion. Wikipedia has articles in more than 250 languages and is regarded as the online encyclopedic authority by most nationalities. Wikipedia also has a huge amount of traffic, billions of visits every month, so it's not as likely to be swayed by the occasional odd visitor patterns.
So what do you all think? Would Wikipedia's browser breakdown be a better measure than the various analytics providers that we've all been using for the last five or six years?
Posted by: Daren | August 23, 2009 1:58 PM
Daren, I think the question really is "are Wikipedia users a _better_ representation of the Internet population as a whole than visitors of companies which use the Net Applications and StatCounter analytics packages?" If so, then it's an improvement in our picture of the Internet to look at Wikipedia stats rather than Net Applications and StatCounter stats.
- A
Posted by: Asa Dotzler | August 23, 2009 3:56 PM
I think Wikipedia would be a good source, because a lot of its visitors (I would guess) are not people who go there to search for something, but rather people who get there through a search engine. I know that happens to me a lot. So their audience makes up a lot more than just loyal Wikipedia users.
Posted by: Jacob Munson | August 23, 2009 4:38 PM
I don't think Wikipedia is a good source (if used exclusively) for internet usage stats. We need something that's not only representative for different nationalities, but also representative for all kinds of demographics.
Wikipedia may represent students and professionals doing research, but it doesn't represent the common company workers, casual news readers, etc. So the result would not be representative for the internet users as a whole at all.
Wikipedia is an encyclopedia, that means it's heavily biased towards academic people, which could be worse than the stats from big companies like those of Net Applications. I'd say people visiting the Sony and Nokia sites is theoretically more diverse than people visiting Wikipedia. We must make sure that the sampling is not biased towards any single group of people, whether it's US people, or tech-savvy people, or academics and professionals.
The only single source AFAIK that can have any hope of representing not just different nationalities, but all kinds of demographics of people reasonably well, is Google. And even Google's stats are still biased towards more tech-savvy and professional people in areas like China.
Posted by: kaxin001 | August 23, 2009 5:20 PM
Yes, Wikipedia is definitely not a good source for browser share information. I am pretty sure that Wikipedia will see a significantly lower IE share than what we get from other sources: it has a very significant share of power users and academics among its visitors, not so much the casual Internet users who make up most of the overall traffic. Note that casual Internet users often don't know about Wikipedia, and they rarely use search engines, so Wikipedia's high search rank doesn't get them there either. Also, despite the availability of many languages, Wikipedia's English version is by far the most complete one. Many of the other languages didn't manage to get significant traction; while they get some hits, the sample size is an issue here again.
Posted by: Wladimir Palant | August 24, 2009 12:14 AM
PS: Good that I usually read that blog in Thunderbird. Asa, the font you chose (Constantia) is rather hard to read on screen - at least that's what I have on Windows Vista. How about something that isn't bold in its "normal" version already?
Posted by: Wladimir Palant | August 24, 2009 12:21 AM
Some Wikipedia editors use tools such as AWB (http://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser) rather than just a web browser, so that may skew the stats a little.
Posted by: Peter | August 24, 2009 1:36 AM
You don't seem to mention that the reported rise of Netscape in China looks to be directly related to the reported decline of IE. I don't know what this means though.
Perhaps, as you suggested, although China has the largest web population, not many sites there use these analytics, and therefore a small number of sites can greatly affect the overall results. Or some large-scale spider that used to report itself as IE now reports itself as Netscape.
Posted by: Dan | August 24, 2009 3:09 AM
Posted by: Ms2ger | August 24, 2009 3:14 AM
I was actually looking for browser stats from Google a week or so ago, but found that Google no longer releases this info. I was kinda sad, because as others have stated here, Google is globally represented, and doesn't reflect a single user demographic.
I would like to see an aggregation of stats from Net Applications, StatCounter, Google, Wikipedia, Yandex, Baidu, and perhaps a few more global sites, for instance Sony and Nokia, and maybe the biggest cell-phone makers in Asia too. These pages are visited by "most people" looking for information and software etc.
Of course, the statistics would still be a bit off. Most "run-of-the-mill" users aren't very web-literate: they click on links, visit the same pages every day, and perhaps, if they're looking for something, ask someone more tech-savvy to look it up for them, and maybe send them a link.
I know that my parents, and many non-nerdy younger people, are like that. These are the people who use the browser delivered with the system. That means IE in some incarnation for Windows users, and Safari for Mac users. Linux makes up such a marginal part of these people's web usage that there's not really any point in including it in this generalisation.
Do these people use Google and Wikipedia? More often no than yes. Do they visit large cellphone or audio/video websites? Not really. These users are hard to actually measure unless one bases it on stats from pages where the counters are implemented, mostly because they don't "browse around".
Posted by: PoPSiCLe | August 24, 2009 5:52 AM
I really wouldn't expect Wikipedia to be heavily weighted toward academia. If anything, academia tends to look down on it, both for its crowdsourcing approach and for the fact that it tends to cover topics that ordinarily wouldn't be included in an encyclopedia.
There's an awful lot of pop culture articles on Wikipedia, and they show up very high in search results. Try searching for any actor or actress, and you'll usually get Wikipedia, IMDB, and *maybe* their official site in the first few results. That's visibility out in the general world.
I'd guess that if you looked just at people reading pages, not editing, you'd get a much more representative picture than mostly-academic users.
Posted by: Kelson | August 24, 2009 9:08 AM
An aggregate of the Google Analytics could also be a good source. The Google name has really helped to get a free counter on a lot of different sites.
Posted by: Lennie | September 5, 2009 12:11 AM
My point was, assuming Google's traffic was an important source for estimating browser share, Wikipedia might be a good replacement.
The main reasons?
1- Very large, broad set of properties.
2- Very large traffic, esp. since their site is increasingly the top link in Google's search results.
3- They do not make a browser, or have any other clear motivation to hide their browser share statistics from the community.
Posted by: Ben Chuang | September 16, 2009 1:19 AM
Only if Wikipedia users are a good representation of the Internet population as a whole; and that would be hard to prove, wouldn't it?