Recently, Net Applications updated their methodology to weight their browser share by the Internet population of the countries they monitor. This had the effect of knocking Apple's Safari down quite a bit, presumably due to Apple's weak numbers outside of the U.S., and of lifting Opera up some, presumably due to Opera's stronger showing in Europe and Asia.
As of August 1st, we have implemented retroactive country-level weighting in our reports. This means that we adjust our reports proportionally based on how much traffic we record from a country vs. how many internet users that country has. For example, although we have significant data from China, it is relatively small compared to the number of internet users in China. Therefore, we now weight Chinese traffic proportionally higher in our global reports. This change produces a much more accurate view of worldwide usage share statistics.
That sounds like a change for the better, but it also seems to have had some unfortunate side effects. Net Applications obviously does not have equally good sampling in all of the countries it monitors. Where that sample is weak and likely to produce unrepresentative results, the effect can be either magnified or diminished depending on the Internet population of that country or region.
Before the new methodology, presumably, they just reported raw data: their weighting was tied directly to the strength (size) of their sample. So if they had a relatively small (and not likely to be very representative) sample in a particular country, that weaker sample had less of an impact on their overall numbers.
Now, though, odd and unrepresentative numbers from countries with large Internet populations have a dramatic effect on the overall global share reported by Net Applications.
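To make that concrete, here's a rough Python sketch of the difference between the two approaches. The country names, user counts, and hit counts are entirely made up for illustration, and the functions `raw_share` and `weighted_share` are hypothetical; this is not Net Applications' actual calculation, just the simplest reading of "weight each country by its Internet population."

```python
def raw_share(samples, browser):
    """Global share when every recorded visit counts equally."""
    total_hits = sum(sum(c["hits"].values()) for c in samples.values())
    browser_hits = sum(c["hits"].get(browser, 0) for c in samples.values())
    return browser_hits / total_hits


def weighted_share(samples, browser):
    """Global share when each country counts in proportion to its Internet users."""
    total_users = sum(c["users"] for c in samples.values())
    share = 0.0
    for c in samples.values():
        country_hits = sum(c["hits"].values())
        country_share = c["hits"].get(browser, 0) / country_hits
        share += country_share * c["users"] / total_users
    return share


# Hypothetical data: country A is heavily sampled, country B is barely sampled
# but has more Internet users and an odd-looking Netscape number.
samples = {
    "A": {"users": 200_000_000, "hits": {"Netscape": 500, "Other": 99_500}},
    "B": {"users": 300_000_000, "hits": {"Netscape": 100, "Other": 900}},
}

print(f"raw:      {raw_share(samples, 'Netscape'):.2%}")
print(f"weighted: {weighted_share(samples, 'Netscape'):.2%}")
```

With these invented numbers, the raw figure stays under 1% because country B contributes only 1,000 of the 101,000 recorded hits, while the weighted figure lands above 6% because B's 10% Netscape reading is now multiplied by 60% of the world's users.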
Here's a good example:
Several weeks ago, I saw an odd spike in the Netscape usage share. Netscape hasn't shipped a new browser in ages and their global share has been pretty steady at around half a percentage point for as long as I can remember. Then, for the week of 07/19, their share shot up to over 1%. A doubling of share seemed a bit odd for a browser that's been out of circulation as long as Netscape but when you're dealing with half a percent or less, it's not unreasonable to imagine that it wasn't growth of Netscape so much as a slow week of usage for all the other browsers. I could sort of picture a situation where modern browser users as a cohort all did a bit less browsing for a few days while ancient browser users were unaffected. I didn't think too much about it but it did catch my attention.
Well, this week's numbers just came out and Netscape is showing a global share of almost 4%!!!
Here's what the trends look like for the last couple of months of global share.

My first thought was "this can't be right" so I looked at the U.S. share (subscribers only) and it looked stable and steady with Netscape well under 0.05% for years. So I turned next to share by continent where Asia showed a big Netscape spike. I drilled down a bit further and looked at just China browser share.
Here's what the trends look like for the last couple of months of China share.

So what's going on here? Well, it could be one of a couple of things. First, it could be some kind of spider that identifies itself as "Netscape 6.0" that's crawling the Chinese Web for search engine indexing or something like that. That's something Net Applications could dig into and if it is a spider, just add it to their list of not-counted hits. All competent stats packages can exclude that kind of traffic from their metrics.
Second, and potentially more problematic, Net Applications' sample in China (those Websites that have deployed the Net Applications site analytics package) could be so small that it only takes a trivial number of site visitors switching browsers to have a very large impact on their measurements.
Either way, this kind of error now has a much larger impact when it happens in a country like China, which happens to have the largest Internet-using population of any country in the world.
I don't think there's really any good solution to dealing with small sample sizes and any commercial analytics package is going to suffer from that problem. Perhaps a second weighting based on sample size would help in reflecting more accurately the actual data, but that doesn't help us understand Internet populations any better.
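As a rough illustration of why sample size matters so much once population weights are applied, here's a sketch using the textbook standard error of a proportion, sqrt(p(1-p)/n). The numbers are invented and the helper names are hypothetical. It also assumes every recorded visit is an independent random draw, which real analytics data (clustered by site and by visitor) certainly is not, so treat the results as an optimistic lower bound on the uncertainty.

```python
import math


def standard_error(share, n):
    """Standard error of a measured share from n independent visits."""
    return math.sqrt(share * (1 - share) / n)


def weighted_standard_error(countries):
    """Standard error of a population-weighted global share.

    Each country's variance is scaled by the square of its population weight,
    so a thin sample in a big country dominates the total.
    """
    total_users = sum(c["users"] for c in countries)
    variance = 0.0
    for c in countries:
        weight = c["users"] / total_users
        variance += weight ** 2 * c["share"] * (1 - c["share"]) / c["n"]
    return math.sqrt(variance)


# Hypothetical data: a well-sampled country and a thinly sampled, larger one.
countries = [
    {"name": "A", "users": 200_000_000, "share": 0.005, "n": 100_000},
    {"name": "B", "users": 300_000_000, "share": 0.10, "n": 1_000},
]

for c in countries:
    print(c["name"], f"{standard_error(c['share'], c['n']):.4%}")
print("global", f"{weighted_standard_error(countries):.4%}")
```

In this made-up case nearly all of the uncertainty in the global number comes from country B, which is the same pattern described above: the smaller the sample and the bigger the population weight, the more one country's noise leaks into the worldwide figure.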
I blogged a few months ago about similarly disturbing spikes in IE 6 usage in the metrics reported by StatCounter. I can only conclude that these providers simply don't have a good enough sample to describe global internet populations.

What we really need are measurements from organizations that have much more representative usage, and there are only a few that I can think of. That could come in three forms, as I see it. One, we could find the top measurement source for every locale and build ourselves a global picture from the bottom up. Two, we could look to a few heavyweights for large regions (Google would obviously be really good for much of the planet, and combined with local powerhouses like Yandex in Russia and Baidu in China, we could probably get a pretty good global measure). Or three, we could find one source that had solid global representation.
Ben Chuang, in my previous post on this topic, had this to say:
I would also suggest that we don't need data from the absolute largest site in the universe, we just need data from a very-large-site-that-is-very representative. What about a more "open" site, like Wikipedia?
I think that's actually a really good suggestion. Wikipedia has articles in more than 250 languages and is regarded as the online encyclopedic authority by most nationalities. Wikipedia also has a huge amount of traffic, billions of visits every month, so it's not as likely to be swayed by the occasional odd visitor patterns.
So what do you all think? Would Wikipedia's browser breakdown be a better measure than the various analytics providers that we've all been using for the last five or six years?