Three Monkeys, Three Typewriters, Two Days

February 17, 2010

What most annoys me about "old media"

Today I saw a link to a CNN article that pretty much sums up what annoys me most about "old media" sites. The article talks about "a report released Wednesday", gives some of the highlights, summarizes some stuff, but doesn't answer the questions I have (like "which states were not having budget issues?"). In particular, it doesn't link to the actual report it's talking about, nor does it explain why not (an NDA? The report is paywalled? They only read a press release and not the actual report? Something else?). At this point, that automatically reduces the credibility of both the article and the report in my eyes.

I did take a look at the 2008 reports from the people cited in the CNN article but none of them look obviously relevant. The only one I see about state revenues doesn't really say what the CNN article does.

It's odd that I now fully expect good news coverage to provide links to something resembling primary sources. It's amazing to me that such an expectation is not always disappointed. At least, as long as the news coverage doesn't come from the likes of CNN (or the BBC, or various others; none of them cite their sources very well at all).

Posted by bzbarsky at 1:51 AM

February 14, 2010

The pitfalls of comparing performance across browsers

From the source of a popular open-source rendering engine:

  // Flush out layout so it's up-to-date by the time onload is called.

From the source of another popular open-source rendering engine:

  // Make sure both the initial layout and reflow happen after the onload
  // fires. This will improve onload scores, and other browsers do it.
  // If they wanna cheat, we can too.

So if you're doing any sort of performance timing using onload, you're comparing apples to oranges unless you flush out layout yourself: once in a <script> at the very bottom of the page (because in some browsers loads of some subresources will actually start during layout), once right before taking your "load stop" timestamp (because there might be more pending layout at that point), and in all subframes of your document in both cases.

Too bad this is a common thing for people to do.
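The flush described above can be sketched roughly like this, assuming all subframes are same-origin (cross-origin frame access would throw); the function name and structure are my own illustration, not from any particular test harness:

```javascript
// Force any pending layout to run synchronously in a window and all of
// its (same-origin) subframes. Reading a layout-dependent property such
// as offsetHeight is the usual way to trigger a synchronous flush.
function flushLayout(win) {
  void win.document.body.offsetHeight;
  for (var i = 0; i < win.frames.length; i++) {
    flushLayout(win.frames[i]);
  }
}

// Usage (in a browser): call once from a <script> at the very bottom of
// the page, and again right before recording the "load stop" timestamp:
//
//   flushLayout(window);
//   var loadStop = Date.now();
```

The recursion matters: a flush in the top-level document doesn't necessarily flush layout in subframes.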

Posted by bzbarsky at 10:29 PM

February 10, 2010

Unemployment statistics

There's a New York Times editorial that's been making the rounds that talks about a study showing that unemployment rates are quite different across the income spectrum. One of the first reactions I encountered on reading it was: "Most people can't make 150k while not working for an entire quarter of the year, no?"

Today the study has actually been made public (nothing like giving the data to the press before publishing it publicly!). It looks like the basic methodology was to set up income decile boundaries based on 2008 data, then shift them around in a way the paper doesn't explain (it's not just an inflation adjustment or something; some of the boundaries move by 2% and some move by 8%), then take people's self-reported income from a 2009 survey (income for which year? the paper doesn't say) to figure out which bin they go in.

The results have at least two interesting things going on:

  1. Since the deciles are household income, the number of workers actually varies by decile. The paper footnotes this, but gives no hint as to what the actual distribution of workers across these bins might be. In general, the most significant effect from this would be that a two-income household has a higher household income and places two workers into that higher decile, whereas a one-income household would typically place one worker into a lower decile.
  2. The obvious "if you're unemployed or not steadily employed you end up with a low household income" causation certainly accounts for some of the observed correlation, whether the incomes being reported are 2009 ones (in which case they would certainly be affected by being unemployed in Q4 2009) or 2008 ones (in which case the low incomes likely correlate somewhat with lack of steady employment in 2008, which one would expect to correlate well with a continuing lack of such in 2009).

Unfortunately, there's not much that can be done better here without asking the Q4 2009 unemployed specific questions about what their income was before they became unemployed and prorating that to the full year to figure out where to bin them. I'm fairly sure some of the observed disparity would remain; what I don't know is how much. I see no indication that such a survey+prorate operation is what was performed here, unless the "categorical form" jargon on page 6 of the paper refers to something like that. Anyone know whether it does?
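For concreteness, a survey+prorate binning could look something like the sketch below. To be clear, the function names, the decile boundaries, and the whole approach here are my own illustration of the idea, not anything taken from the paper:

```javascript
// Hypothetical sketch: scale a respondent's partial-year income up to a
// full-year equivalent, then bin it against decile boundaries.

// Income earned over monthsEmployed months, prorated to 12 months.
function prorateAnnualIncome(incomeWhileEmployed, monthsEmployed) {
  return incomeWhileEmployed * 12 / monthsEmployed;
}

// decileUpperBounds: nine ascending upper bounds for deciles 0..8;
// anything above the last bound lands in decile 9. The bounds themselves
// would come from the (here unknown) 2008-derived boundaries.
function assignDecile(annualIncome, decileUpperBounds) {
  for (var i = 0; i < decileUpperBounds.length; i++) {
    if (annualIncome <= decileUpperBounds[i]) return i;
  }
  return decileUpperBounds.length;
}
```

For example, someone who earned 30,000 over nine months before becoming unemployed would be binned as if they had earned 40,000 for the year.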

Posted by bzbarsky at 2:43 PM

February 4, 2010

Understanding the numbers your profiler gives you

A common situation I run into is that I have some testcase, I profile it, optimize one of the things that looks slow, and reprofile. If I'm lucky, the fraction of time that thing takes drops from B to A (A < B of course, with B standing for "before" and A for "after"). What does that mean for the overall time of the testcase?

Obviously, what it means depends on both A and B. If we assume that the absolute time taken by the part that wasn't optimized hasn't changed, then that part accounts for a fraction 1-B of the old total and 1-A of the new total, so the overall speedup is (1-A)/(1-B).

So B - A = 5% would mean a 2x speedup if B is 95% and A is 90%, and only a 1.05x speedup or so if B is 5% and A is 0%. If B is something like 53% and A something like 48%, then you get a 1.11x speedup.
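The arithmetic above can be written out as a tiny helper (the function name is mine):

```javascript
// Overall speedup when a hotspot drops from fraction B ("before") of
// total time to fraction A ("after") of the new total, with the rest of
// the work unchanged.
function overallSpeedup(before, after) {
  // The unoptimized part takes (1 - B) of the old total and (1 - A) of
  // the new total; its absolute time is the same in both, so
  // oldTotal / newTotal = (1 - A) / (1 - B).
  return (1 - after) / (1 - before);
}

// The examples from the text:
//   overallSpeedup(0.95, 0.90) -> 2
//   overallSpeedup(0.05, 0.00) -> ~1.053
//   overallSpeedup(0.53, 0.48) -> ~1.106
```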

All of which is to say that focusing on the hotspots is _really_ the way to go, if at all possible.

Posted by bzbarsky at 3:21 PM