There was a nice long conversation today in #jsapi about presenting benchmark results, and the fact that any sort of "overall number" involves arbitrary weighting which can be tweaked to show just about anything as long as some browsers are faster on some things and others are faster on others. Even a good choice of initial weighting can quickly deteriorate as some things are optimized faster than others.
A proposal was made to present overall benchmark results as simply a per-test list of results, but provide a way to do a benchmark comparison that takes two result sets and produces a bar graph on which positive-height bars correspond to tests on which the first result is better and negative-height bars correspond to tests on which the second result is better. Or perhaps a horizontal bar graph, with bars going left for tests on which the result set on the left wins and going right for the others. Then one can look at where the general mass of the graph is to compare the two result sets.
That raises the obvious question of how the bar lengths should be computed. We pretty quickly decided they should be a function of the ratio a/b, where a and b are the two time measurements. The first thing we considered was taking f(x) = log(x). This has the desirable property that f(x) = -f(1/x), so being 2x faster would give a bar of the same length as being 2x slower (but with the opposite sign). The obvious problem with log is that large performance differences don't look so big, because log grows so slowly. At this point Jason pointed out that any odd function of log(x) has that property, and then of course the obvious thing to consider was a function that's exponential in the limits and odd: sinh(x). Conveniently, sinh(ln(x)) = (x^2-1)/(2x). The factor of 2 is somewhat unfortunate because it makes f(x) approach x/2 rather than x in the limit, so we can just drop it.
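For anyone who wants to check the sinh identity, it follows directly from the definition of sinh, for x > 0:

```latex
\sinh(\ln x) = \frac{e^{\ln x} - e^{-\ln x}}{2} = \frac{x - 1/x}{2} = \frac{x^2 - 1}{2x}
```

For large x the 1/x term vanishes, so this behaves like x/2, which is where the stray factor of 2 comes from.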
Now we have f(x) = (x^2-1)/x = x - 1/x. This satisfies our f(x) = -f(1/x) condition, and is approximately equal to x for large x. Near x = 1, the second derivative is about -2, which is not too big. So near x = 1 the function is close enough to linear; the error in approximating f(1.1) as 2*f(1.05) is only 2% or so.
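As a rough starting point, here is a minimal sketch of the bar-length computation only, with no drawing. The names barScale and compareResults are made up for illustration, and the sign convention is my assumption: the bar is computed as f(b/a), which is the same as -f(a/b), so that a positive bar means the first result set was faster, assuming the inputs are times and lower is better.

```javascript
// f(x) = x - 1/x: odd in log(x), and close to x for large x.
function barScale(x) {
  return x - 1 / x;
}

// Hypothetical helper, not part of any existing harness.
// labels: test names; timesA, timesB: positive times for the two result sets.
// Assumption: bar = f(b / a), so a positive bar means set A was faster.
function compareResults(labels, timesA, timesB) {
  if (labels.length !== timesA.length || labels.length !== timesB.length) {
    throw new Error("labels, timesA and timesB must have the same length");
  }
  return labels.map((label, i) => {
    const a = timesA[i];
    const b = timesB[i];
    if (!(a > 0) || !(b > 0)) {
      throw new Error("times must be positive for test: " + label);
    }
    return { label: label, bar: barScale(b / a) };
  });
}
```

With this, compareResults(["t1", "t2"], [100, 200], [200, 100]) gives a bar of 1.5 for t1 (first set 2x faster) and -1.5 for t2 (first set 2x slower), so the two bars have equal length and opposite sign.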
Now all I need is either time to code up this UI or a volunteer. Any takers? The goal is some JS blob that takes two lists of numbers (and probably the labels for the two lists) and creates a pretty graph.
Posted by bzbarsky at March 31, 2010 4:23 PM