Range Check Elimination Optimization

by Xan Gregg, 2006-01-22
blog entry (for comments)

Java’s HotSpot JIT compiler has had the Range Check Elimination optimization since at least Java 1.3.1. Sun describes the optimization as

Range check elimination — The Java programming language specification requires array bounds checking to be performed with each array access. An index bounds check can be eliminated when the compiler can prove that an index used for an array access is within bounds.

Array bounds checking is often seen as a performance weak spot for Java compared to C++. Presumably, C++ developers spend time upfront in analysis and debugging to make sure a program does not try to access an array element that is out of bounds. In unoptimized Java, the check is done at run-time, every time an array element is accessed. The range check elimination optimization attempts to skip the bounds check when the compiler can be sure that the index is already in bounds. So it makes me wonder, what conditions allow the optimization to kick in and how much does it save anyway?

First aside: In the projects I have converted from C/C++ to Java (including GNU Chess), I have always run into an ArrayIndexOutOfBounds exception in the Java version that stemmed from an unnoticed error in the C/C++ source.

Second aside: Before I knew about this optimization, I posted a suggestion for it to Sun’s bug database, suspecting that bounds check is the reason Java performs poorly in numerical computing.

Simple Test

Ignoring the evils of microbenchmarking, I created a simple test (source) for measuring the optimization. The first thing I found out was that subscript checking is such a small cost that doing anything non-trivial would make any such optimization undetectable. That says something, but then again there are lots of linear algebra operations built from trivial array/matix operations.

This test consists of simple loops with the array index being the loop counter and spanning the length of the array. Essentially,

        double data[] = new double[N];
        for (int i = 0; i < N; i++)
            data[i] = 0;

Except I moved the data allocation out of the measured function, and I computed three kinds of index variables and used one in each subtest, called Easy, Medium, and Hard. Each actual loop looked more like:

        for (int i = 0; i < N; i++) {
            int j = i;                      // same as i
            int k = i / 2 + (i + 1) / 2;    // same as i
            data[?] = j - k;
        }

with the "?" being either i, j or k, depending on whether it was in the Easy, Medium, or Hard subtest, respectively. All the index variables are equal, and presumably the compiler can figure out that i and j are equal but not that i and k are equal. The j - k value (always 0) is to make sure those index variables don't get eliminated when they are otherwise unused.

In addition to the above initialization loop, there are two other loops with the following bodies:

          data[?] += 1.0 / ((m * N + j) * 2 + 1) * -((k % 2) * 2 - 1);

and

          sum += data[?] + k - j;

The formula looks a strange but I thought I might as well do something pseudo-meaningful and calculate π. m is the index of an outer loop.

Simple Test Results

I ran the benchmarks on Windows (dual 3GHz Pentium) and Mac OS X (dual 2GHz PowerPC) with Java 5, in both client and server modes. The HotSpot docs say the range check optimization only kicks in during server mode, though Apple has their own implementation of Java, so I was less confident about seeing the optimization there.

On Windows, the optimization kicked in for Easy and Medium but not for Hard and only in server mode, as documented.

x86; M×N=50M client server
Easy 1469 1000
Medium 1469 1000
Hard 1469 1235
Hard Hinted 1750 1109

Times are in milliseconds, with timer resolution at 16 milliseconds. Tests were run 5 times with the run 5 times reported above (run 1 was slower, and runs 2 – 5 showed no difference in timing). Assuming the Easy and Medium tests had the range check elimination optimization and the Hard didn’t, the optimization produced a 19% time savings. Hard Hinted is explained below.

On Mac OS X, I got some strange results. The first run was 25% faster than the other runs, and only the first run showed any difference among the tests (Easy and Medium being 10% faster than Hard). This was with an array size of 100,000 and the outer loop going 500 times. When I tried decreasing the array size to 10,000, I got results more in line with Windows. All of these reported test results (above and below) use an array size of 10,000 with 5000 outer loop iterations.

PowerPC; M×N=50M client server
Easy 2794 2803
Medium 2786 2805
Hard 3814 3852
Hard Hinted 3448 3462

So apparently the optimization is used for both client and server modes on Mac OS X and the speed improvement is a little bigger than on Windows, about 27%. A little further experimentation with the array size shows that the optimization is only applied when the index values are all less than 32768 (that is, small enough to qualify as a signed 16-bit value). At least that part of the mystery is solved; I still don’t know why the first run was faster that other runs for large arrays.

Fortunately, Windows Java and Mac OS X Java agree on the final estimate of π using 50,000,000 terms of the Leibnitz formula: 3.1415924717015926.

Hinting

The Hard Hinted lines above are the results of adding a variation to the Hard test. Just before the array access for data[k], I add a condition,

    if (k >= 0 && k < data.length)
        ...use data[k]...

I know that k is always in range, and the presence of the above condition is a way for the JIT to also know it will be within range for the array access. Surprisingly, the Hard Hinted test showed a noticeable speed up over the plain Hard test. I would have thought at best there would be no difference since the explicit condition check is equivalent to the implicit check performed at runtime. There must be something more to the implicit bounds check. The explicit check should be even more helpful when multiple array accesses are present inside the conditional statement.

Conclusions

The Range Check Elimination is real on Sun’s JVM in server mode and on Apple’s JVM in both client and server mode. While the JIT can only optimize simple array accesses on its own, it will notice hints from the developer to optimize more complex array accesses.

The speed-up, however, appears to be mild, and doesn’t account for my tests a while back showing Java matrix multiplication taking 10x the time as C. I guess I need to revisit that benchmark.

One Response to Range Check Elimination Optimization

  1. Pingback: FORTH GO » Blog Archive » Checking Range Check Elimination, Part 2

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>