Archive for the ‘Java’ Category

Fast Factoring for 64-bit Integers

Sunday, October 19th, 2008

Some of the Project Euler problems involve factoring numbers which are either large or small depending on your perspective. Problems are generally limited to 64-bit integers (about 18 digits) which are big numbers for most of us, but in the field of factorization those numbers are terribly small compared to the 100+ digit numbers security protocols deal with. Most advanced methods deal with optimizing the factoring of those huge numbers and don’t mind significant amount of overhead, but I want to know what’s fastest for 64-bit integers.

To find out, I ran some tests on some variations on three basic, low-overhead methods: Trial Division, Fermat’s method, and Pollard’s Rho method. All of these take a long time if the number being factored is actually prime, so it’s worthwhile to add in a fourth component which is a Miller-Rabin primality check. Here are my timing results for 400,000 random 64-bit integers. Actually, only the first test uses 400,000 numbers since I limited each test to 1 hour and extrapolated beyond that.

Seconds Method
811 Rho + Trial Division + MR
6359 Fermat + Trial Division + MR
6393 Trial Division + MR after each factor found
29397 Trial Division +MR at start
71195 Trial Division without MR

I was really surprised at how well the Rho method worked in practice. It’s a probabilistic method that’s basically like trial division except it chooses numbers at random instead of sequentially. However, the “random” generator uses a polynomial such that lots of the values can be tested at once using some fancy number theory.

Fermat’s Method works best when there are two divisors near √n, which apparently doesn’t happen very often. Here is my Rho code, which is adapted from some pseudocode in a gamedev forum thread.

   long rhoFactor(long n, int c, int max) {
        int check = 10;
        long x1 = 2;
        long x2 = 4 + c;
        long range = 1;
        long product = 1;
        int terms = 0;

        for (int i = 0; i < max; i++) {
            for (int j = 0; j < range; j++) {
                x2 = (squareMod(x2, n) + c) % n;
                long next = multiplyMod(product, Math.abs(x1 - x2), n);
                if (next == 0) {
                    return Math.max(gcd(product, n), gcd(Math.abs(x1 - x2), n));
                }
                else if (++terms == check || j == range - 1) {
                    long g = gcd(next, n);
                    if (g > 1)
                        return g;
                    product = 1;
                    terms = 0;
                }
                else
                    product = next;
            }

            x1 = x2;
            range *= 2;
            for (int j = 0; j < range; j++) {
                x2 = (squareMod(x2, n) + c) % n;
            }
        }
        return 1;
    }

For the parameters, I used small odd numbers for c, the polynomial constant term, and 16 – 20 for max which limits the generated values at around 2^max. If the factorization fails, I increase c by 2 and try again. For max = 16, it failed to find a factor about once for every 10,000 numbers and never failed twice in my tests. And those numbers had already had any small factors (less than about 50,000) removed with trial division.

Updated Matrix Multiply Times

Monday, October 1st, 2007

I finally got around to running my matrix multiply microbenchmarks on Windows as well as Mac. Here’s the summary of times on Mac OS X and Windows XP on the same Mac Book Pro.

200×200 Matrix Multiply times in milliseconds
Mac OS X Windows XP
Java 1.5 C Java 1.6 C
ms ms ms ms
22 11.2 12.2 12.7

(Though not a big table, I’m using it to test out support for HTML 4 (ca. 1999) table elements like THEAD.)

Microbenchmark on Java and C Matrix Multiplication

Thursday, June 21st, 2007

My Java bug entry on faster numerics has been summarily closed as unreproducible. There is a short evaluation showing Java to be faster that C at matrix multiplication, but the comparison misses the mark because:

  • The C code (not shown) appears to be using nested arrays for matrix storage, which I have never seen in practice for C matrix libraries.
  • The Java code is run under the server version of Java, which is fine for a small benchmark but usually not practical for a large (client) app, such as many existing C++ numerics apps are.

Nonetheless, the activity prompted me to rerun my matrix benchmarks. Actually, I had to recreate them since I couldn’t find my Java 1.3 era versions that showed a 15x disadvantage for Java. This time Java fared much better. The tests (C++ Code Java Code) just multiply the same 200×200 matrices over and over with some add operations mixed in for good measure. I tried to make the code fast without working too hard. For instance, I didn’t get GCC to do vectorization, and I didn’t control my environment too well.

I tried two variations of the Java matrix representation and two variations of the Java multiplication code for four total Java variations, of which I took the fastest. The representation variation was whether to use a single array and multiplied indexing or to use nested arrays. The multiply variation was whether to transpose the B matrix or not — transposing it lets the inner loop traverse a one sub-array each from A and B. Arrays of arrays with transposing worked best on my Intel Mac, but flat arrays without transposing was best on the PowerPC Mac. I think the Intel JVM is better about the range check elimination optimization, so it favors the extra arrays with simpler indexing.


2GHz PowerPC Mac

Java 1.5 56.7ms
C++/GCC 26.2 ms


2.33GHz Intel MacBook

Java 1.5/client 32.5 ms
Java 1.5/server 12.6 ms
C++/GCC 11.0 ms

On the PowerPC Mac, there was no difference between client and server Java.

So client Java 1.5 is only 3x of C++ and server Java 1.5 is almost the same as C++. Not bad, but not quite what Sun’s evaluation showed. Still need to try Java 6 and turning on GCC vector processing.

Prime Benchmarks Updated

Tuesday, April 3rd, 2007

After getting a new MacBook Pro and installing Boot Camp on it so I could boot into Windows or Mac OS X, I’ve rerun some of my prime number mini-benchmarks. I only ran the tests in C++ and latest Java on each OS, skipping C# this time. The charts below summarize the results. Times are in seconds and shorter bars are better. (CSV Data)

Java vs. C++ Prime Finding Benchmarks

The first chart puts the C++ and Java times side by side for each host. The Linked List allocates tons of small objects (one for each prime number found), and the C++ allocator seemed to have significantly improved to be twice as fast as Java and C++ on G5, but otherwise there is not too much difference.

G5 vs Intel Prime Finding Benchmarks

The second chart shows the same data as the first but grouped for comparison of OS/CPU combinations. The Intel CPU (Core 2 Duo @ 2.33 GHz) is noticeably faster than the G5 (2 GHz), but there isn’t a real difference between OSs.

I used latest public versions of Java, which were Java 5 for Mac OS X and Java 6 for Windows XP.

Java 7 Wish List Item: Static Analysis Optimizations

Monday, January 8th, 2007

Shortly after Java 6 came out last year, there were a number of Java 7 wish lists floating around (example). Most wishes are for syntax improvements. I don’t use Java as much as I used to, and I only have one wish list item: static analysis optimizations. That is, optimizations made possible with the help of static (not run-time) optimizations.

I’ve written before on the promising range check elimination optimization which strives to eliminate array bounds checking when provably unnecessary. But that optimization doesn’t quite live up to its potential because any nontrivial array usage is beyond the scope of a run-time proof. So the idea of static analysis optimization is for a compile-time (or post-compile-time) process to do an in-depth analysis of array usage and other potential optimizations and insert the suggested optimization and finished proof into the byte-code. Then the JVM only needs to verify the proof at run-time rather that construct a proof from scratch. And for that matter, the JVM could never generate a proof, since it could assume that if there is no proof provided already, there’s no use looking for one at run-time.

My motivation is scientific computing (especially linear algebra), which still seems to be unnecessarily handicapped in Java.

Sudokit Available

Sunday, June 18th, 2006

I’ve finally gotten the Sudokit code together enough to make it available. You can download the zipped source and an executable jar file. The app and the code are both still ugly, but I’ve added a text entry field to make it halfway usable.

There is no documentation. The app is just an experimental solver, trying to mimic human solving techniques and providing a rating for a puzzle’s difficultly. Several puzzles are included to pick from or you can enter your own. The solver has some advanced techniques, and while there are some test puzzles in the “database” that it can’t solve, all newpaper-level puzzles can be solved with only the simplest techniques.

That was the thesis from my original investigation: that if applied methodically, only simple techniques are required to solve newspaper-level problems. The simple techniques being hidden single and naked single. Knowing that, I thought I would test the thesis on a “hard” puzzle from a recent Saturday paper. I couldn’t solve it by hand with the simple methods and had to use some of the subset-based techniques.

Wanting to try that puzzle with Sudokit was enough to get me to add the text entry field (and the log panel at the bottom). Sure enough Sudokit was able to solve it with just the simple techniques. Either I wasn’t methodical enough or Sudokit’s logic is implicitly using more advanced techniques.

JDBC Providers Release

Saturday, April 29th, 2006

I’ve put together a release of the JDBC Providers at berlios. This packages up recent bug fixes and the separation of Java and SQL code which makes it easier to support other DB vendors.

Meanwhile Søren and Mikkel are working on a branch to futher separate the DB by using JNDI so we can rely on the servlet container to maintain the DB connections.