by Xan Gregg, 2005-06-23
blog entry (has older comments)
Inspired by a previous effort at porting some C prime-finding code to Java, I’ve written a small benchmark that finds primes using three different techniques. The initial coding was in Java and then ported to C++ and C#. The three techniques for finding primes:
- Linked List
- finds primes one at a time by trial division by all the known primes less than the square root of the number. The found primes are kept in a linked list, so this test uses a lot of memory allocations. Minor optimization: skips multiples of 2 and 3.
- Bit Array
- uses a sieve to cross off multiples of primes from a large bit array, so that at the end only primes have their corresponding bits set. Minor optimization: only stores odd numbers in the array.
- Offset Range
- like BitArray but starts at a given offset instead of 0. Intended for finding primes a page at a time to save memory or parallelize the searching.
All the prime finders use the same interface class which uses 64-bit integers for all the primes, though only the Offset Range prime finder tests actually go beyond the range of 32-bit integers.
The raw sources are available for download. Sorry, no build files.
Below are timing results on a Mac 2x2GHz G5 running Mac OS X 10.4.1 and on a Dell with a 1.8GHz P4 running Window 2003. Times are generally comparable, except: C#/Mono is slow on the Mac, and the first test (which allocates lots of small objects) has poor results for CodeWarrior C++/Mac and Java/Windows.
C++ times depend more on compiler flags — I turned the optimizer up as far as I could for these. That meant changing the code a little to get around a GCC bug in which the code reported wrong numbers at higher levels of optimization. I also had to make a few changes to get the C++ code to get it to compile on Windows. Besides the line endings issue, the printf escape codes for printing 64-bit integers are apparently different.
Minor changes were also required for the C# code, since the exception class I used under Mono wasn’t recognized in .NET. And I was surprised that the C# exe file I created under Mono didn’t work on .NET (had to recompile).
So, though the benchmarks are intended to compare performance, it also highlights some issues with portability. No problems with Java portability.
Mac G5 Timing Results
| Test Name | Range | Prime Count | Java 1.4.2 | Java 1.5.0 | Mono 1.16 C# | CW 9.4 C++ | XCode 2.1 C++ |
|---|---|---|---|---|---|---|---|
| Linked List | 0…..10M | 664,579 | 11.87 | 11.37 | 43.07 | 39.57 | 10.64 |
| OddBitArray | 0…..10M | 664,579 | 0.17 | 0.19 | 0.90 | 0.13 | 0.17 |
| OddBitArray | 0….100M | 5,761,455 | 3.01 | 3.07 | 10.81 | 2.74 | 3.04 |
| OffsetRange | 10M….100M | 5,096,876 | 2.65 | 2.75 | 7.87 | 2.46 | 2.77 |
| OffsetRange | 1000M…1100M | 4,814,936 | 3.95 | 6.60 | 10.23 | 3.72 | 4.14 |
| OffsetRange | 10000M..10100M | 4,341,930 | 4.93 | 7.68 | 11.46 | 4.75 | 5.10 |
| total | 26.63 | 31.81 | 84.42 | 53.62 | 26.01 |
Windows P4 Timing Results
| Test Name | Range | Prime Count | Java 1.4.2 | Java 1.5.0 | .NET C# | MSDev C++ |
|---|---|---|---|---|---|---|
| Linked List | 0…..10M | 664,579 | 26.36 | 25.91 | 11.30 | 16.06 |
| OddBitArray | 0…..10M | 664,579 | 0.44 | 0.44 | 0.45 | 0.44 |
| OddBitArray | 0….100M | 5,761,455 | 6.70 | 6.86 | 6.76 | 6.92 |
| OffsetRange | 10M….100M | 5,096,876 | 6.24 | 6.12 | 5.86 | 6.09 |
| OffsetRange | 1000M…1100M | 4,814,936 | 8.09 | 8.24 | 7.86 | 8.16 |
| OffsetRange | 10000M..10100M | 4,341,930 | 9.28 | 9.38 | 9.01 | 9.33 |
| total | 57.14 | 56.99 | 41.24 | 47.23 |
Another interesting tidbit is that when I tried an early version of the benchmarks with 32-bit integers instead of 64-bit integers on my G5, the C times improved much more that the Java times. I’m guessing the Java JIT compiler was able to recognize that it was running on a 64-bit CPU while the C++ compiler had to use generic 64-bit arithmetic that would run on any PowerPC (no G5-only switch in the compilers I used).
Update 2007-04-03: A separate post contains Mac/Intel times.
Pingback: FORTH GO » Blog Archive » Find Primes Updated
Thanks for taking the time to design and share it with us!
What version of .net and MS C++ did you use?
When looking for benchmarks, I prefer to focus on the user experience rather than a micro-benchmark, like http://www.codeproject.com/KB/dotnet/RuntimePerformance.aspx
George, I didn’t record the .Net or MSVC versions, but I was using the current versions as of the time of writing (mid 2005).
Dude you are faking.. aint no way the benchmarks you got are true..
What linked list implementation did you use?
In C++ there are at least 1000, please show us the implemetation. STL may be slow .. so pure code comparison should be much better.
Not fair to compare built in type/features of C#/java with some bad implementation in C++. for the sole reason that built in types in Java/C# probably were written in C++…
So, bottom line, please show us exactly what code did you use…
There is a link to the source code in the document. You’ll see the linked list implementation is not STL and, besides, that was the one case that C++ did consistently better.