Hacker Cup 2014 Qualification Round

I passed the Facebook Hacker Cup qualification round last weekend (problems). Unlike later rounds, there’s no real time pressure in the qualification round. I normally use Java for these competitions, since the most productive IDE I have is JetBrains’ IntelliJ IDEA. However, since I’m trying to become more adept at the new idioms of C++11, I also tried C++ (using Xcode) for two of the problems.

The first problem involved identifying a square of ‘#’ characters in a field of ‘.’ characters. After writing some messy Java code that analyzed the grid as it was read in, I had my C++ version read the whole grid into memory first. The following C++ felt decent for finding the first and last rows not consisting of all ‘.’ characters:

string blank(n, '.');
auto isntBlank = [&blank](const string & row) { return row != blank; };
auto firstRow = find_if(grid.begin(), grid.end(), isntBlank);
auto lastRow = find_if(grid.rbegin(), grid.rend(), isntBlank);

However, I couldn’t subtract those two variables to get the difference, because their types differ (one is a normal iterator and the other is a reverse iterator). The base() method converts a reverse iterator to the underlying forward iterator, which points one element past the reverse iterator’s position, so the row count comes out to

  lastRow.base() - firstRow

A few applications of similar find logic provided a solution that contains no explicit loops, which is a nice result.

I implemented the second problem, dealing with basketball substitution patterns, using a pair of priority queues for each team: one queue for the players on the court and another for the players on the bench. I don’t think there’s a clean way to allocate an array of generic collections in Java, and I ended up with the following for the benches of the two teams:

@SuppressWarnings("unchecked")
PriorityQueue<Student>[] bench = new PriorityQueue[2];
...
// Order the bench by minutes played, breaking ties by draft position.
bench[t] = new PriorityQueue<>(n, new Comparator<Student>() {
   @Override
   public int compare(Student a, Student b) {
      if (a.minutes < b.minutes)
         return -1;
      if (a.minutes > b.minutes)
         return 1;
      if (a.draft < b.draft)
         return -1;
      if (a.draft > b.draft)
         return 1;
      return 0;
   }
});
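
In hindsight, a List dodges the unchecked-array problem entirely. Here’s a minimal sketch (not my contest code), assuming the comparator above has been stored in a variable byMinutesThenDraft:

// Hypothetical workaround: a List of queues avoids the unchecked
// generic-array allocation. Uses java.util.List/ArrayList/PriorityQueue.
List<PriorityQueue<Student>> bench = new ArrayList<>();
for (int t = 0; t < 2; t++)
   bench.add(new PriorityQueue<>(n, byMinutesThenDraft));  // comparator from above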

C++ has a built-in pair<> template that makes the comparison part more compact, but I never found a clean way to declare the array. Each kind of queue, the players and the bench, requires a different comparison function, so the comparison can’t be built into the Student class itself. There’s a constructor for supplying a separate comparison function in that case, but I couldn’t get the constructor argument to match a generic comparison type, so I ended up naming the comparator’s type in the template argument list. It seems to violate the C++ DRY goal, but it’s not too bad using a typedef.

auto benchLess = [](Student const & a, Student const & b)
   { return make_pair(a.minutes, a.draft) > make_pair(b.minutes, b.draft); };
// priority_queue is a max-heap, so the greater-than comparison puts the
// smallest (minutes, draft) pair on top.
typedef priority_queue<Student, vector<Student>, decltype(benchLess)> BenchType;
BenchType bench[]{ BenchType(benchLess), BenchType(benchLess) };

The third problem involved a dynamic-programming-style probability calculation and contained the statement, “It is guaranteed that the output is unaffected by deviations as large as 10⁻⁸.” I’m still not sure what that means. I took it to mean that conditional probabilities could be rounded to 10⁻⁸ and that values less than 10⁻⁸ could be ignored while still getting the required 6 digits of accuracy. As an optimization I ignored values less than 10⁻¹⁰. However, that left one of my answers with just less than the required accuracy, so the statement must mean something else.

Hoping to try round 1 next week.

Google Code Jam 2011 Round 2

My luck ran out in Round 2 of the Google Code Jam 2011. I placed 626th but needed to be in the top 500 to advance. At least I qualified for a T-shirt for being in the top 1000.

There were four problems this time, with less variation in difficulty than usual. I solved the first problem, Airport Walkways, completely and got partial credit for correct-but-inefficient solutions to two others. Of course, I figured out how to make those solutions efficient shortly after the contest ended. I didn’t get to the fourth problem, A. I. War, but it looks more straightforward than its title’s allusion to artificial intelligence led me to expect. Maybe I could have solved that one completely instead of one of the partial solutions.

The third problem, Expensive Dinner, involved finding the difference between the best and worst cases of a puzzle involving least common multiples. I set about computing the two cases separately and taking the difference. I thought I was doing well to reduce the naive O(n²) solution to O(n), but when n is 10¹², that’s not good enough (for full credit). I reduced the best case to the number of primes less than or equal to n, a.k.a. π(n), and the worst case to the number of prime powers less than n. That seemed like a dead end, though, since it’s not so easy to calculate π(10¹²). What I didn’t see in time is that since I was taking the difference of two quantities that both include π(n), I didn’t have to compute that part at all. Counting the higher prime powers is much easier, and that’s all I really needed to do. Oh well.
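
Here’s a rough sketch of that shortcut, written after the fact (not contest code): count just the higher prime powers p^k with k ≥ 2. Their bases never exceed √n ≈ 10⁶ for n = 10¹², so a plain sieve over the bases suffices.

long countHigherPrimePowers(long n) {
   int limit = (int) Math.sqrt((double) n);       // largest possible base p
   boolean[] composite = new boolean[limit + 1];
   long count = 0;
   for (int p = 2; p <= limit; p++) {
      if (composite[p]) continue;                 // p is prime
      for (long m = (long) p * p; m <= limit; m += p)
         composite[(int) m] = true;               // sieve out multiples of p
      for (long q = (long) p * p; q <= n; q *= p) {  // p^2, p^3, ... <= n
         count++;
         if (q > n / p) break;                    // next power would exceed n
      }
   }
   return count;
}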

The second problem, Spinning Blade, didn’t involve any fancy math, but required a clever data structure to reduce it from O(n⁴) to O(n²) or so. (n was limited to 500, so O(n²) is fine.) I didn’t find it in time, though, and went with my brute force solution for the partial credit.

Fast Factoring for 64-bit Integers

Some of the Project Euler problems involve factoring numbers which are either large or small depending on your perspective. Problems are generally limited to 64-bit integers (about 18 digits), which are big numbers for most of us, but in the field of factorization they are terribly small compared to the 100+ digit numbers security protocols deal with. Most advanced methods are aimed at factoring those huge numbers and don’t mind a significant amount of overhead, but I want to know what’s fastest for 64-bit integers.

To find out, I ran some tests on variations of three basic, low-overhead methods: trial division, Fermat’s method, and Pollard’s rho method. All of these take a long time when the number being factored is actually prime, so it’s worthwhile to add a fourth component: a Miller-Rabin primality check. Here are my timing results for 400,000 random 64-bit integers. Actually, only the first test used all 400,000 numbers, since I limited each test to one hour and extrapolated beyond that.

Seconds   Method
    811   Rho + Trial Division + MR
   6359   Fermat + Trial Division + MR
   6393   Trial Division + MR after each factor found
  29397   Trial Division + MR at start
  71195   Trial Division without MR

I was really surprised at how well the rho method worked in practice. It’s a probabilistic method that’s basically like trial division except that it chooses candidates pseudo-randomly instead of sequentially. The “random” generator is a polynomial, x → x² + c (mod n), and lots of the values can be tested at once with a single gcd: the differences are multiplied together modulo n, and gcd(product, n) exceeds 1 whenever any individual difference shares a factor with n.

Fermat’s Method works best when there are two divisors near √n, which apparently doesn’t happen very often. Here is my Rho code, which is adapted from some pseudocode in a gamedev forum thread.

long rhoFactor(long n, int c, int max) {
   int check = 10;        // batch size: take a gcd once per 10 terms
   long x1 = 2;
   long x2 = 4 + c;       // x2 runs ahead of x1
   long range = 1;
   long product = 1;      // accumulated product of |x1 - x2| mod n
   int terms = 0;

   for (int i = 0; i < max; i++) {
      for (int j = 0; j < range; j++) {
         x2 = (squareMod(x2, n) + c) % n;
         long next = multiplyMod(product, Math.abs(x1 - x2), n);
         if (next == 0) {
            // Product became a multiple of n; back up one step.
            return Math.max(gcd(product, n), gcd(Math.abs(x1 - x2), n));
         } else if (++terms == check || j == range - 1) {
            long g = gcd(next, n);
            if (g > 1)
               return g;
            product = 1;
            terms = 0;
         } else
            product = next;
      }
      // Brent-style cycle detection: move x1 up and double the range.
      x1 = x2;
      range *= 2;
      for (int j = 0; j < range; j++)
         x2 = (squareMod(x2, n) + c) % n;
   }
   return 1;   // failed to find a factor within the iteration limit
}

For the parameters, I used small odd numbers for c, the polynomial’s constant term, and 16–20 for max, which limits the number of generated values to around 2^max. If the factorization fails, I increase c by 2 and try again. For max = 16, it failed to find a factor about once in every 10,000 numbers and never failed twice in my tests. And those numbers had already had any small factors (less than about 50,000) removed with trial division.
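
To show how the winning combination from the table fits together, here’s a minimal driver sketch (not my actual test harness): trial division strips the small factors, then a Miller-Rabin check and rhoFactor alternate until the number is fully split. BigInteger.isProbablePrime stands in for a dedicated Miller-Rabin test, and rhoFactor is the method above.

import java.math.BigInteger;
import java.util.List;

void factor(long n, List<Long> factors) {
   for (long d = 2; d < 50000 && d * d <= n; d++)   // trial division pass
      while (n % d == 0) { factors.add(d); n /= d; }
   if (n > 1)
      rhoSplit(n, factors);
}

void rhoSplit(long n, List<Long> factors) {
   if (BigInteger.valueOf(n).isProbablePrime(30)) { // Miller-Rabin check
      factors.add(n);
      return;
   }
   long f = 1;
   for (int c = 1; f == 1 || f == n; c += 2)        // retry with a new c on failure
      f = rhoFactor(n, c, 17);
   rhoSplit(f, factors);                            // recurse on both halves
   rhoSplit(n / f, factors);
}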

Updated Matrix Multiply Times

I finally got around to running my matrix multiply microbenchmarks on Windows as well as on the Mac. Here’s a summary of the times on Mac OS X and Windows XP on the same MacBook Pro.

200×200 Matrix Multiply times in milliseconds

               Mac OS X             Windows XP
           Java 1.5      C       Java 1.6      C
             22 ms    11.2 ms     12.2 ms   12.7 ms

(Though not a big table, I’m using it to test out support for HTML 4 (ca. 1999) table elements like THEAD.)

Microbenchmark on Java and C Matrix Multiplication

My Java bug entry on faster numerics has been summarily closed as unreproducible. There is a short evaluation showing Java to be faster than C at matrix multiplication, but the comparison misses the mark because:

  • The C code (not shown) appears to use nested arrays for matrix storage, which I have never seen in practice in C matrix libraries.
  • The Java code is run under the server VM, which is fine for a small benchmark but usually not practical for a large (client) app, which is what many existing C++ numerics apps are.

Nonetheless, the activity prompted me to rerun my matrix benchmarks. Actually, I had to recreate them, since I couldn’t find my Java 1.3 era versions that showed a 15x disadvantage for Java. This time Java fared much better. The tests (C++ Code, Java Code) just multiply the same 200×200 matrices over and over, with some add operations mixed in for good measure. I tried to make the code fast without working too hard; for instance, I didn’t get GCC to do vectorization, and I didn’t control my environment very carefully.

I tried two variations of the Java matrix representation and two variations of the Java multiplication code, for four total Java variations, of which I took the fastest. The representation variation was whether to use a single flat array with multiplied indexing or nested arrays. The multiply variation was whether to transpose the B matrix or not — transposing it lets the inner loop traverse one sub-array each from A and B. Nested arrays with transposing worked best on my Intel Mac, but flat arrays without transposing were best on the PowerPC Mac. I think the Intel JVM is better at the range check elimination optimization, so it favors the extra arrays with simpler indexing.
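
Here’s a sketch of the winning Intel variant (a reconstruction, not the exact benchmark code): nested arrays with B transposed up front, so the inner loop makes two simple sequential scans.

double[][] multiply(double[][] a, double[][] b) {
   int n = a.length;
   double[][] bt = new double[n][n];      // transpose B once up front
   for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++)
         bt[j][i] = b[i][j];
   double[][] c = new double[n][n];
   for (int i = 0; i < n; i++) {
      double[] ai = a[i];
      for (int j = 0; j < n; j++) {
         double[] btj = bt[j];
         double sum = 0;
         for (int k = 0; k < n; k++)
            sum += ai[k] * btj[k];        // sequential access in both rows
         c[i][j] = sum;
      }
   }
   return c;
}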


2GHz PowerPC Mac

Java 1.5   56.7 ms
C++/GCC    26.2 ms


2.33GHz Intel MacBook

Java 1.5/client   32.5 ms
Java 1.5/server   12.6 ms
C++/GCC           11.0 ms

On the PowerPC Mac, there was no difference between client and server Java.

So client Java 1.5 is only about 3x slower than C++, and server Java 1.5 is almost the same as C++. Not bad, but not quite what Sun’s evaluation showed. I still need to try Java 6 and to turn on GCC vectorization.

Prime Benchmarks Updated

After getting a new MacBook Pro and installing Boot Camp on it so I can boot into either Windows or Mac OS X, I’ve rerun some of my prime number mini-benchmarks. I ran the tests only in C++ and the latest Java on each OS, skipping C# this time. The charts below summarize the results. Times are in seconds, and shorter bars are better. (CSV Data)

Java vs. C++ Prime Finding Benchmarks

The first chart puts the C++ and Java times side by side for each host. The Linked List test allocates tons of small objects (one for each prime number found), and the C++ allocator seems to have improved significantly, running about twice as fast as both Java and the C++ version on the G5; otherwise there is not much difference.

G5 vs Intel Prime Finding Benchmarks

The second chart shows the same data as the first but grouped for comparison of OS/CPU combinations. The Intel CPU (Core 2 Duo @ 2.33 GHz) is noticeably faster than the G5 (2 GHz), but there isn’t a real difference between OSs.

I used the latest public versions of Java: Java 5 on Mac OS X and Java 6 on Windows XP.

Java 7 Wish List Item: Static Analysis Optimizations

Shortly after Java 6 came out last year, there were a number of Java 7 wish lists floating around (example). Most wishes are for syntax improvements. I don’t use Java as much as I used to, and I have only one wish list item: static analysis optimizations. That is, optimizations made possible with the help of static (not run-time) analysis.

I’ve written before on the promising range check elimination optimization, which strives to eliminate array bounds checking when it is provably unnecessary. But that optimization doesn’t quite live up to its potential, because any nontrivial array usage is beyond the scope of a run-time proof. So the idea of static analysis optimization is for a compile-time (or post-compile-time) process to do an in-depth analysis of array usage and other optimization opportunities, and to insert the suggested optimization and its finished proof into the byte-code. Then the JVM only needs to verify the proof at run-time rather than construct a proof from scratch. For that matter, the JVM would never need to generate a proof at all, since it could assume that if no proof is provided, there’s no use looking for one at run-time.
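
As a contrived illustration (my own example, not one from the bug report), both bounds checks in this inner loop are removable, but only if the analysis connects the loop limit to both array lengths:

double dot(double[] x, double[] y) {
   if (x.length != y.length)
      throw new IllegalArgumentException("length mismatch");
   double sum = 0;
   for (int i = 0; i < x.length; i++)
      sum += x[i] * y[i];   // 0 <= i < x.length == y.length, so no checks needed
   return sum;
}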

My motivation is scientific computing (especially linear algebra), which still seems to be unnecessarily handicapped in Java.