How big is Nymphenburg Park?

Nymphenburg Palace in Munich was the featured picture at Wikipedia recently, but what caught my eye was the area values given for the palace park in the caption: “200-hectare (494-acre) park.” That looks like a case where a unit conversion introduces false precision. A round number like 200 suggests low precision, so 200 hectares might mean anything from 150 to 250 hectares, but 494 suggests higher precision, between 493.5 and 494.5 acres in this case.

It looks like someone took a round number (200) and multiplied it by the hectare-to-acre conversion factor (2.47) to get a precise-looking number (494). It would be better to go back to the original precise number of hectares, convert that to acres, and then round to the desired level of detail.
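In code, the safe pattern is to convert first and round last, matching the output precision to the input's. A minimal sketch of that idea (the class name and `roundSig` helper are my own invention, not any standard API):

```java
public class AcreConversion {
    static final double ACRES_PER_HECTARE = 2.47105;

    // Round a value to a given number of significant figures.
    static double roundSig(double value, int sigFigs) {
        if (value == 0) return 0;
        double scale = Math.pow(10,
                sigFigs - (int) Math.ceil(Math.log10(Math.abs(value))));
        return Math.round(value * scale) / scale;
    }

    public static void main(String[] args) {
        double hectares = 200;  // roughly one significant figure
        double acres = hectares * ACRES_PER_HECTARE;
        System.out.println(acres);              // about 494.21 -- falsely precise
        System.out.println(roundSig(acres, 1)); // 500.0 -- honest precision
    }
}
```

Rounding 494 back to one significant figure gives 500 acres, which carries about the same information as the original 200 hectares.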

Trying to find the actual size of the park was more difficult than I expected. After finding a few places that listed the size as 200 acres, it was evident that a second kind of error was in play, and it wasn’t clear whether hectares or acres was correct. Google hit count comparisons didn’t help: searching for Nymphenburg Palace park “200 acre” gave 125 hits while Nymphenburg Palace park “200 hectare” gave 101 hits.

Just to be sure, I found the park on Google Maps and measured it myself with an online planimeter tool. The area of my rough polygon was 225 hectares, so that settles the 200 acre versus 200 hectare issue for me.
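A planimeter tool like that is essentially the shoelace formula applied to the polygon's vertices. A sketch of the calculation (my own helper, working in projected planar coordinates such as meters, not raw latitude/longitude):

```java
public class Shoelace {
    // Area of a simple polygon from its vertex coordinates (in meters),
    // via the shoelace formula: sum the cross products of consecutive
    // vertices, halve, and take the absolute value.
    static double area(double[] xs, double[] ys) {
        double sum = 0;
        int n = xs.length;
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;  // wrap around to the first vertex
            sum += xs[i] * ys[j] - xs[j] * ys[i];
        }
        return Math.abs(sum) / 2.0;
    }

    public static void main(String[] args) {
        // A 1500 m x 1500 m square: 2,250,000 m^2 = 225 hectares,
        // since 1 hectare = 10,000 m^2.
        double[] xs = {0, 1500, 1500, 0};
        double[] ys = {0, 0, 1500, 1500};
        System.out.println(area(xs, ys) / 10000 + " hectares");
    }
}
```

A 225-hectare result is plausible for a rough hand-traced polygon, and it is far too large for the park to be 200 acres (about 81 hectares).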

Eventually, I found the German-language Wikipedia page for the Nymphenburg park, and it provided two areas, 180 and 229 hectares, with apparent authority. Translation:

The park inside the garden wall has a size of 180 hectares, the area of the entire facility is 229 hectares.

200 hectares could represent either 180 or 229. Exactly 180 hectares is 445 acres and 229 hectares is 566 acres, so you need to know where the 200 came from in order to know how to represent it in acres. Depending on the source and the rounding, 400, 500, or 600 acres could each be correct.

Citysearch Math

Ever notice that everything on Citysearch has a good rating? In scanning a few dozen listings I have yet to find anything rated lower than 3.5 stars, or any clear relationship between the displayed rating and the user reviews. This Glass Doctor summary is typical: the average of the reviews is 2.17, but it gets 3.5 stars. Maybe there’s some hidden internal review with a heavier weight, but I haven’t been able to find any explanation at the site. All I can find is another confused user. I thought maybe the low participation in this area was revealing some seed ratings from Citysearch, but even places with many ratings exhibit the strange math. Here’s a restaurant in Atlanta with 70 reviews averaging 3.86 but getting a full 5 stars.

The ratings don’t always go up, either — I found a couple of places with only one or two 5-star reviews but with 4-star ratings.
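For what it's worth, a hidden weighted score could produce numbers like these. This is pure guesswork: the weights and the editorial score below are invented for illustration, not anything Citysearch documents.

```java
public class HiddenWeight {
    // One hypothetical model: blend the user average with a hidden
    // editorial score that carries the weight of several reviews.
    static double blended(double userAvg, int userCount,
                          double editorial, double editorialWeight) {
        return (userAvg * userCount + editorial * editorialWeight)
                / (userCount + editorialWeight);
    }

    public static void main(String[] args) {
        // Six user reviews averaging 2.17, plus a made-up 5-star
        // editorial score weighted like nine reviews:
        System.out.println(blended(2.17, 6, 5.0, 9.0)); // roughly 3.9
    }
}
```

A model like this would also explain why places with many reviews drift closer to their true average: the hidden weight matters less as the review count grows.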

Fast Factoring for 64-bit Integers

Some of the Project Euler problems involve factoring numbers which are either large or small, depending on your perspective. Problems are generally limited to 64-bit integers (about 18 digits), which are big numbers for most of us, but in the field of factorization they are terribly small compared to the 100+ digit numbers security protocols deal with. Most advanced methods are aimed at those huge numbers and don’t mind a significant amount of overhead, but I want to know what’s fastest for 64-bit integers.

To find out, I ran some tests on variations of three basic, low-overhead methods: Trial Division, Fermat’s method, and Pollard’s Rho method. All of these take a long time if the number being factored is actually prime, so it’s worthwhile to add a fourth component: a Miller-Rabin (MR) primality check. Here are my timing results for 400,000 random 64-bit integers. Actually, only the first test ran on all 400,000 numbers, since I limited each test to 1 hour and extrapolated beyond that.

Seconds   Method
    811   Rho + Trial Division + MR
   6359   Fermat + Trial Division + MR
   6393   Trial Division + MR after each factor found
  29397   Trial Division + MR at start
  71195   Trial Division without MR
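The Miller-Rabin check is worth spelling out, since for 64-bit inputs it can be made fully deterministic: testing the first twelve primes as witnesses is known to be conclusive well beyond 2^64. A sketch (my own implementation, using BigInteger for the modular multiply to sidestep 64-bit overflow — simple, though not the fastest option):

```java
import java.math.BigInteger;

public class MillerRabin {
    // (a * b) mod m without overflow, via BigInteger.
    static long mulMod(long a, long b, long m) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b))
                .mod(BigInteger.valueOf(m)).longValue();
    }

    // base^exp mod m by repeated squaring.
    static long powMod(long base, long exp, long m) {
        long result = 1 % m;
        base %= m;
        while (exp > 0) {
            if ((exp & 1) == 1) result = mulMod(result, base, m);
            base = mulMod(base, base, m);
            exp >>= 1;
        }
        return result;
    }

    // The first twelve primes; enough for a deterministic answer
    // on any 64-bit n.
    static final long[] WITNESSES = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};

    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long p : WITNESSES) {
            if (n % p == 0) return n == p;  // handles all small n too
        }
        long d = n - 1;
        int r = 0;
        while ((d & 1) == 0) { d >>= 1; r++; }  // n - 1 = d * 2^r, d odd
        for (long a : WITNESSES) {
            long x = powMod(a, d, n);
            if (x == 1 || x == n - 1) continue;
            boolean composite = true;
            for (int i = 1; i < r && composite; i++) {
                x = mulMod(x, x, n);
                if (x == n - 1) composite = false;
            }
            if (composite) return false;
        }
        return true;
    }
}
```

Running this check before (and during) factoring is what separates the fast rows of the table from the slow ones: without it, every prime input forces a full, fruitless search.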

I was really surprised at how well the Rho method worked in practice. It’s a probabilistic method that’s basically like trial division except it chooses numbers at random instead of sequentially. However, the “random” generator uses a polynomial such that lots of the values can be tested at once using some fancy number theory.

Fermat’s Method works best when there are two divisors near √n, which apparently doesn’t happen very often. Here is my Rho code, which is adapted from some pseudocode in a gamedev forum thread.

[sourcecode language=’java’]
long rhoFactor(long n, int c, int max) {
    int check = 10;
    long x1 = 2;
    long x2 = 4 + c;
    long range = 1;
    long product = 1;
    int terms = 0;

    for (int i = 0; i < max; i++) {
        for (int j = 0; j < range; j++) {
            x2 = (squareMod(x2, n) + c) % n;
            long next = multiplyMod(product, Math.abs(x1 - x2), n);
            if (next == 0) {
                return Math.max(gcd(product, n), gcd(Math.abs(x1 - x2), n));
            } else if (++terms == check || j == range - 1) {
                long g = gcd(next, n);
                if (g > 1)
                    return g;
                product = 1;
                terms = 0;
            } else {
                product = next;
            }
        }
        x1 = x2;
        range *= 2;
        for (int j = 0; j < range; j++) {
            x2 = (squareMod(x2, n) + c) % n;
        }
    }
    return 1;
}
[/sourcecode]

For the parameters, I used small odd numbers for c, the polynomial constant term, and 16 to 20 for max, which limits the number of generated values to around 2^max. If the factorization fails, I increase c by 2 and try again. For max = 16, it failed to find a factor about once for every 10,000 numbers and never failed twice in my tests. And those numbers had already had any small factors (less than about 50,000) removed with trial division.
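The rho code calls three helpers that the forum pseudocode left undefined. Here's one straightforward way to fill them in (my own versions; multiplyMod routes through BigInteger to avoid 64-bit overflow, which is simple but not the fastest choice):

```java
import java.math.BigInteger;

public class RhoHelpers {
    // (a * b) mod m without 64-bit overflow.
    static long multiplyMod(long a, long b, long m) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b))
                .mod(BigInteger.valueOf(m)).longValue();
    }

    static long squareMod(long a, long m) {
        return multiplyMod(a, a, m);
    }

    // Euclid's algorithm.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```

On Java 9+ the BigInteger round trip could be replaced with a `Math.multiplyHigh`-based 128-bit multiply for a substantial speedup, since multiplyMod sits in the innermost loop.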

Project Euler Tester

After fanatically solving 160+ Project Euler math/programming problems whenever they came out, I’ve accepted an invitation to be part of the testing team. I’ve always thought the problems were very high quality in terms of clarity and scale, and now I can confirm that those attributes are quite deliberate, based on the discussions I’ve seen leading up to problem publication.

The “one minute rule” is well known at the site and is a requirement that any problem be solvable in under a minute of computation time on a midrange computer. Another rule is that integers larger than 64 bits not be required for the solution; some of my solutions have needed BigInteger, but I guess I missed some optimizations.

One downside to seeing the problems before they are published is that I can’t compete with the few testers who try to be the first to solve each problem as it comes out. That’s OK with me, and besides I don’t see all the problems ahead of time. There are two separate testing teams, so I can compete on some problems if the timing is right.

Science Blogging Conference in January

The 2008 Science Blogging Conference is coming up January 19. I attended last year and will be co-leading a session this year on “Public Scientific Data”. My selfish interest in public data is that I want to improve the visualizations I see in science papers, but I can’t readily do that without the data. The other discussion leader, Jean-Claude Bradley, is a real scientist, though, running a real open science chemistry lab at Drexel.

I started a skeleton outline of some of the issues at the conference wiki, and I’m happy to see others have expanded it.

Problem 156 Graphs

Problem 156 at Project Euler is one of the few without a specified limit. That is, a typical problem asks for something like the sum of all solutions to an equation that are less than a billion; this one just asks for the sum of all solutions. When I first solved it, I used the sum of solutions up to a trillion, which was good enough. Later I made some plots to help understand why there are no higher solutions.

Mild spoilers below if you’re thinking about trying the problem.

The problem is to find the solutions to f(x) = x where f(x) is the number of times a given digit, say 4, appears in all the numerals corresponding to the numbers from 1 to x. The equivalent problem is to find where f(x) – x = 0. The plots below show f(x) – x at different scales.
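Evaluating f(x) by scanning every numeral up to x would be hopeless near a trillion, but the count can be built up one digit position at a time in O(log x) steps. A sketch of that approach (my own implementation, not the Project Euler reference solution):

```java
public class DigitCount {
    // f(x): how many times digit d (1-9) appears when writing out
    // the numbers 1..x. For each decimal position, count the complete
    // cycles of d contributed by the higher digits, then adjust for
    // the partial cycle determined by the digit at that position.
    static long countDigit(long x, int d) {
        long count = 0;
        for (long pos = 1; pos <= x; pos *= 10) {
            long higher = x / (pos * 10);   // digits above this position
            long current = (x / pos) % 10;  // the digit at this position
            long lower = x % pos;           // digits below this position
            count += higher * pos;          // complete cycles of d here
            if (current > d) {
                count += pos;               // one full extra run of d
            } else if (current == d) {
                count += lower + 1;         // partial run, up to lower
            }
        }
        return count;
    }
}
```

With f cheap to evaluate, plotting f(x) – x over huge ranges, and hunting for its zero crossings, becomes practical.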

The first plot shows f(x) – x over the whole range I looked at. The second shows where the function really takes off and why. In each 1e10 range there seem to be about as many 4s as numbers (the function keeps dipping back to around 0) until it reaches 4e10, when each new number in the 4e10 to 5e10 range has at least one 4 and sometimes more, so the function really takes off, especially at 4.4e10. Zooming in more shows a fractal-like nature to the function. The varying density of the points is because my implementation makes fewer evaluations when further from the origin.

Heat Map Valentine

This is a heat map graph of a mathematically generated data set with a little noise added for artistic effect. Next year I’ll work on anti-aliasing the edges.