How big is Nymphenburg Park?

Nymphenburg Palace

Nymphenburg Palace in Munich was the featured picture at Wikipedia recently, but what caught my eye was the area values given for the palace park in the caption: “200-hectare (494-acre) park.” That looks like a case where a unit conversion introduces false precision. A round number like 200 suggests low precision, so 200 hectares might mean anything from 150 to 250 hectares, but 494 suggests higher precision, between 493.5 and 494.5 acres in this case.

It looks like someone took a round number (200) and multiplied it by the hectare-to-acre conversion factor (2.47) to get a precise number (494). It would be better to go back to the original precise number of hectares, convert that to acres, and then round to the desired level of detail.

Trying to find the actual size of the park was more difficult than I expected. After finding a few places that listed the size as 200 acres, it was evident a different kind of error was also occurring, but it wasn’t clear whether hectares or acres was correct. Google hit count comparisons didn’t help. Searching for Nymphenburg Palace park “200 acre” gave 125 hits while Nymphenburg Palace park “200 hectare” gave 101 hits.

Just to be sure, I found the park on Google Maps and measured it myself with an online planimeter tool. The area of my rough polygon was 225 hectares, so that settles the 200 acre versus 200 hectare issue for me.

Planimeter tool on Nymphenburg Park
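A planimeter tool essentially computes the area of the polygon you trace. As a rough sketch of that calculation, the shoelace formula below works on vertices given in planar meters. The class name, the coordinates, and the assumption that the map has already been projected to a flat meter grid are all illustrative; the actual tool presumably handles latitude and longitude itself.

```java
final class PolygonArea {
    static final double SQ_M_PER_HECTARE = 10_000.0;
    static final double SQ_M_PER_ACRE = 4_046.8564224; // international acre

    /** Shoelace formula; vertices are {x, y} pairs in meters, listed in order around the polygon. */
    static double areaSquareMeters(double[][] vertices) {
        double twiceArea = 0.0;
        int n = vertices.length;
        for (int i = 0; i < n; i++) {
            double[] p = vertices[i];
            double[] q = vertices[(i + 1) % n]; // wrap around to close the polygon
            twiceArea += p[0] * q[1] - q[0] * p[1];
        }
        return Math.abs(twiceArea) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical 1000 m x 2000 m rectangle, not the real park outline.
        double[][] rectangle = {{0, 0}, {1000, 0}, {1000, 2000}, {0, 2000}};
        double squareMeters = areaSquareMeters(rectangle);
        System.out.println(squareMeters / SQ_M_PER_HECTARE + " hectares");
        System.out.println(squareMeters / SQ_M_PER_ACRE + " acres");
    }
}
```

The hypothetical rectangle comes out at 200 hectares, or roughly 494 acres, which is the same pair of numbers as in the caption.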


Eventually, I found the German language Wikipedia page for the Nymphenburg park, and it provided two areas, 180 and 229 hectares, with apparent authority. Translation:

The park inside the garden wall has a size of 180 hectares, the area of the entire facility is 229 hectares.

200 hectares could represent either 180 or 229. Exactly 180 hectares is 445 acres and 229 hectares is 566 acres, so you need to know where the 200 came from in order to know how to represent it in acres. It could be correct to say 400, 500, or 600 acres.
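The convert-first, round-last idea can be sketched in a few lines. The class and method names are made up for illustration, and the conversion factor is derived from the standard definition of the international acre (4046.8564224 square meters).

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class AreaConversion {
    // 1 hectare = 10,000 m^2; 1 international acre = 4,046.8564224 m^2
    static final double ACRES_PER_HECTARE = 10_000.0 / 4_046.8564224;

    static double hectaresToAcres(double hectares) {
        return hectares * ACRES_PER_HECTARE;
    }

    /** Rounds to the given number of significant digits (half-up). */
    static double roundToSignificant(double value, int digits) {
        return new BigDecimal(value).round(new MathContext(digits)).doubleValue();
    }

    public static void main(String[] args) {
        for (double hectares : new double[] {180, 200, 229}) {
            double acres = hectaresToAcres(hectares);
            System.out.println(hectares + " ha = "
                    + roundToSignificant(acres, 3) + " acres, or about "
                    + roundToSignificant(acres, 1) + " acres to one significant digit");
        }
    }
}
```

Rounded to one significant digit, 180, 200, and 229 hectares land on 400, 500, and 600 acres respectively, which is why the source of the "200" matters.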

Posted in Math | Leave a comment

Burtin Antibiotic Illustrations

CHANCE magazine is running a contest to create the best illustration for a data set of the effectiveness of three antibiotics on sixteen strains of bacteria. Designer Will Burtin used this data set for a 1950s visualization.

With only five variables and sixteen observations, my first question is, “What’s wrong with just using a table?” The table in the contest description is even nicely laid out.
burtin-data

My second question is, “Best for whom?” Which illustration is best depends on the audience, which in this case might be doctors, researchers, statisticians or the general public among others.

The data shows the Minimum Inhibitory Concentration (MIC, presumably in µg/ml) for each combination of antibiotic and bacterium. Lower is better, indicating that less antibiotic is needed to treat the bacteria. The MIC values vary widely, from 0.001 to 1000, so I applied a logarithmic transform for analysis, either to the data or to the graph axis. Besides nicely spreading out the data values, the log transformation may have a physical interpretation. If an antibiotic culture grows exponentially, then the log of the concentration is proportional to the time needed to grow it.
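To make the effect of the transform concrete, here is a minimal sketch using only the two extremes quoted above plus a mid-scale value of 1. The base-10 logarithm and the class name are assumptions; any log base would spread the values the same way up to a constant factor.

```java
final class MicLogScale {
    /** Base-10 log of a MIC value; MIC must be positive for the log to exist. */
    static double log10Mic(double mic) {
        if (mic <= 0) {
            throw new IllegalArgumentException("MIC must be positive: " + mic);
        }
        return Math.log10(mic);
    }

    public static void main(String[] args) {
        // The extremes mentioned in the text, plus 1 for reference.
        for (double mic : new double[] {0.001, 1.0, 1000.0}) {
            System.out.println(mic + " -> " + log10Mic(mic));
        }
    }
}
```

On the raw scale, everything below 1 is squeezed into a sliver next to the value 1000. On the log scale, the same range spans six evenly spaced units.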

Exploring the data a little bit, the simplest visualization is a heat map, where every number is represented by a swatch of color. I don’t see much advantage over the table of numbers, except to quickly find extreme values or certain other patterns that the colors help with.

burtin-heat

Next, we might think from a researcher/statistician perspective and try to cluster the bacteria that react similarly to the antibiotics. Here’s a heat map and dendrogram resulting from a cluster analysis. The rows are colored by gram staining. It’s like the heat map above, but similar bacteria are grouped together (and the color scale is slightly different). The bacteria that are clustered close together might suggest a commonality for future research.

burtin-cluster

Since there are only three antibiotics, we can view the data as a 3D scatter plot. Here, the data markers correspond to the clusters.

burtin-3d-1

3D doesn’t work too well in static 2D media like this one since you need to be able to rotate it to see the structure. If you do rotate it, you can see that three of the clusters appear roughly in a straight line, so maybe there are really two different kinds instead of four. Here’s a view looking straight down the line.

burtin-3d-2

A scatter plot matrix shows all the 2D relationships better and is better for static presentation. It can’t show the 3D alignment of the three clusters, but you can get a hint of it in the neomycin versus penicillin panel.

burtin-scm

For my contest entry, I decided to go with the perspective of a 1950s doctor, with the idea that a doctor treating a patient doesn’t usually know what bacteria is causing the infection and may or may not have the results of a gram staining. With that in mind, my visualization shows the MIC for each antibiotic with the best dose for each scenario called out.

burtin-gb

The graph shows that penicillin is best for gram positive bacteria since all purple circles are below 1 µg/ml for penicillin. Similarly, neomycin is best for gram negative bacteria and streptomycin is best if gram staining is unknown. A drawback of this graph is that the points are not labeled or connected. I tried a few ways to do that with labels and lines, but the graph just became too messy. If you need that much detail, you probably need the table of numbers.

After doing all that, I found Burtin’s original visualization via a NY Times article.

I hope this isn’t what CHANCE is looking for. It has little communication value except to say “Look how cool I am!” At least all the data is present, so a meticulous reader can get what information he needs. The audience for this must be a hospital administrator who needs to feel like he’s getting his money’s worth with fancy visualizations. I think it is more a work of art than of communication.

Jon Peltier has a write-up of his contest entry. It has all the data of Burtin’s original in a much better rectangular structure.

Posted in Graphs | 5 Comments

Gluten Freendly

I’ve been living gluten free since I tested positive for Celiac Disease about 8 months ago. That means no wheat, rye, or barley. Except for the convenience of foods like pizza, sandwiches and crackers, I don’t feel like I’m missing too much thanks to Bonnie and local grocery markets that have lots of gluten-free alternatives. Gluten-free bread and pizza are just passable, but Pamela’s gluten-free pancake mix is as good as any.

Eating at restaurants is tough, but some are accommodating. As I learn which places or menus are friendly to gluten-free diets, I identify them as “gluten-freendly”. Amazingly, Google currently reports no hits for the two word term, so I’ll have to start promoting it. Maybe I should add a Wikipedia article on the topic …

Most high end local restaurants are gluten-freendly in that the staff is aware of gluten and can both point out gluten-free dishes and get the kitchen to make substitutions on other dishes. The Lantern in Chapel Hill and Acme in Carrboro both fall into that category. A couple of chain restaurants that are gluten-freendly are Bonefish Grill and PF Chang’s. Both have separate gluten-free menus. Most soy sauce contains wheat, but PF Chang’s will substitute a gluten-free soy sauce; I just wish they would put a little flag or something on the gluten-free plate so I could be more confident that it’s not getting mixed up with others.

Italian is out, but Indian restaurants are fairly safe, and Mexican places usually have a couple corn tortilla dishes. They will also substitute corn tortillas in other dishes, but the ones I’ve had were smaller and too weak to hold a burrito together well.

Posted in Local | Leave a comment

Citysearch Math

Ever notice that everything on Citysearch has a good rating? In scanning a few dozen ratings I have yet to find anything lower than 3.5 stars or any correlation between the overall rating and the individual review ratings. This Glass Doctor summary is typical. The average of the reviews is 2.17, but it gets 3.5 stars. Maybe there’s some hidden internal review with a heavier weight, but I haven’t been able to find any explanation at the site. All I can find is another confused user.

I thought maybe the low participation in this area was revealing some seed ratings from Citysearch, but even places with many ratings exhibit the strange math. Here’s a restaurant in Atlanta with 70 reviews averaging 3.86 but getting a full 5 stars.

The ratings don’t always go up: I found a couple of places with one or two 5-star reviews but with 4-star ratings.
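For comparison, here is what plain rounding of the two review averages quoted above would give. The half-star granularity and the class name are assumptions, since the site doesn’t document how its star ratings are derived.

```java
final class StarRounding {
    /** Rounds an average rating to the nearest half star. */
    static double roundToHalfStar(double average) {
        return Math.round(average * 2.0) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(roundToHalfStar(2.17)); // Glass Doctor review average
        System.out.println(roundToHalfStar(3.86)); // Atlanta restaurant review average
    }
}
```

Simple rounding gives 2.0 and 4.0 stars, against the displayed 3.5 and 5, so whatever the site is doing is more than a rounding quirk.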

Posted in Math | 1 Comment

Willow 1996 – 2008

Last week, Willow ended a brave struggle with bone cancer. She stayed active until the end, never failing to remind me when it was time to hit the trails for a walk or a swim any day I was home past 10 a.m. She is missed.

Posted in Uncategorized | 4 Comments

Fast Factoring for 64-bit Integers

Some of the Project Euler problems involve factoring numbers which are either large or small depending on your perspective. Problems are generally limited to 64-bit integers (about 18 digits), which are big numbers for most of us, but in the field of factorization those numbers are terribly small compared to the 100+ digit numbers security protocols deal with. Most advanced methods focus on factoring those huge numbers and don’t mind a significant amount of overhead, but I want to know what’s fastest for 64-bit integers.

To find out, I ran some tests on some variations on three basic, low-overhead methods: Trial Division, Fermat’s method, and Pollard’s Rho method. All of these take a long time if the number being factored is actually prime, so it’s worthwhile to add in a fourth component which is a Miller-Rabin primality check. Here are my timing results for 400,000 random 64-bit integers. Actually, only the first test uses 400,000 numbers since I limited each test to 1 hour and extrapolated beyond that.

Seconds Method
811 Rho + Trial Division + MR
6359 Fermat + Trial Division + MR
6393 Trial Division + MR after each factor found
29397 Trial Division + MR at start
71195 Trial Division without MR
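The Miller-Rabin (MR) check in those timings isn’t shown in this post, so the following is only a sketch of one common way to do it, not the code that was timed. It uses the first twelve primes as bases, a set known to make the test deterministic well beyond the 64-bit range, and leans on `BigInteger` for the modular arithmetic to sidestep overflow. That choice favors simplicity over speed. The class name is made up.

```java
import java.math.BigInteger;

final class MillerRabin64 {
    // Testing against the first twelve primes is deterministic for all 64-bit inputs.
    private static final int[] BASES = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};

    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (int p : BASES) {
            if (n == p) return true;
            if (n % p == 0) return false;
        }
        // Write n - 1 as d * 2^s with d odd.
        long d = n - 1;
        int s = Long.numberOfTrailingZeros(d);
        d >>= s;

        BigInteger bigN = BigInteger.valueOf(n);
        BigInteger nMinusOne = bigN.subtract(BigInteger.ONE);
        BigInteger bigD = BigInteger.valueOf(d);
        for (int a : BASES) {
            BigInteger x = BigInteger.valueOf(a).modPow(bigD, bigN);
            if (x.equals(BigInteger.ONE) || x.equals(nMinusOne)) continue;
            boolean witness = true; // assume a proves n composite until x reaches n - 1
            for (int r = 1; r < s && witness; r++) {
                x = x.multiply(x).mod(bigN);
                if (x.equals(nMinusOne)) witness = false;
            }
            if (witness) return false;
        }
        return true;
    }
}
```

Running a check like this before factoring lets the slow methods skip numbers that have no factors to find, which is where the gap between the last two rows of the table comes from.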

I was really surprised at how well the Rho method worked in practice. It’s a probabilistic method that’s basically like trial division except it chooses numbers at random instead of sequentially. However, the “random” generator uses a polynomial such that lots of the values can be tested at once using some fancy number theory.

Fermat’s Method works best when there are two divisors near √n, which apparently doesn’t happen very often. Here is my Rho code, which is adapted from some pseudocode in a gamedev forum thread.

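[sourcecode language='java']
// Returns a divisor of n greater than 1 (possibly n itself), or 1 if none is found.
// Relies on the helpers squareMod, multiplyMod, and gcd, which are not shown here.
long rhoFactor(long n, int c, int max) {
    int check = 10;      // take a gcd after this many accumulated terms
    long x1 = 2;         // saved sequence value
    long x2 = 4 + c;     // running sequence value, x -> x^2 + c (mod n)
    long range = 1;      // steps in the current round; doubles every round
    long product = 1;    // running product of |x1 - x2| (mod n)
    int terms = 0;       // terms accumulated in product since the last gcd

    for (int i = 0; i < max; i++) {
        for (int j = 0; j < range; j++) {
            x2 = (squareMod(x2, n) + c) % n;
            long next = multiplyMod(product, Math.abs(x1 - x2), n);
            if (next == 0) {
                // The product collapsed to 0 mod n, so check its two pieces separately.
                return Math.max(gcd(product, n), gcd(Math.abs(x1 - x2), n));
            }
            else if (++terms == check || j == range - 1) {
                // One gcd tests the whole batch of differences at once.
                long g = gcd(next, n);
                if (g > 1)
                    return g;
                product = 1;
                terms = 0;
            }
            else
                product = next;
        }

        x1 = x2;
        range *= 2;
        // Advance x2 without testing before starting the next round.
        for (int j = 0; j < range; j++) {
            x2 = (squareMod(x2, n) + c) % n;
        }
    }
    return 1;
}
[/sourcecode]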
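The listing calls three helpers, `squareMod`, `multiplyMod`, and `gcd`, whose code isn’t included in the post. The versions below are a guess at their contracts, assuming non-negative arguments and a positive modulus. The bit-by-bit multiplication is chosen because it can’t overflow a signed `long`; it is almost certainly slower than whatever produced the timings above. The class name and the `addMod` helper are made up.

```java
final class RhoHelpers {
    /** (a + b) mod n for 0 <= a, b < n, arranged so the sum never overflows. */
    static long addMod(long a, long b, long n) {
        return (a >= n - b) ? a - (n - b) : a + b;
    }

    /** (a * b) mod n by double-and-add, safe for any positive n that fits in a long. */
    static long multiplyMod(long a, long b, long n) {
        long result = 0;
        a %= n;
        b %= n;
        while (b > 0) {
            if ((b & 1) == 1) {
                result = addMod(result, a, n);
            }
            a = addMod(a, a, n); // double a for the next bit of b
            b >>= 1;
        }
        return result;
    }

    static long squareMod(long a, long n) {
        return multiplyMod(a, a, n);
    }

    /** Euclid's algorithm; gcd(0, n) is n. */
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```

Note that `gcd(0, n)` returns `n`, which is how the listing can end up returning `n` itself when the two sequence values collide.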

For the parameters, I used small odd numbers for c, the polynomial constant term, and 16 – 20 for max which limits the generated values at around 2^max. If the factorization fails, I increase c by 2 and try again. For max = 16, it failed to find a factor about once for every 10,000 numbers and never failed twice in my tests. And those numbers had already had any small factors (less than about 50,000) removed with trial division.
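The retry strategy described above might look something like this. To keep the sketch self-contained, the Rho routine is passed in through a hypothetical `RhoStep` interface that mirrors the `rhoFactor(n, c, max)` signature. The starting value `c = 1`, the `maxAttempts` cap, and the choice to treat a result of `n` as a failure are all assumptions.

```java
final class RhoRetry {
    /** Hypothetical stand-in for the rhoFactor(n, c, max) routine. */
    interface RhoStep {
        long factor(long n, int c, int max);
    }

    /**
     * Tries successive odd values of c until a proper factor turns up.
     * Returns 1 if every attempt fails.
     */
    static long findFactor(long n, int max, int maxAttempts, RhoStep rho) {
        int c = 1; // small odd constant term
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long g = rho.factor(n, c, max);
            if (g > 1 && g < n) {
                return g; // proper factor found
            }
            c += 2; // a different polynomial gives a different sequence
        }
        return 1;
    }
}
```

A caller would pass the real routine as a method reference or lambda, after stripping small factors with trial division and ruling out primes with the MR check.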

Posted in Code, Java, Math | 1 Comment

Obama Tie-dye

I figured out a folding pattern to make the Obama logo in one pass. After I made the first one for Beth, I made a second batch for other friends, though I think the first one came out better. I keep thinking I know what I’m doing, but it’s hard to reproduce a tie-dye pattern. Here’s the first one on the left and one from the second batch on the right.

Now, if only I had thought of this a year ago, they’d be all over the country…

Posted in Uncategorized | Leave a comment