Archive for the ‘Code’ Category

First Place at Project Euler

Monday, January 1st, 2007

Project Euler hosts a growing collection of math programming problems that I started participating in earlier this year while it was on the mathschallenge.net web site. Typically, you have to write a program to solve a math problem and then enter your solution to get credit for it. It took me a couple of months to solve all the existing problems, and since then I’ve been solving new problems as they come out, every few weeks.

There are about 50 participants who solve all the problems as they come out and maintain their status as “100% genius” at the site. Among those, there are about a dozen who compete to solve new problems as quickly as possible since ties in the rankings are broken by time of submission of the solution. New problems always come out on Fridays at 1pm (EST) so I usually don’t get to work on them until the evening, which typically results in a “ranking” of around #15.

This week I was off work for the holidays and delayed our trip up to Hyco Lake for a few hours so I could take a crack at the new problems coming out. There were two new problems this time, involving finding solutions to the same Diophantine equation for certain values of n. In the first problem n was bounded to 1 million, and a simple brute force search worked fine. The second problem allowed n to go to 50 million. I started a similar brute force search running just in case it finished, though I knew there must be a simpler solution (all problems are designed to take less than a minute to execute). Being in a hurry, I only did a little analysis and then studied the first hundred or so brute force solutions to find a pattern and guessed that the pattern held for the whole range. It worked, and I ended up in first place on the scores list, at least until the next problem comes out.

Later, I let the brute force attack complete, which took about an hour. I also took the time to figure out why the pattern I observed held.

Google Code Jam 2006

Thursday, September 7th, 2006

I competed in the Google Code Jam this week, but failed to make it through the qualification round. It looks like there were five divisions containing about 1000 competitors each and 200 from each division qualifying for the next round. Each division had different but similar problems, one worth at most 250 points and the other worth at most 750 points, with scores depending on how long it took you to submit a solution. Contestants were given an hour total to submit solutions.

The first problem was fairly easy, and was basically to minimize K + N/K for a given N ≤ 1,000,000. I probably wasted too much time being careful and double-checking things and solved it in 15 minutes or so. Instinct told me to set K to sqrt(N), but I took the time to take the derivate with respect to K and solve for zero. The only trick, I think, was to check nearby values in case of integer rounding weirdness.

The 750-point problem was pretty hard and was to count the number of ways to place K bishops on an NxN chess board, N ≤ 8, given a set of disallowed squares. A number of errors derailed my effort, but I almost got my brute-force solution working before the rest of my hour was up, but even then it wasn’t fast enough on larger boards. Solutions had to run in under 2 seconds for each invocation.

I later sped up my solution off-line. First I used bitboards throughout, getting the 7×7 case under a seconds and the 8×8 case down to 35 seconds. Then I changed from iterating over rows to iterating over diagonals and got the 8×8 time down to 8 seconds. Other tricks shaved off another second or so, but I found the real trick only by looking at successful solutions: memoization.

The other divisions’ 750-point problems I looked at also required memoization, but maybe it was more obvious in their non-chess problems. Only 31 coders succeeded at that problem in my division, whereas over 100 got it in each other division.

I was #221 in my division — guess I should have done less testing for the easy problem. Actually, the best strategy in general would be to do the hard problem first, and save 10 minutes for the easy problem as a back-up. If you only solve one problem, you have to solve the hard one or solve the easy one very quickly to qualify.

I thought the TopCoder contest framework was pretty good, though the primitive online IDE was biased against Mac users by supporting only the Control key for shortcuts like Copy and Paste. I really should have just done all my initial work offline and pasted it in when ready to run the built-in unit tests, but I wasn’t thinking it would matter that much. And it didn’t matter in this case, since I never did get a good solution, even afterwards off-line.

It was fun trying though, maybe I’ll try a few other TopCode contests.

Sudokit Available

Sunday, June 18th, 2006

I’ve finally gotten the Sudokit code together enough to make it available. You can download the zipped source and an executable jar file. The app and the code are both still ugly, but I’ve added a text entry field to make it halfway usable.

There is no documentation. The app is just an experimental solver, trying to mimic human solving techniques and providing a rating for a puzzle’s difficultly. Several puzzles are included to pick from or you can enter your own. The solver has some advanced techniques, and while there are some test puzzles in the “database” that it can’t solve, all newpaper-level puzzles can be solved with only the simplest techniques.

That was the thesis from my original investigation: that if applied methodically, only simple techniques are required to solve newspaper-level problems. The simple techniques being hidden single and naked single. Knowing that, I thought I would test the thesis on a “hard” puzzle from a recent Saturday paper. I couldn’t solve it by hand with the simple methods and had to use some of the subset-based techniques.

Wanting to try that puzzle with Sudokit was enough to get me to add the text entry field (and the log panel at the bottom). Sure enough Sudokit was able to solve it with just the simple techniques. Either I wasn’t methodical enough or Sudokit’s logic is implicitly using more advanced techniques.

No Fluff Just Stuff in RTP

Monday, June 12th, 2006

I attended the RTP No Fluff Just Stuff conference over the week-end. It was mostly focused on Java and web development, which is not my realm these days, but it gave me a chance to learn new things, and there was some general software development content as well.

The speakers were all top-notch, and, surprisingly, every one of the five or six I saw used a PowerBook. They used Keynote for slides, iTerm for commands, and usually FireFox for web. Most used TextMate for coding, except one used IDEA and one used Eclipse.

Highlights: David Geary presented JavaServer Faces as a better Struts, though most speakers in the panel later didn’t seem to think too much of it. Dave Thomas ran a Ruby track, most of which I attended. I finally got some real details on JavaScript from Glenn Vanderburg’s talk. Andy Hunt’s talk on right-brain thinking for programmers was intriguing, but I was only able to attend the first part of it. He recommended Whole New Mind by Dan Pink.

The food and service were very good, but the rooms were uncomfortably cold. I wore more and more clothes each day, but it still wasn’t enough.

JDBC Providers Release

Saturday, April 29th, 2006

I’ve put together a release of the JDBC Providers at berlios. This packages up recent bug fixes and the separation of Java and SQL code which makes it easier to support other DB vendors.

Meanwhile Søren and Mikkel are working on a branch to futher separate the DB by using JNDI so we can rely on the servlet container to maintain the DB connections.

Reconstituted XML Schema Graphs

Tuesday, March 14th, 2006

The paper Analysis of XML schema usage provides a glimpse at some interesting data for a group of schemas that the authors analyzed. Unfortunately, it’s only a glimpse as the data is not provided and the summarizing graphs are generally lacking. And my correspondence with one of the authors indicates that none of the data is available.


Original graph of XML Schema SizesThe first graph, shown here in reduced form, is especially inappropriate. The authors use a scatterplot to show the distribution for schema sizes. To read it, you have to count dots within each horizontal division, as described in the notes for the plot.

Schema Size HistogramNot to be deterred, I reversed-engineered the data from the graph and regraphed it as a histogram, a boxplot and a smoothed density curve, which are all better than a scatterplot for analyzing a distribution of one variable. Unfortunately, JMP doesn’t handle log axes for histograms so I had to graph the log of the size instead of the size. The graphs in the paper obviously use Excel, and maybe it has the same deficiency. The paper uses the original graph to conclude that the bulk of the schemas have sizes in the range of 10KB to 10MB, or 101 KB to 104 KB, though the histogram helps tighten that to the range 101.5 KB to 103.5 KB, for what it’s worth.

Schema Size by LOCThe paper next shows a similar scatterplot (not shown here) of LOC and argues that the similarity of the plots verifies the high correlation between KB and LOC. Not that the conclusion is bad, but why not plot them against each other to show a correlation? The graph at right does just that, showing the fitted line on a log-log scale. Once again, it’s from the reconstituted data.


Oh yeah, I guess I better provide the data to back up my plots; it’s in xsd_reconstituted.csv.



This is not the first time I could have used a graph scraper — is there such a beast? That is, a program that scans a graph and outputs a table of data that could have produced the graph.

Math Challenges Done

Sunday, March 5th, 2006

Maths Challenges Progress GraphI finished the last of the mathschallenge.net math programming problems. Actually, it’s a temporary milestone since new problems are added every few weeks. At right is a graph of problems started per day with a LOESS smoother applied. The data are from the creation date of the program files, and the few problems that I solved without coding are not represented.

A lot of the problems involved combinatorical counting, so it helped that I had just been reading the excellent lecture notes from MIT’s Mathematics for Computer Science course.