Google Code Jam 2015 Round 1

I’ve made it through the Qualifying Round and Round 1 of this year’s Google Code Jam. You get three chances at Round 1, and this is the first year I’ve advanced after only one try. This was also the first time I used C++ instead of Java for the contest.

The most interesting problem to me in the qualifying round was Infinite House of Pancakes, where you essentially have to reduce a set of integers to 0 over a sequence of turns with two options at each turn: reduce every number by 1 or split any single number into two numbers (with the same sum). I correctly figured out you could do all the splitting up front, and coded an algorithm that would split the biggest number in half at each turn until the gain wasn’t worth it.

Seemed reasonable, but it failed to pass the small test set for the single number {9}. My algorithm would split it into {5, 4} and then reduce them down in 5 more steps, but it would be better to split it into {6, 3} followed by {3, 3, 3} and then reduce down in 3 more steps. So my halving algorithm had to be rethought. Fortunately there’s no real time pressure in the qualifying round, and I was able to work out a better algorithm that passed the small and the large test cases.
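The post doesn't say what the better algorithm was. One well-known approach to this problem (not necessarily the one used here) is to try every possible cap t on the largest stack: a stack of p pancakes needs ceil(p / t) - 1 splits to get under the cap, and eating then takes t minutes. The function name `min_minutes` is made up for this sketch, and every stack is assumed to hold at least one pancake.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: fewest minutes needed to clear every stack.
// Assumes each entry of stacks is >= 1.
int min_minutes(const std::vector<int>& stacks) {
    assert(!stacks.empty());
    const int tallest = *std::max_element(stacks.begin(), stacks.end());
    int best = tallest;  // baseline: no splits, just eat
    for (int t = 1; t <= tallest; ++t) {
        int splits = 0;
        for (int p : stacks) {
            splits += (p + t - 1) / t - 1;  // ceil(p / t) - 1 splits for this stack
        }
        best = std::min(best, splits + t);  // all splits first, then t minutes of eating
    }
    return best;
}
```

For the single stack {9}, the cap t = 3 costs 2 splits plus 3 minutes, which matches the 5-minute plan described above and beats the 6 minutes that halving gives.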

For round 1, the first problem, Mushroom Madness, was straightforward, and I had it completely done in under 15 minutes. I also figured out how to do the third problem, Logging, pretty quickly, but coding it was tricky and took most of my time. The problem involved convex hulls over point sets and I didn’t have any geometry routines handy to speed up the process. I had to get lucky that my code worked without too much debugging.

The second problem, Haircut, was the most interesting for me even though I couldn’t figure it out completely during the contest time limit (150 minutes). Given B barbers operating at different constant speeds and a queue of N customers, you had to figure out which barber the Nth customer would get. Seemed like a straightforward use of a priority queue to simulate all N haircuts. However, just as I was starting to code I noticed the problem said that N could be up to 1 billion, even for the small problem. A priority queue simulation would be about O(N * log(B)) — great for lots of barbers but not for lots of customers.
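For reference, a priority queue simulation along these lines might look like the sketch below. The name `barber_by_simulation` is hypothetical, and the sketch assumes that when several barbers are free at once the customer takes the lowest-numbered one. Each of the N steps costs O(log B), which is where the O(N * log(B)) total comes from.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: 1-based number of the barber who serves customer n,
// found by simulating every haircut. cut_minutes[k] is the time barber k+1 takes.
int barber_by_simulation(const std::vector<int>& cut_minutes, std::int64_t n) {
    assert(n >= 1 && !cut_minutes.empty());
    using Slot = std::pair<std::int64_t, int>;  // (minute the barber is free, barber index)
    std::priority_queue<Slot, std::vector<Slot>, std::greater<Slot>> free_at;
    for (int k = 0; k < static_cast<int>(cut_minutes.size()); ++k) {
        free_at.push({0, k});
    }
    int chosen = -1;
    for (std::int64_t c = 0; c < n; ++c) {
        const Slot s = free_at.top();  // earliest free; lowest index wins ties (assumed rule)
        free_at.pop();
        chosen = s.second;
        free_at.push({s.first + cut_minutes[chosen], chosen});
    }
    return chosen + 1;
}
```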

I coded the simulation anyway hoping it would either trigger a better idea or would be fast enough for the 4 minute time limit (after all, my Mac Pro can do billions of operations every second). It wasn’t fast enough, but I thought of looking for a cycle and noticed the limits for the small problem were 5 barbers taking at most 25 minutes each. That meant I could afford to simulate the least common multiple (LCM) of all the barber times after which the pattern would repeat, and that worked for the small test case.
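The cycle idea can be sketched as a reduction step applied before simulating. After LCM minutes every barber is free again at the same moment, and barber k has seated LCM / M_k customers, so only the customer's position within one cycle matters. The names `gcd64` and `reduce_to_cycle` are invented for this sketch; the reduced position would then be fed to whatever simulation is in use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Euclid's algorithm, written out to stay within C++11.
std::int64_t gcd64(std::int64_t a, std::int64_t b) {
    while (b != 0) {
        const std::int64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Hypothetical helper: position in 1..per_cycle that gets the same barber as customer n.
std::int64_t reduce_to_cycle(const std::vector<int>& cut_minutes, std::int64_t n) {
    assert(n >= 1 && !cut_minutes.empty());
    std::int64_t period = 1;  // LCM of all haircut times
    for (int m : cut_minutes) {
        period = period / gcd64(period, m) * m;
    }
    std::int64_t per_cycle = 0;  // customers seated during one period
    for (int m : cut_minutes) {
        per_cycle += period / m;
    }
    return (n - 1) % per_cycle + 1;
}
```

This only helps while the LCM stays small, as it does with 5 barbers taking at most 25 minutes each.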

The large test case allowed 10,000 barbers, and I knew the LCM would be way too big to simulate, so I didn’t even attempt it. Not having any better ideas, I moved on to the last problem, which was fortunate because I needed almost all of the remaining time for the Logging code.

After sleeping on it (did I mention the contest started at 9pm local time on a Friday night?), I worked out a solution to the large Haircut problem the next day. Though I couldn’t compute b(n) in time, I could compute the inverse, n(b), fairly quickly. Since n(b) is monotonic, I could use a binary search to find where n(b) became the n I was looking for.
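One way to read the inverse is as a count of how many customers have sat down by a given minute, which is cheap to compute and never decreases. The sketch below takes that reading, which may differ from the exact formulation used here, and uses a plain hand-rolled binary search. The names `seated_by` and `barber_by_search` are hypothetical, and ties are assumed to go to the lowest-numbered free barber.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Customers who have started a haircut at or before minute t (t >= 0):
// barber k seats one at minutes 0, m, 2m, ...
std::int64_t seated_by(const std::vector<int>& cut_minutes, std::int64_t t) {
    std::int64_t total = 0;
    for (int m : cut_minutes) {
        total += t / m + 1;
    }
    return total;
}

// Hypothetical helper: 1-based number of the barber who serves customer n.
int barber_by_search(const std::vector<int>& cut_minutes, std::int64_t n) {
    assert(n >= 1 && !cut_minutes.empty());
    std::int64_t lo = 0;
    std::int64_t hi = static_cast<std::int64_t>(cut_minutes[0]) * n;  // barber 1 alone suffices by then
    while (lo < hi) {  // find the first minute at which at least n customers are seated
        const std::int64_t mid = lo + (hi - lo) / 2;
        if (seated_by(cut_minutes, mid) < n) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    const std::int64_t before = (lo == 0) ? 0 : seated_by(cut_minutes, lo - 1);
    std::int64_t rank = n - before;  // position among the barbers who free up at minute lo
    for (int k = 0; k < static_cast<int>(cut_minutes.size()); ++k) {
        if (lo % cut_minutes[k] == 0 && --rank == 0) {
            return k + 1;
        }
    }
    return -1;  // not reached for valid input
}
```

Each probe costs O(B), so the whole search is roughly O(B * log(N * M)) instead of growing with N.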

Implementing the binary search was a different story because part of my reason for using C++ was to force myself to learn more of the STL and idiomatic C++11. STL has a binary search function called lower_bound which expects to operate on iterators over a container. I couldn’t find a good way for it to operate on an indirect evaluation of a function. I tried defining a custom iterator, but it felt wrong, partly because each iterator needs its own reference to the function. I’ve since discovered Boost ranges which might do the job.

Instead I wrote my own indirect version of lower_bound that took an integer range and a function object to call.

// returns lowest number, n, in [first, last) for which f(n) >= value,
// or last if there is no such n
// f must be non-decreasing
template<typename I, class T, class F>
I lower_bound_eval(I first, I last, const T& value, F f) {
   I count = last - first;
   while (count > 0) {
      I i = first;
      I step = count / 2;
      i += step;
      if (f(i) < value) {
         // answer lies strictly above i
         first = ++i;
         count -= step + 1;
      }
      else {
         // answer is i or lower
         count = step;
      }
   }
   return first;
}

That worked, though it took me many hours more than the problem allowed. Without the 22 points for that problem, my final score was 78/100 which put me at #337 out of about 6000 in that round with the top 1000 advancing.

Data Science Specialization at Coursera

Last year, I took the nine online courses in the Data Science Specialization offered by Johns Hopkins University via Coursera. It could have been called “Data Science with R” since one whole course and a good part of the other courses were more about R programming than data analysis. I certainly learned more about R and the R ecosystem than about statistics and data science, but I already had some knowledge of the latter. The nine courses are:

  1. The Data Scientist’s Toolbox
  2. R Programming
  3. Getting and Cleaning Data
  4. Exploratory Data Analysis
  5. Reproducible Research
  6. Statistical Inference
  7. Regression Models
  8. Practical Machine Learning
  9. Developing Data Products

There’s also a “capstone” course/project, but it’s not offered as often, and the timing didn’t work for me.

Course Structure

Each course was four weeks, and the pattern for most weeks was about 30 minutes of video lectures and a multiple choice quiz. Some courses also had a project involving some data analysis, and those were the most educational parts of the series. There were also practice assignments and online discussion forums, but I found the Coursera site too sluggish to really use those casually.

As would be expected from courses like this, what you get out depends on the extra effort you put in. Someone completely new to the material would have to spend a lot of self-learning time because they’re not going to learn technical material like programming and statistics from a few video lectures. Unfortunately for me, the multiple choice quizzes made it a little too easy to get by with little effort. That’s partly a necessity of an online course, but being able to take each quiz three times seems to undermine any rigor. If you can eliminate just one of the four choices, you’re guaranteed to get each question correct by the third try.


Instructors. One thing that kept me in the series (besides my stubbornness to finish what I started) was the quality of the instructors, Brian Caffo, Jeff Leek and Roger D. Peng. They obviously have an enthusiasm for the subject and did a great job organizing the material. I especially like that they would often do live R coding during the lectures (I’m sure there were some edits, but still…). Sometimes that’s where the best R tips were learned.

Material. I liked that they spent a good amount of time just on programming and just on reproducible results, two topics that could be ignored or brushed over and still pass for data science. For the modeling, they made a point of avoiding linear algebra, which was a nice change from the standard approach.

Projects. While still constrained by the time available, these longer assignments forced you to apply the material to perform some basic analysis and publish the result. The projects used real data sets which often required some clean-up/preprocessing, which was a good lesson in itself.


Peer grading. The longer assignments were better measures, but due to the compressed timing, they had to be kept pretty simple, and due to the large number of students, they had to be peer graded. For peer grading, each student had to grade four others. I can only hope they required some sort of agreement among duplicate graders because the grading guidance was minimal and coarse. (Is there a model provided? Yes/No. Is the model correct? Yes/No…)

The one course I didn’t get high marks in was the course I expected to be my best: Exploratory Data Analysis. I didn’t get good scores on the long assignment from the peer grading. I knew I wasn’t strictly following the assignment guidelines, but I hoped the graders would be more flexible. The assignment called for a report with something like three pages of text and an appendix with two pages of graphs. I prefer to put the graphs inline with the text, but some graders stuck with the literal guidelines and gave me zeros for that part of the grading (there was often no way of giving partial credit).

Coursera. Besides the sluggish forums and grading limitations already mentioned, I would like to go back to review parts of the material (partly so I could be more specific in this review), but the archives are no longer available. Last month at least some of the course archives were available, but now none are. Given the scope of the material, providing later access seems essential.

Bottom Line

For someone new to the field, I think these courses will be too brief to learn the material well, but they provide a great tour from some great tour guides and will help frame further study.

Days in a Month Formula

Curtis McEnroe posted a “tongue-in-cheek” derivation of a formula for the number of days in a month, ignoring leap years.

His result, in JavaScript:

function f(x) { return 28 + (x + Math.floor(x/8)) % 2 + 2 % x + 2 * Math.floor(1/x); }

His derivation is quite logical. I made a shorter alternative using a different technique, which I will leave for the reader to puzzle out.

function f(x){return 28+(62648012>>x>>x&3); }
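Without giving the puzzle away, a quick check shows that the two formulas agree. The sketch below is a C++ translation of both, with hypothetical names `days_long` and `days_short`. Integer division stands in for Math.floor, which gives the same result for the positive month numbers 1 through 12.

```cpp
#include <cassert>

// Translation of the longer formula; x / 8 and 1 / x are integer divisions.
int days_long(int x) {
    assert(x >= 1 && x <= 12);
    return 28 + (x + x / 8) % 2 + 2 % x + 2 * (1 / x);
}

// Translation of the shorter formula. The extra parentheses only spell out
// the precedence the JavaScript already has: shifts bind tighter than &.
int days_short(int x) {
    assert(x >= 1 && x <= 12);
    return 28 + ((62648012 >> x >> x) & 3);
}
```

Looping x from 1 to 12 and comparing the two results against a calendar is enough to confirm both.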


My year with Android via Nexus 5

A year ago, I was ready for a change. I was tired of my tiny iPhone screen and the off-contract cost of AT&T cell service, so I switched to a Google Nexus 5 with a nice screen and T-Mobile with friendly and cheaper month-to-month service. But now I’m back to an iPhone and AT&T. The size problem has been fixed with the iPhone 6, though it’s still a little smaller than the Nexus 5, and my AT&T service is now employer subsidized. AT&T has also adapted to offering more T-Mobile-like plans.

Reasons I was looking to change back to iPhone/AT&T:

  • Battery life was terrible for my Nexus 5. It would barely make it 10 hours in sleep mode. I can only imagine that my cell/wifi reception was so poor that the phone was constantly waking up and looking for a better connection.
  • Poor T-Mobile coverage in this area. In too many places, I had no data coverage.
  • Android was not as convenient to sync with my Mac (iTunes/iPhoto). However, now I see iTunes and iPhoto aren’t as convenient as they used to be if you don’t opt in to the cloud sharing.
  • I missed having a hardware mute switch for the entire phone. Android has several different volume settings and muting one (like ringer volume) doesn’t mute the others (like app noise volume).

Things I will miss from Nexus/Android:

  • Convenient integration with Google Now. I get most of the functionality from the Google iOS app, but it’s not as convenient or as tightly integrated as on Android. Surprisingly often, it would tell me what I needed to know before I asked.
  • T-Mobile customer service. The fact that it was easy to cancel is a good sign in itself. Also, when we went to Toronto last summer, T-Mobile prompted me to buy a short-term international data plan (about $20), while my wife’s AT&T phone silently accumulated $100s in international roaming charges, and she thought she had cellular data turned off.
  • Openness of the app store, including being able to use Firefox and ad blockers.

One bonus of the switch-back is that my old iPhone apps carried over from before. Perhaps the downside of the open Google Play store is that there is more of a focus on in-app ads instead of paid apps. The ad-based Scrabble was unusable (a fullscreen ad after every move), so I’m happy to get back to my paid, ad-free Scrabble app on iOS.

One more thing: for some reason the iOS Google Maps app doesn’t show traffic as well as the Android version. These two screen captures are showing about the same thing before my morning commute, but the iOS version on the left puts the blue “your route” line over the red “slow traffic” lines, making it hard to see the traffic along your route.

Charity Solicitations Visualized

Way back in 2006, I tallied my charity solicitations, and since then the situation has gotten comically worse. So over the past year, I’ve tried to keep all the solicitations I received from various charities, some of which I haven’t contributed to in years. This time, instead of a simple table, I’ve made a couple kitchen floor charts showing the actual pieces of mail received. I probably missed a few, and phone calls are not represented.

Here are my “charts” of charities sending the most and least amount of mail in 2014.



Care is easily the most annoying (I even found another piece after taking the pic). I didn’t even know I had donated to Care, but they handled a donation for typhoon relief.

It’s good to see that Public Citizen and Southern Poverty Law Center are doing better. They were the top offenders in 2006, but they’re now mostly honoring my request for one solicitation per year.

Perhaps the annoyingness is exacerbated by my giving pattern, which is to give toward the end of the year. Unfortunately for me, common practice is to accept the December gift and then send an “annual renewal” just a few weeks later in January.

The worst offenders have either been dropped or switched to reduced anonymous giving, but I expect the junk mail to continue for years. And anonymous giving is expensive as far as I can tell. Network for Good adds a 5% fee and JustGive adds 4.5%. Hopefully, that includes the credit card fees, but I’m not sure. It shouldn’t be so expensive just to move money. Fidelity Charitable with a flat fee of $100 per year may be another option, especially if the credit card fee is separate for the other options.

My truly least annoying charity (annoyance == 0) is one that sent me no mailings or online annoyances: the Online Encyclopedia of Integer Sequences. Thanks to Neil Sloane and many volunteers for that excellent resource. Wikimedia was also in the no-mailings camp, but they made Wikipedia pretty annoying to use for most of December, so I’m not sure where to rank them.

Remaking ProPublica’s Blocked News Graph

Earlier this month, investigative newsroom ProPublica published an interactive graph showing the availability of various news web sites within China.


My initial reaction was that the missing (gray) and inconclusive (yellow) data are dominating the view without providing much information. Of course, the graph is precisely reflecting the data available, but I’d prefer a view that helps me see larger patterns. Which news sites were likely blocked all year? The Huffington Post stands out with a lot of red, but it’s not so obvious what other news sites were also never measured as available throughout the year.

Here’s a close-up of three sites with lots of inconclusive measurements that dominate the coloring. They have three different mixes of conclusive measurements, but that difference doesn’t stand out until you focus on it.


So my main “improvement” is to interpolate the absent/inconclusive data. For instance, if a site is measured as blocked, then unmeasured, then measured as blocked again, relabel the middle no data region as presumed blocked. Same for open, and for mixed endpoints relabel no data as in transition. Inconclusive is a really tricky category; I leave it as different from no data but similarly tint inconclusive measurements according to the surrounding measurements.
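The relabeling rule for no-data gaps can be sketched for a single site's daily series as follows. The `Status` labels and the `fill_gaps` name are made up for illustration, the inconclusive category is left out, and gaps that touch the start or end of the series are kept as no data, which is an assumption since the rule only covers gaps with a measurement on both sides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative labels; the input is assumed to contain only NoData, Blocked and Open.
enum class Status { NoData, Blocked, Open, PresumedBlocked, PresumedOpen, InTransition };

// Hypothetical sketch: relabel each run of NoData from the measurements on either side.
std::vector<Status> fill_gaps(std::vector<Status> days) {
    std::size_t i = 0;
    while (i < days.size()) {
        if (days[i] != Status::NoData) {
            ++i;
            continue;
        }
        std::size_t j = i;  // the gap is [i, j)
        while (j < days.size() && days[j] == Status::NoData) {
            ++j;
        }
        if (i > 0 && j < days.size()) {  // measured on both sides
            const Status left = days[i - 1];
            const Status right = days[j];
            assert(left == Status::Blocked || left == Status::Open);
            assert(right == Status::Blocked || right == Status::Open);
            Status fill = Status::InTransition;  // mixed endpoints
            if (left == right) {
                fill = (left == Status::Blocked) ? Status::PresumedBlocked
                                                 : Status::PresumedOpen;
            }
            for (std::size_t k = i; k < j; ++k) {
                days[k] = fill;
            }
        }
        i = j;
    }
    return days;
}
```

Keeping the presumed states separate from the measured ones lets the chart tint them differently, so interpolated stretches are never mistaken for real measurements.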

My second improvement is to avoid the red/green colorblindness issue. I switched the open color from green to blue. The stoplight colors have strong connotations which help with the interpretation of the original graph, and so it’s a trade-off to balance a small benefit to the 97% with normal vision versus a large benefit to the few. Still, here’s what the close-up above looks like with the most common red/green colorblindness (using Color Oracle):


My third improvement is to reduce the white space between rows. I didn’t realize it until I tried it, but I think the strong white banding is distracting.

Finally, here’s my view (data through December 19):


Is my improvement better? In its current form it looks a bit busy, but I think if a designer picked better colors, it would be quite functional, assuming you buy in to the interpolation idea.

With my remake, it’s easier to see the three sites that were never seen as blocked (CNN, ProPublica, and Washington Post), the six news sites that were never seen as open and especially the transitions, which might be the most interesting parts. Presumably, the late September transitions are related to the Hong Kong protests.

I haven’t tried to reproduce the nice axis labeling, reference lines or the interactivity of the ProPublica version, all of which I like.

Toronto Discs

Bonnie and I spent last week in Toronto, and the week ended with a Frisbee theme. Though most of the trip was unplanned tourism, I did preview the area Ultimate offerings before leaving and found a Friday evening pick-up game that was only a 30 minute walk from the hotel.


Here’s a pic of the game at Sir Winston Churchill Reservoir with the ever-present CN Tower in the background. The game was recreational, which is good because my legs were a little out of sync from walking around the city all week.

By complete coincidence, our hotel was across from the home field of the Toronto Rush semi-pro Ultimate team and they had a play-off game that Saturday night. Here’s a pic from the hotel room of the team warming up for the game.

I attended most of the game despite the steady light rain. The rules were a little different with active referees assessing yardage penalties for violations. Unfortunately the game was a little sloppy due to the rain, but there was still some nice play amplified by three on-field video cameras and a giant replay screen on the scoreboard.

The next day, we ran into a third disc connection: a full disc golf course on the tiny islands in Lake Ontario. I don’t think I’ve ever seen a disc golf warning sign.


Not sure how active the community is. I saw this flyer for a tournament on the bulletin board and thought I might enter until I noticed the date.