Benchmarking resizable arrays versus linked lists in C++

The second half of Herb Sutter’s recent talk, “Modern C++: What You Need to Know,” explores the question of how many items do you need before it’s better to use a std::list instead of a std::vector for an insertion sort. The conventional wisdom is that eventually the cost of moving half or so of the vector to make room for a new element will overtake the overhead of the linked list traversal. However, testing showed the answer is never and underscored the importance of memory locality in modern computers (with lots of prefetched cache data).

I decided to try it out on my own machine, but before I got too far with the code I looked around to see if someone already had code I could use. Soon enough, I found C++ benchmark – std::vector VS std::list VS std::deque on Baptiste Wicht’s blog. He ran lots of tests (not just insertion and removal)  on lots of data types (not just integers). Like Herb Sutter, Baptiste also found arrays to be faster than linked lists for small objects, but lists were better for large objects. The meaning of large varied depending on the algorithm. For sorting, large was 16 bytes, for insertion, large was more like 64 bytes. He didn’t explore arrays of pointers to large objects.

In trying out his year-old code on my Mac, I encountered a couple compile errors and one logic error. Since the code was on GitHub, I thought it would be a good chance to try it out. I set up a GitHub account, forked his code into my account, fixed the issues, and made a pull request for my changes to be merged back into Baptiste’s original code. A couple days later, he accepted the changes — the internet works!

Here are some results from my runs. I went up to n = 1,000,000 for two operations and two data sizes.

vectlist

 

Looks like for my PowerMac, large is somewhere between 8 and 32. The real hero is deque, which is segmented vector that can easily grow at both ends.

 

Posted in Code | Leave a comment

Hacker Cup 2014 Round 1

I was both disappointed and happy with my Round 1 results in the Facebook Hacker Cup (scoreboard and problems). I was disappointed that I got two of the four problems wrong, but happy I did well enough to advance to round 2 (and came close on the two I missed). The round lasted 24 hours, so I should have taken the time to study and test the problems more that I did.

Problem 1 was essentially about representing a number in a given base with a given digit alphabet. The only trick not covered in the test set was the case where the “base” was 1. Fortunately, I caught that before submission.

Problem 2 involved finding the optimal strategy for a coin game. Not much programming involved, but I was off in my analysis of what the optimal strategy was and missed this one.

Problem 3 required a search for the longest path through a grid with a couple constraints, including being limited to one “backward” string of moves. I used a dynamic programming solution with a 4D array to represent the longest path from a given position (2D), backward-usage (1D) and incoming direction (1D).

Problem 4 required the most analysis, which I mostly got right, but overlooked one detail. The problem boiled down to adjusting a set of up to 20 numbers so that the sum is minimal and every pair has the same greatest common divisor (GCD). My optimized search of the problem space seemed to be good enough, but the detail I missed was in handling of GCD when one of the numbers was 0.

The moral (at least for such lengthly rounds) is to augment the test cases before declaring victory and submitting. Doing so saved me for problem 1 and could have saved me for problem 2. However, the next round lasts only 3 hours, so there is more cost to doing more testing.

Round 2 Update: I completed one problem and thought I completed a second, but had a problem submitting the solution. I accidentally ran my code twice and then got an “invalid character” error after trying to clean up the outputs. However, I had no time for the others and wouldn’t have advanced anyway.

Posted in Code | Leave a comment

Hacker Cup 2014 Qualification Round

I passed the Facebook Hacker Cup qualification round last week-end (problems). Unlike other rounds, there’s no real time pressure in the qualification round. I normally use Java for these competitions since the most productive IDE I have is JetBrains’ IntelliJ IDEA. However, since I’m trying to become more adept at the new idioms of C++11, I also tried that (using Xcode) for two of the problems.

The first problem involved identifying a square of ‘#’ characters in a field of ‘.’ characters. After some messy Java code that analyzed the grid as it was read in, my C++ effort read in the whole grid into memory first. The following C++ felt decent for finding the first and last rows not consisting of all ‘.’ characters:

string blank(n, '.');
auto isntBlank = [&blank](const string & row) { return row != blank; };
auto firstRow = find_if(grid.begin(), grid.end(), isntBlank);
auto lastRow = find_if(grid.rbegin(), grid.rend(), isntBlank);

However, I couldn’t subtract those variables to find the difference because the types are different (one is a forward iterator and the other is a reverse iterator). The base() method converts a reverse iterator to the underlying forward iterator, so the count is determined by

  lastRow.base() - firstRow

A few applications of similar find logic provided a solution that contains no explicit loops, which is a nice result.

I implemented the second problem dealing with basketball substitution patterns using a pair of priority queues for each team: one queue for the players on the court and another for the players on the bench. I don’t think there’s a clean way to allocate an array of generic collections in Java, and I ended up with the following for the bench of the two teams:

@SuppressWarnings("unchecked")
PriorityQueue[] bench = new PriorityQueue[2];
...
bench[t] = new PriorityQueue<>(n, new Comparator() {
   @Override
   public int compare(Student a, Student b) {
      if (a.minutes < b.minutes)
         return -1;
      if (a.minutes > b.minutes)
         return 1;
      if (a.draft < b.draft)
         return -1;
      if (a.draft > b.draft)
         return 1;
      return 0;
   }
});

C++ has a built-in pair<> template class that makes the comparison part more compact, but I never found a clean way to declare the array. For each type of queue, the players and the bench, requires a different comparison function, so it can’t be built-in to the object class itself. There’s a constructor to supply a separate comparison function for that case, but I couldn’t get the constructor argument to match the generic comparison type so I ended up specifying the type in the template argument list. It seems to violate the C++ DRY goal, but it’s not too bad using a typedef.

auto benchLess = [](Student const & a, Student const & b)
   { return make_pair(a.minutes, a.draft) > make_pair(b.minutes, b.draft);};
typedef priority_queue<Student, vector, decltype(benchLess)> BenchType;
BenchType bench[]{ BenchType(benchLess), BenchType(benchLess)};

The third problem involved a dynamic programming style probability calculation and contained the statement, “It is guaranteed that the output is unaffected by deviations as large as 10-8. ” I’m now not sure what that means. I took it to mean that conditional probabilities could be rounded to 10-8 and values less than 10-8 could be ignored in order to get the required 6 digits on accuracy. As an optimization I ignored values less than 10-10. However, that led to one of my answers having just less than the required accuracy, so there must be some other meaning to the statement.

Hoping to try round 1 next week.

 

Posted in Code, Java | Leave a comment

Volunteering at the Ultimate Frisbee US Open

I spent the long Independence Day week-end volunteering at the USA Ultimate US Open, held in Raleigh this year. My main job was scorekeeper, though I also did some field set-up. The hardest part was just standing in the sun all day, but that was better than rainstorms like we had before the tournament began. The volunteers kept the score and some basic stats like which players scored or made turnovers, but some teams had their own stat-keepers using iPad apps that tracked every throw.

The US Open is an invitation-only tournament of eight top team in three divisions, Women’s, Mixed and Men’s, including some international teams. Team Columbia was fun to watch but hardest to keep stats for since their uniform numbers were small and at the bottom of their shirts which were sometimes tucked in. I worked games in all three divisions. In my small sample, it seemed that play quality improved as I went from Women’s to Mixed to Men’s. That guess is born out by a look at turn-over to goal ratio from the US Open basic stats page.

usopen1

However, there is significant variation within divisions. All the teams:

usopen2

These stats are through pool play (before the semifinals), and interestingly none of the eventual winners (Revolver, Odyssée, and Fury) had the best TO/Goal ratios for their respective divisions.

Youtube has some highlights, including the winning catch of the Men’s finals.

 

Posted in Ultimate | Leave a comment

Tie-dye Sportcoat

I’ve wanted to try tie-dying a sportcoat for a while, occasionally looking for white cotton jackets in thrift shops. Finally I broke down and bought the cheapest new white cotton jacket I could find, a $60 G by GUESS blazer. The low-water immersion technique seemed best since it might be difficult to fold and tie something like a jacket. I thought I would keep it a light colored jacket by using diluted periwinkle for the color, but it still came out with strong color (though it’s not as blue as it looks in the photo).

IMG_0363

Apparently the styling is meant for teenagers with long skinny arms. I haven’t figured out any occasion where it might be appropriate to wear it, but I did wear it to work one day with mixed reactions. I’m told if I wear it unbuttoned, it looks less like a woman’s jacket. Maybe I’ll try again with a duller color like brown, gray or dark blue.

I also did a shirt using a variation of the low-water immersion where I packed the shirt into a tight container and sprinkled dye powder over it and then added water. I put the purple dye around the edge and the yellow dye in the middle. I think it came out very good.

IMG_0357

Posted in Art | Leave a comment

Google Code Jam 2013 Round 1

Like last year, it took me three attempts to get through Round 1 of the Google Code Jam, having to get compete at 5am Sunday. For the first two attempts, I essentially skipped the easy (small) problems and went straight into solving the large versions of each problem, which would simultaneously solve the small problem, too. But that strategy didn’t make the cut (top 1000 scores), so I focused on solving the small problems first for Round 1C and finished in the top 300 to move on to Round 2.

I did solve the large version of Problem A, Consonants, after working out a dynamic programming solution. And I almost finished the large version of Problem C, The Great Wall, but it took me an extra 20 minutes after the contest ended to get it right. It’s nice that Google lets you keep trying for practice even when the contest is over.

In finishing the problem, I learned about a Java API I didn’t know about, NavigableMap. I was looking for something like C++’s std::map::lower_bound/upper_bound for iteration through a subrange of an ordered map, but I didn’t see anything in the Java Map API. Fortunately I found NavigableMap, whose floorEntry and higherEntry provide similar functionality. The problem called for keeping track of repeated attacks/fortifications to a wall. For the small solution I stored the wall as an array using an element for each unit-length segment, but that wasn’t efficient enough for the large solution. I needed each stored element to represent a run of similar segments and support splitting of segments as new attacks occurred while staying ordered for fast look-up. The balanced binary tree implementation of NavigableMap fit the bill.

A great feature of Code Jam is being able to examine other competitors’ entries after the contest is over. I checked one top scorer’s solution and he didn’t use such a data structure. Instead, he made two passes through all the wall attacks. On the first pass he recorded where all the splits might occur. Then he could allocate an array for those segments without needing to split elements on the second pass.

Posted in Code | Leave a comment

Unsophisticated Art Review: Suzanne Vega

For Valentine’s Day, Bonnie surprised me with tickets to a Suzanne Vega concert in Carrboro, which was especially nice since I’m a big fan but Bonnie has only known suffering due to my years of repeatedly humming Tom’s Diner.

In any case, we both thought the performance was outstanding. The word that keeps resonating with me is “authentic.” The important things were excellent: the lyrics, the singing, the stories, the music. The other things were appropriately less polished: a hat that fell off, a new song sung with the help of a lyric sheet, missed stage lighting. The small venue helped (300 – 400 people). Her only accompaniment was veteran musician Gerry Leonard, who handled a wide variety of music and even tried to play along when Suzanne Vega sang a verse of her thirty year old country music song (after an audience member request).

I’m sure it’s blasphemous to say so, but one thing I often dislike about live performances is that the music is not as finely mixed as in a studio recording, especially the vocals often getting overwhelmed by the instruments. Except for some bass than sounded clipped at times, the live performance was much better in this case, though perhaps more due of artistic refinement over time. The vocal inflections carried more feeling, and in Small Blue Thing the vocals somehow came out sounding round “like a marble or an eye.”

 

Posted in Art | Leave a comment