Making code faster: The obvious costs


In my previous post, I presented a small code sample and asked how we can improve its performance. Note that this code sample has been quite maliciously designed to be:

  • Very small.
  • Clear in what it is doing.
  • The most obvious way to do it.
  • Highly inefficient.
  • Likely to mislead people into non-optimal optimization paths.

In other words, if you don’t understand what is going on, you won’t be able to get the best out of it. And even if you do, it is likely that you’ll go for a “minimal change to the code” approach that won’t do nearly as much for performance.

Let us look at the code again:
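In rough outline, it looks something like this sketch (the exact line format, field names, and parsing calls here are assumptions, but the structure is the same: a Record class that wraps the raw line and calls _line.Split() in every property, plus a loop that reads the whole file into memory with File.ReadAllLines()):

    using System;
    using System.Collections.Generic;
    using System.IO;

    public class Record
    {
        private readonly string _line;

        public Record(string line)
        {
            _line = line;
        }

        // Every property access splits the raw line all over again.
        public DateTime Start => DateTime.Parse(_line.Split(' ')[0]);
        public DateTime End => DateTime.Parse(_line.Split(' ')[1]);
        public string Id => _line.Split(' ')[2];

        public TimeSpan Duration => End - Start;
    }

    public static class Program
    {
        public static void Main(string[] args)
        {
            var stats = new Dictionary<string, TimeSpan>();

            // Reads the entire file into a string[] and keeps that array
            // alive for the duration of the run.
            foreach (var line in File.ReadAllLines(args[0]))
            {
                var record = new Record(line);

                TimeSpan total;
                stats.TryGetValue(record.Id, out total);
                stats[record.Id] = total + record.Duration;
            }

            foreach (var kvp in stats)
                Console.WriteLine($"{kvp.Key}: {kvp.Value}");
        }
    }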

The most obvious optimization is that we are calling _line.Split() multiple times inside the Record class. Let us fix that:
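In sketch form, the fix is to split and parse once, in the constructor, and keep the results in fields (again, the details here are assumptions, but this is the shape of the change):

    public class Record
    {
        public DateTime Start { get; }
        public DateTime End { get; }
        public string Id { get; }

        public Record(string line)
        {
            // Split and parse exactly once, up front, instead of on every
            // property access.
            var parts = line.Split(' ');
            Start = DateTime.Parse(parts[0]);
            End = DateTime.Parse(parts[1]);
            Id = parts[2];
        }

        public TimeSpan Duration => End - Start;
    }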

This trivial change reduced the runtime by about 5 seconds and saved us 4.2 GB of allocations. The peak working set increased by about 100 MB, which I assume is because the Record class moved from having a single 8-byte field to having three 8-byte fields.

The next change is also pretty trivial: let us drop File.ReadAllLines() in favor of File.ReadLines().
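With the sketch above, only the read call changes; the per-record aggregation stays the same:

    // ReadLines returns a lazy IEnumerable<string>, so lines are read,
    // processed and discarded one at a time instead of being held in a
    // giant string[].
    foreach (var line in File.ReadLines(args[0]))
    {
        var record = new Record(line);

        TimeSpan total;
        stats.TryGetValue(record.Id, out total);
        stats[record.Id] = total + record.Duration;
    }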

Surprisingly enough, this had very little impact on the runtime. However, the allocations dropped by 100 MB, and the working set dropped to 280 MB, very close to the size of the file itself.

This is because we no longer read the entire file into an array and hold on to that array for the duration of the program. Instead, each line can be garbage collected very efficiently once we are done with it.

This concludes the obvious stuff, and we managed to gain a whole 5 seconds of improvement here. However, we can do better, and the next step is sort of obvious as well, so I’ll include it in this post.

As written, this code is single threaded. And while we are reading from a file, we are still pretty much CPU bound, so why not use all the cores we have?

All we had to do was add AsParallel(), and the TPL will take care of the rest for us.
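In terms of the sketch, the parsing work moves into a PLINQ pipeline, while the dictionary aggregation stays on the enumerating thread (this is one way to express it; the exact shape of the code may differ):

    // Requires 'using System.Linq;'. AsParallel() hands the per-line work
    // (Split + DateTime.Parse) to PLINQ, which spreads it across the
    // available cores and feeds the results back to this thread.
    var records = File.ReadLines(args[0])
        .AsParallel()
        .Select(line => new Record(line));

    foreach (var record in records)
    {
        TimeSpan total;
        stats.TryGetValue(record.Id, out total);
        stats[record.Id] = total + record.Duration;
    }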

This gives us a runtime of 9 seconds. Allocations are a bit higher (3.45 GB, up from 3.3 GB), but the peak working set exceeded 1.1 GB, which makes a lot of sense, presumably because we now have multiple threads holding lines and records in flight at the same time.

We are now standing at about a third of the initial runtime, which is excellent, but can we do more? We’ll cover that in the next post.

More posts in "Making code faster" series:

  1. (24 Nov 2016) Micro optimizations and parallel work
  2. (23 Nov 2016) Specialization make it faster still
  3. (22 Nov 2016) That pesky dictionary
  4. (21 Nov 2016) Streamlining the output
  5. (18 Nov 2016) Pulling out the profiler
  6. (17 Nov 2016) I like my performance unsafely
  7. (16 Nov 2016) Going down the I/O chute
  8. (15 Nov 2016) Starting from scratch
  9. (14 Nov 2016) The obvious costs
  10. (11 Nov 2016) The interview question