Jorge Manrubia

May 14, 2022

Performance in context

When it comes to analyzing code performance, context matters a lot. For example, say we have this code that adds two numbers:

a = 1 + 2

And now we need to add a subtraction:

a = 1 + 2
a = a - 3

So what's the cost of this change? We can benchmark:

require "benchmark/ips"

Benchmark.ips do |x|
  x.report("only add") { a = 1 + 2 }
  x.report("add + subtract") do
    a = 1 + 2
    a = a - 3
  end
end

When I run that, I get a 13% performance penalty for the additional subtraction:

Calculating -------------------------------------
            only add     18.694M (± 0.3%) i/s -     94.382M in   5.048948s
      add + subtract     16.258M (± 0.8%) i/s -     82.635M in   5.083165s

A 13% overhead! That doesn't sound very good. This is where context comes into play. The additional operation adds about 8 nanoseconds on average. Is this code part of a tight loop where every nanosecond matters? Or is it serving a web request that spends 100 ms across database queries, view rendering, and network latency? In the first case, you should optimize away and also applaud the author's boldness for using Ruby in such a context. In the latter, those 8 nanoseconds represent a 0.000008% penalty, not a 13% one. You can save yourself the hard work of replacing the code with "a = 0", because nobody would notice.
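To make the arithmetic explicit, here is a small sketch that converts the iterations-per-second figures from the benchmark output into time per iteration, and then expresses the extra cost as a share of a whole request. The throughput figures are copied from the output above; the constant and method names are made up for illustration, and the 100 ms request is the same assumption used in the text.

```ruby
# Throughput figures copied from the benchmark output (iterations per second).
ONLY_ADD_IPS = 18.694e6
ADD_SUBTRACT_IPS = 16.258e6

# Assumed duration of a web request, in seconds (100 ms).
REQUEST_SECONDS = 0.1

# Extra seconds per iteration when throughput drops from fast_ips to slow_ips.
def extra_seconds_per_iteration(fast_ips, slow_ips)
  (1.0 / slow_ips) - (1.0 / fast_ips)
end

# Penalty measured against the isolated snippet (what the benchmark reports).
def isolated_penalty(fast_ips, slow_ips)
  1.0 - (slow_ips / fast_ips)
end

# Penalty measured against everything else the request has to do.
def request_penalty(fast_ips, slow_ips, request_seconds)
  extra_seconds_per_iteration(fast_ips, slow_ips) / request_seconds
end

extra = extra_seconds_per_iteration(ONLY_ADD_IPS, ADD_SUBTRACT_IPS)
puts format("extra time per iteration: %.1f ns", extra * 1e9)
puts format("penalty in isolation:     %.1f%%", isolated_penalty(ONLY_ADD_IPS, ADD_SUBTRACT_IPS) * 100)
puts format("penalty in a request:     %.7f%%", request_penalty(ONLY_ADD_IPS, ADD_SUBTRACT_IPS, REQUEST_SECONDS) * 100)
```

The numerator is the same in both penalties; only the denominator changes, and that denominator is the context.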

I know this is a contrived example, but it highlights a problem: it's easy to reach the wrong conclusions when benchmarking code without considering the context. So let me illustrate with a more realistic example: the performance impact of Active Record Encryption at write time.

This article states it's ~35% and it's correct... if you ignore database network latencies. I actually wrote an automated test for this, and explained it in the original pull request:

You will see how the impact on tests is around 30% when encrypting data, but this is because there is no database latency in tests, so the relative impact is magnified. The same test shows identical performance when running in a more realistic environment. These tests are useful for optimization work and to prevent performance regressions.

Indeed, this is what happens when I run the following script in HEY to benchmark updating encrypted and unencrypted columns:

require "benchmark/ips"
require "securerandom"

account = Account.find(<some account id>)

Benchmark.ips do |x|
  # `source` is a regular column; `name` is an encrypted one.
  x.report("regular attribute") { account.update! source: "a random value #{SecureRandom.hex(3)}" }
  x.report("encrypted attribute") { account.update! name: "a random value #{SecureRandom.hex(3)}" }
end

With a local MySQL server on my machine, the "encrypted attribute" version is 35% slower. In production, it is 1.5% slower. I get similar results when testing things for Basecamp. One runs in AWS, with the database in the same zone, and the other in our data centers. In both cases, database latency is a key factor when measuring the impact of sub-millisecond operations. As this thought-provoking piece on SQLite on the server points out:

The per-query latency overhead for a Postgres query within a single AWS region can be up to a millisecond. That's not Postgres being slow—it's you hitting the limits of how fast data can travel. Now, handle an HTTP request in a modern application. A dozen database queries and you've burned over 10ms before business logic or rendering.

In-memory encryption is at least one order of magnitude faster than the fastest database queries you can run. This means its performance impact in most database-driven applications will be so diluted that you can ignore it. However, believing that encryption will make your insertions 35% slower might discourage you from using it. That would be a wrong decision caused by missing context.
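A back-of-the-envelope model shows the dilution. Treat encryption as a fixed cost added to each write, and watch its relative weight change as the rest of the write gets slower. All the figures below are hypothetical, chosen only so that the zero-latency case lands at the 35% mentioned above; they are not measurements from HEY or Basecamp.

```ruby
# Relative slowdown when a fixed overhead is added to an operation
# that otherwise takes base_seconds.
def relative_overhead(overhead_seconds, base_seconds)
  overhead_seconds / base_seconds
end

# Hypothetical figures for illustration only, not measurements.
ENCRYPTION_SECONDS = 35e-6    # assumed fixed cost of encrypting one value
LOCAL_WRITE_SECONDS = 100e-6  # assumed write time against a local database

# Add increasing amounts of assumed network latency to the same write.
[0.0, 250e-6, 1e-3, 2.5e-3].each do |latency_seconds|
  penalty = relative_overhead(ENCRYPTION_SECONDS, LOCAL_WRITE_SECONDS + latency_seconds)
  puts format("latency %5d us -> %4.1f%% slower", (latency_seconds * 1e6).round, penalty * 100)
end
```

Under these assumptions, the same fixed cost goes from 35% of the write to under 2% once a couple of milliseconds of latency are added, even though the encryption code itself is unchanged.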

A piece of code that runs without interacting with external systems is CPU-bound. Of course, you should be mindful of its performance, especially for infrastructure code. But remember the initial example with the impact of the additional subtraction: "no code" is way faster than "any code" in this scenario. Web applications and most distributed systems, however, are I/O-bound. Keep this firmly in mind when making decisions based on local benchmarks.