What does CPU utilization mean?

I often find that companies use CPU utilization as their main indicator for capacity planning and performance optimization, yet the metric itself is rarely understood or discussed in depth. To illustrate, let's consider the Intel Xeon E5-2620 processor.

The Intel Xeon® E5-2620 v4 2.1 GHz Eight Core Processor enhances the performance and the speed of your system. Additionally, the Virtualization Technology enables migration of more environments. It supports enhanced SpeedStep® technology that allows tradeoffs to be made between performance and power consumption.

As you can see, new CPU virtualization, optimization, and power management technologies significantly change the way the CPU works. By contrast, CPU utilization is still defined as busy time as a proportion of elapsed time. Using it for capacity planning means we make a false assumption: that CPU service time is constant.

What is CPU utilization? Per Brendan Gregg:

The metric we call CPU utilization is really "non-idle time": the time the CPU was not running the idle thread. Your operating system kernel (whatever it is) usually tracks this during context switch. If a non-idle thread begins running, then stops 100 milliseconds later, the kernel considers that CPU utilized that entire time.
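
To make the "non-idle time" definition concrete, here is a minimal sketch in Python. It assumes Linux-style cumulative tick counters in the format of the aggregate cpu line in /proc/stat, and it treats iowait as idle, which is a common convention rather than something Brendan's post prescribes. The function names and the sample numbers are made up for illustration.

```python
def parse_cpu_line(line):
    """Split an aggregate 'cpu ...' line into (idle_ticks, total_ticks)."""
    fields = [int(x) for x in line.split()[1:]]
    # Linux field order: user nice system idle iowait irq softirq steal ...
    idle = fields[3] + fields[4]   # assumption: iowait is counted as idle
    total = sum(fields[:8])        # guest time is already included in user/nice
    return idle, total


def utilization(before, after):
    """Non-idle share of the elapsed ticks between two snapshots (0.0 to 1.0)."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("snapshots must be taken in order, some time apart")
    return 1.0 - (idle1 - idle0) / elapsed


# Made-up sample snapshots, not measurements from a real machine.
sample_before = "cpu 100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu 400 0 150 1300 150 0 0 0 0 0"
print(f"utilization: {utilization(sample_before, sample_after):.1%}")
```

Note what this number cannot tell you: every tick that was not idle counts as "utilized", whether the CPU was retiring instructions or waiting on memory.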

Brendan wrote a nice blog post explaining why CPU Utilization is Wrong. He wants to draw people's attention to the fact that memory stalls are counted as part of CPU utilization. In the CPU pipeline, if an instruction is stalled by memory at some stage, it creates a “pipeline bubble”. That bubble is counted as CPU utilization even though no useful work is done, and it impacts your service latency. CPUs are clever enough to minimize this bubble with hyper-threading, which fills the stalled stage with work from another thread.

This makes CPU capacity non-linear!

For example, on a system that reports 48 CPU cores, it is tempting to conclude that doubling the requests per second (RPS) from 1000 to 2000 would only double CPU utilization from 30% to 60%. However, because of hyper-threading, those 48 reported cores are not 48 equal units of capacity, so this linear calculation doesn't hold.
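
Here is a toy model of why the linear estimate can break. It is only a sketch under loud assumptions that do not come from any measurement: the 48 reported cores are 24 physical cores with two hyper-threads each, the scheduler fills one thread per physical core first, and a second thread on an already busy core adds only 30% of a core's throughput (the hypothetical smt_gain parameter). Real workloads will behave differently.

```python
def required_utilization(work_units, physical_cores, threads_per_core=2, smt_gain=0.3):
    """Reported utilization (0.0 to 1.0) needed to deliver `work_units`.

    One work unit is what a single thread delivers when it has a physical
    core to itself. `smt_gain` is a hypothetical value: the extra throughput
    each additional hyper-thread adds on a core that is already busy.
    """
    logical_cpus = physical_cores * threads_per_core
    max_capacity = physical_cores * (1 + smt_gain * (threads_per_core - 1))
    if work_units > max_capacity:
        raise ValueError("demand exceeds the capacity of this toy machine")
    if work_units <= physical_cores:
        # Every busy thread still has a physical core to itself.
        busy_threads = work_units
    else:
        # Extra work lands on sibling threads, which are worth less.
        busy_threads = physical_cores + (work_units - physical_cores) / smt_gain
    return busy_threads / logical_cpus


baseline_work = 0.30 * 48  # work delivered at 30% reported utilization
print(f"1x load: {required_utilization(baseline_work, 24):.1%}")
print(f"2x load: {required_utilization(2 * baseline_work, 24):.1%}")
```

Under these assumptions, doubling the load pushes reported utilization from 30% to roughly 83%, not 60%. The exact figure depends entirely on the assumed smt_gain; the point is only that the curve bends once the physical cores are occupied.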

The formula to remember is:
reported cores = threads per core x cores per socket x sockets
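
As a quick check of the formula, here is a small sketch. The 2 x 12 x 2 topology is an assumed way to arrive at the 48 reported cores in the example above; I am not claiming that is how any particular machine is laid out, and the function names are my own.

```python
def reported_cores(threads_per_core, cores_per_socket, sockets):
    """Number of CPUs the operating system reports (logical CPUs)."""
    return threads_per_core * cores_per_socket * sockets


def physical_cores(cores_per_socket, sockets):
    """Number of physical cores, ignoring hyper-threads."""
    return cores_per_socket * sockets


# Assumed topology: 2 threads per core, 12 cores per socket, 2 sockets.
print(reported_cores(2, 12, 2))   # logical CPUs seen by the OS
print(physical_cores(12, 2))      # physical cores behind them
```

On a live machine, Python's os.cpu_count() returns the reported (logical) number, not the physical one, so half of those "cores" may be hyper-threads sharing a pipeline with a sibling.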

Hyper-threading benefits become more apparent as CPU clock rates and pipeline lengths increase, because both lead to larger bubbles that additional threads can fill. Counterintuitive as it may seem, this is a fundamental aspect of how CPUs function, and it means capacity planning cannot assume linearity based on reported cores alone.

Additionally, power management features like Intel's SpeedStep® technology further complicate the linearity of CPU utilization for capacity planning. This blog will not delve into power management; I will probably write another blog about CPU power management. Stay tuned...