Bruno Mengatti

April 30, 2022

Not-so-obvious Ops Toolkit — 1/2


I’ve been operating tech companies of different sizes, shapes and markets since I started my career. One thing I see in common across operations teams is how some folks freestyle their way through operations.

It turns out Operations means a lot of different things in different contexts, and some take a more methodical, even scientific approach than others. But there’s actual science in Operations that will be useful to anyone interested in getting things right faster and more often.

Having a firm grasp of the math behind operations allows teams to prepare better for any challenges they might face, and gives companies the levers to scale fast without wasting much time and money. In short, it lets them maximize the use of their resources.

(I’m making this a two-part post. As I was writing, it quickly became a lengthy and nerdy read, so I’m splitting it into two posts: one with more math and another with slightly less. I hope you find it useful!)

How not to be fooled by averages


There are very simple concepts that I see slipping out of conversations when people rise from ICs to managers in operations, and I see no real reason for this other than laziness. One of those is always having the confidence interval for whatever average you’re discussing.

This is an easy mathematical operation that will help you avoid being fooled by an average value, and it keeps you honest when thinking about data, so you don’t slide into biases or preconceptions so easily.

I’m keeping it simple and reminding us how to calculate the confidence interval for a simple proportion (assuming a normal distribution):

p ± z* × sqrt(p × (1 − p) / n)

p:
is the proportion

z*:
is the appropriate value from the standard normal distribution for the confidence value we’re looking for

n:
is the sample size

A 95% confidence level is good enough for most use cases, and its z-value is 1.96. For other values, you might check out this table.
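To make the formula concrete, here is a minimal Python sketch of it. The function name `proportion_ci` and the sample numbers are my own assumptions for illustration. It uses only the standard library and the same normal approximation as the formula, which gets unreliable for small samples or for proportions very close to 0 or 1.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    # Two-sided z-value, about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical data: the same 80% proportion with two different sample sizes
for successes, n in [(80, 100), (800, 1000)]:
    low, high = proportion_ci(successes, n)
    print(f"{successes}/{n}: between {low:.1%} and {high:.1%}")
```

The same 80% reads very differently with 100 observations (roughly 72% to 88%) than with 1,000 (roughly 78% to 82%).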

So there you go — use it often and you’ll gain more perspective and sharpness whenever you’re talking about averages.

How to pick a good sample size


In line with having a proper confidence interval, many people design tests in operations (or marketing, or people operations, or you-name-it) without taking into account how many people, events, points of contact, etc. are being exposed to the test.

Then, when they run it and go about calculating the CI or standard deviation for whatever result they got, they find out the information acquired is worthless. This is basically due to a poorly designed test when it comes to the size of the sample.

How do you avoid this mistake? There are a few methods. You could work the CI formula backwards: pick a proportion you hypothesize to be true, pick a CI width that you'd be comfortable with to prove it true or false, and solve for the necessary sample size.

Another method is to take into account an estimated standard deviation, and use the z-value along with the STDEV and the margin of error you’re willing to accept to reach your sample size:

n = [z-value² × stdev × (1 − stdev)] / (error margin)²

n:
the sample size

z-value:
the appropriate value from the standard normal distribution for the confidence value we’re looking for

stdev:
the estimated variability of your answers. Strictly speaking, for a yes/no measurement this is the expected proportion p, and p × (1 − p) is its variance. In the absence of a good estimate, 0.5 is a common guess, and also the most conservative one, since it yields the largest sample size.

error margin:
how far from the true value you’re willing to let your result be, that is, the half-width of the confidence interval. Say it’s 5%, then the value will be 0.05.
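Here is a minimal Python sketch of the sample-size formula, using only the standard library. The function name `sample_size` and its defaults are assumptions of mine. The `p` argument plays the role of the stdev term in the formula, and the result is rounded up because you can't sample a fraction of a person.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin_of_error."""
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    # Two-sided z-value for the chosen confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Round up: a partial observation is not possible
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)


print(sample_size(0.05))                   # 95% confidence, +/- 5 points
print(sample_size(0.03))                   # tighter margin, larger sample
print(sample_size(0.05, confidence=0.99))  # higher confidence, larger sample
```

With the default guess of 0.5 and a 5% margin at 95% confidence, this lands on 385. Note that the formula assumes a population much larger than the sample.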

A swiss-knife theorem for operations management


Little’s Law¹ is a well-known (but often ignored) theorem in queueing theory, and one of the most useful for operations management². Its simplicity³ may hide the genius in it, but its applications range from simple brick-and-mortar sizing needs to complex online operations dynamics, and even reach computer architecture projects.

But… what is it?

L = λW

L:
the average number of items/people/elements in the system (or queue)

λ:
the average rate at which these items/people/elements arrive at the system

W:
the average time each element stays in the system

An example: if you know how many people enter your coffee shop each hour (say, 40) and you know how long they stay in it on average (6 minutes, or 0.1 hours), you’ll know how many people are in the shop on average at any given point in time: 40 × 0.1 = 4.
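The coffee shop example can be sketched in a few lines of Python. The function names are mine, not part of any library. The one thing to watch is units: the arrival rate and the time in the system must use the same time unit, which is why the 6 minutes are converted to hours.

```python
def items_in_system(arrival_rate, time_in_system):
    """Little's Law, L = lambda * W. Both arguments must share a time unit."""
    return arrival_rate * time_in_system


def time_in_system(items, arrival_rate):
    """Little's Law rearranged, W = L / lambda."""
    return items / arrival_rate


customers_per_hour = 40
minutes_in_shop = 6

# Convert minutes to hours so the units match the arrival rate
avg_in_shop = items_in_system(customers_per_hour, minutes_in_shop / 60)
print(f"Average number of people in the shop: {avg_in_shop:.1f}")

# The other way around: count 4 people on average, see 40 arrivals per hour
avg_stay_hours = time_in_system(4, customers_per_hour)
print(f"Average stay: {avg_stay_hours * 60:.0f} minutes")
```

The rearranged form is often the more practical one: counting heads and arrivals is usually easier than timing every customer.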

This is useful for a variety of reasons, but I’ll share some of the use-cases I’ve had:
  • sizing how many CX agents I needed to deliver a quality service and decrease wait time
  • finding efficiency gaps in an on-the-ground operations team, by modeling different parts of the process with Little’s Law
  • estimating the concurrent access I’d have on a landing page at the launch of a new product, to find out whether we’d have server problems
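As a rough illustration of the first and last use-cases, here is a sketch in Python. All the numbers are hypothetical, and the 80% utilization target is an assumption of mine, not something Little’s Law gives you. The results are averages, so real peaks will sit above them.

```python
import math


def average_concurrency(arrival_rate, avg_duration):
    """Little's Law: average number in the system = rate x time in system."""
    return arrival_rate * avg_duration


# Hypothetical launch: 1,200 visits per minute, 90-second average session
visits_per_second = 1200 / 60
concurrent_visitors = average_concurrency(visits_per_second, 90)
print(f"Average concurrent visitors: {concurrent_visitors:.0f}")

# Hypothetical support queue: 30 contacts per hour, 12 minutes to handle each
busy_agents = average_concurrency(30, 12 / 60)

# Assumed target: agents are busy at most 80% of their time
target_utilization = 0.8
agents_needed = math.ceil(busy_agents / target_utilization)
print(f"Agents busy on average: {busy_agents:.1f}, agents to staff: {agents_needed}")
```

Little’s Law only tells you how many agents are busy on average. The utilization target is where you decide how much slack to buy against wait time.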

How to over-hire with precision — flexing the Newsvendor model


Now a slightly more sophisticated approach to a known problem: hiring.

Hiring with precision is a problem every manager comes across, and it is such a popular problem that different versions of it are often used in candidate screenings, for instance.

While there are many ways to approach sizing a team, few are as mathematically validated as the one I’m sharing below — and even fewer take into account the cost of under-hiring as a fundamental parameter. This is fairly counter-intuitive, as what we’re usually trying to achieve is cost efficiency by hiring precisely the right number of people needed to perform a task.

Overlooking the risk of under-hiring is a common mistake, and it exposes you to all kinds of costs: opportunity costs, losses due to inefficiency, team burnout, overtime pay, and the cost of hiring third parties on short notice, among others.

Without further ado, then, how to over-hire with precision⁴ or the Newsvendor model with normally distributed demand:

Q* = µ + zσ

Q*:
is the number of people you’ll need to meet the demand in uncertain (but normally distributed) scenarios

µ:
is the average number of people you need for the average demand

z:
is the z-value for the cost-relation proportion of overstocked and understocked capacity (we’ll get to it in a while)

σ:
is the standard deviation of the number of people needed to meet demand

Let’s double-click on the z-value. You’ll need to find it from the following cost ratio:

C_under / (C_over + C_under)

C_over = cost of overage = cost per unit overstocked relative to demand

C_under = cost of underage = cost per unit understocked relative to demand

C_over could be equal to the hourly cost of your employee, or to what you’d pay to send them home in case they’re not needed, for instance.

C_under could be equal to the extra you’re paying your employees to work longer hours, to the loss you take by having fewer people than necessary when demand surges, or to another measure of your loss and risk normalized by hours worked.

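The above ratio gives you the share of the time you should aim to meet demand in full, and that is the basis for the calculation: z is the value of the standard normal distribution whose cumulative probability equals this ratio. For example, a ratio of 0.75 gives a z of about 0.67. If you want to be more robust, just up this number a notch and find its z-value.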

To use the above, you’ll need to know the average number of people it takes to handle the average demand, and calculate a standard deviation for it. Luckily, you now know how to use Little’s Law to estimate demand, and by estimating your team’s efficiency you’ll get to an average number of people needed. Then you can either calculate the standard deviation or use a rule of thumb: 20% of the average, for safety.
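Putting the pieces together, here is a minimal Python sketch of the calculation. The function name `newsvendor_headcount` and all the cost figures are hypothetical. It assumes normally distributed demand, as the model above does, and uses the 20% rule of thumb for the standard deviation.

```python
import math
from statistics import NormalDist


def newsvendor_headcount(mean, stdev, cost_under, cost_over):
    """Q* = mean + z * stdev, with z taken from the critical ratio."""
    critical_ratio = cost_under / (cost_under + cost_over)
    # One-sided z-value: the point where the cumulative probability
    # of the standard normal distribution equals the critical ratio
    z = NormalDist().inv_cdf(critical_ratio)
    return mean + z * stdev, critical_ratio, z


# Hypothetical team: 10 people needed on average, stdev set to 20% of that
mean_people = 10
stdev_people = 0.2 * mean_people

# Hypothetical hourly costs: 20 for an idle person, 60 for being one short
q_star, ratio, z = newsvendor_headcount(
    mean_people, stdev_people, cost_under=60, cost_over=20
)
print(f"Critical ratio: {ratio:.2f}, z: {z:.2f}")
print(f"Q* = {q_star:.2f}, so hire {math.ceil(q_star)} people")
```

When being short costs three times as much as being idle, the model suggests staffing above the average. When both costs are equal, the ratio is 0.5, z is 0, and you simply hire for the average.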

---

These are obviously shallow takes on concepts with a lot of depth to them, and I’ve left out tons of other really useful concepts and ideas that add a lot of value to operations. But if something in this cheatsheet helps you or makes you curious enough to dig deeper, it was already worth sharing.

Let’s chat! What are other must-have nuggets in your operations toolkit? What should be common knowledge for every aspiring operations manager, director or COO?