Josh Pitzalis

April 25, 2021

#26 Running your first product experiment

The problem with a lot of the content around A/B testing on the web is that it only covers landing pages and e-commerce websites. I want to be able to run A/B tests on the words inside the product experience.

Tools that test landing pages, like Google Optimize, VWO and Optimizely, are client-side tools. They work by placing a cookie in your browser to identify you. That way, if you come back to the site, it remembers who you are and shows you the same version of the site you saw the first time.

Client-side testing doesn't work for web apps that have user accounts. Someone might log in to their account from a different device, like their phone, or a different browser, or a separate computer altogether (like the one they have at work vs home). Because the cookie lives in a single browser, they could end up seeing two completely different variants of the same app. This is bad for the user and it ruins your experiment.

The way to fix this is to run a server-side A/B test. Rather than using cookies in your browser, server-side testing tools keep a record of your users in a database and keep track of which experiment, and which variant, each user is in at any point. When a user logs in, your server-side testing tool checks what experiment they are part of and lets you know, so that you can show them the correct variant of the app.
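To make that a bit more concrete, here is a minimal sketch of sticky, server-side assignment in TypeScript. Everything in it is hypothetical: the `AssignmentStore` class, the variant names, and the in-memory `Map` that stands in for a table in your database. Real tools do this bookkeeping for you. The sketch only illustrates why a logged-in user sees the same variant on every device.

```typescript
// Hypothetical sketch: sticky A/B assignment keyed by user ID, not by a browser cookie.
type Variant = "control" | "treatment";

class AssignmentStore {
  // Stand-in for a database table of (experiment, user) -> variant.
  private assignments = new Map<string, Variant>();
  private random: () => number;

  // `random` is injectable so the behaviour can be tested deterministically.
  constructor(random: () => number = Math.random) {
    this.random = random;
  }

  getVariant(experiment: string, userId: string): Variant {
    const key = `${experiment}:${userId}`;
    const existing = this.assignments.get(key);
    if (existing !== undefined) {
      return existing; // Already assigned: same answer on any device.
    }
    const variant: Variant = this.random() < 0.5 ? "control" : "treatment";
    this.assignments.set(key, variant);
    return variant;
  }
}

const store = new AssignmentStore();
const onLaptop = store.getVariant("onboarding-copy", "user-42");
const onPhone = store.getVariant("onboarding-copy", "user-42");
console.log(onLaptop === onPhone); // true
```

The key design choice is that the lookup uses the account ID, so the answer does not depend on which browser or device made the request.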

Optimizely and VWO have server-side testing features, but they're only on the enterprise plans. Based on the last time I spoke to them, they cost at least $1K a month and you're locked in for the whole year. This puts server-side experimentation out of the reach of most small projects.

Luckily, there is another way to run server-side experiments: feature flags. Technically, what Optimizely and VWO provide on those plans is just feature flags too, but I'm going to treat feature flag tools as a separate category because they are significantly cheaper. Optimizely even has its own feature flag product called Rollouts (and it's free).

The idea with a feature flag manager is to build the changes you want to test in your app under a true/false condition called a flag. Your feature flag manager keeps track of whether the condition is true or false for a given user. When the condition is true, they see the changes. When it's false, they don't.
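In code, a flag is just a conditional wrapped around the change. The sketch below assumes a made-up `FlagManager` interface with a single `isEnabled` method. It is not the API of Optimizely or any other tool, and the flag key, user IDs and copy are invented for illustration.

```typescript
// Hypothetical interface: real flag managers expose something similar,
// but the names here are assumptions, not any vendor's API.
interface FlagManager {
  isEnabled(flagKey: string, userId: string): boolean;
}

// In-memory stand-in that lists the users each flag is switched on for.
class InMemoryFlags implements FlagManager {
  private enabledFor = new Map<string, Set<string>>();

  constructor(enabledFor: Record<string, string[]>) {
    for (const flagKey of Object.keys(enabledFor)) {
      this.enabledFor.set(flagKey, new Set(enabledFor[flagKey]));
    }
  }

  isEnabled(flagKey: string, userId: string): boolean {
    const users = this.enabledFor.get(flagKey);
    return users !== undefined && users.has(userId);
  }
}

// The change being tested: the wording on an (invented) empty-state screen.
function emptyStateHeading(flags: FlagManager, userId: string): string {
  if (flags.isEnabled("new-empty-state-copy", userId)) {
    return "Create your first project in under a minute";
  }
  return "You have no projects yet";
}

const flags = new InMemoryFlags({ "new-empty-state-copy": ["user-42"] });
console.log(emptyStateHeading(flags, "user-42")); // Create your first project in under a minute
console.log(emptyStateHeading(flags, "user-7")); // You have no projects yet
```

Unknown flags and unknown users fall through to the existing copy, which is usually the behaviour you want while an experiment is still being set up.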

I don't want this post to become too technical. If you're the engineer who has to implement this, I've put together a 3-minute video that shows you how this works and how to integrate it into a real project. I used Optimizely Rollouts in the example and instrumented it in a React project. You should be able to translate the same basic setup to other tools and technologies with a little help from the docs for whatever tools you decide to use.

 
Feature flags are interesting because they aren't just for running experiments. You can use them to gradually roll out new features in a product. You could turn a feature on for 10% of your users and then gradually ramp up to 100% if there are no complaints or bug reports.
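One common way to implement a percentage rollout is to hash each user into a stable bucket from 0 to 99 and switch the feature on for buckets below the current percentage. The sketch below is an assumption about how such a rollout could work, not a description of how any particular tool does it. The function names and the flag key are made up, and the hash is FNV-1a applied to the string's UTF-16 code units.

```typescript
// FNV-1a: a small, well-known non-cryptographic hash (32-bit),
// applied here to the UTF-16 code units of the input string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // Force an unsigned 32-bit result.
}

// Map a user to a stable bucket from 0 to 99 for a given flag.
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function isInRollout(flagKey: string, userId: string, percentage: number): boolean {
  return bucketFor(flagKey, userId) < percentage;
}

const users = Array.from({ length: 1000 }, (_, i) => `user-${i}`);
const at10 = users.filter((u) => isInRollout("new-dashboard", u, 10));
const at50 = users.filter((u) => isInRollout("new-dashboard", u, 50));
// Counts should land near 100 and 500 if the hash spreads users evenly.
console.log(at10.length, at50.length);
// Ramping up never removes anyone who already had the feature.
console.log(at10.every((u) => at50.includes(u))); // true
```

Because the bucket depends only on the flag key and the user ID, a user who got the feature at 10% keeps it at every higher percentage.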

Flags are also handy as a kill switch. You can release a new feature and, if something goes wrong, turn it off instantly without having to redeploy anything.
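A kill switch works because the flag's value lives outside the deployed code. Here is a rough sketch, assuming a hypothetical `FlagSource` function that stands in for whatever your flag manager or remote config returns at runtime. The flag key and the two code paths are invented.

```typescript
// Hypothetical flag source: in a real setup this would be your flag manager's
// SDK or a remote config, which can change without a deploy.
type FlagSource = () => Record<string, boolean>;

// Read a flag defensively: if the lookup fails or the flag is missing,
// fall back to "off" so users get the old, known-good behaviour.
function flagIsOn(source: FlagSource, flagKey: string): boolean {
  try {
    return source()[flagKey] === true;
  } catch {
    return false;
  }
}

function exportReport(source: FlagSource): string {
  return flagIsOn(source, "new-export") ? "new export pipeline" : "old export pipeline";
}

// Simulated remote state that someone could flip from a dashboard.
const remote: Record<string, boolean> = { "new-export": true };
const source: FlagSource = () => remote;

console.log(exportReport(source)); // new export pipeline
remote["new-export"] = false; // Kill switch: no redeploy, just a config change.
console.log(exportReport(source)); // old export pipeline
```

Defaulting to "off" when the lookup fails is a design choice worth copying: an outage in the flag service then degrades to the old experience rather than an error.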

The best bit is that you can take feature management off a developer's plate. A product manager can now decide when features go live, and which segments to show them to, without taking up any developer time.

I'm clearly a fan.

I should balance my enthusiasm by pointing out that using feature flags can make software testing complicated. If you have multiple versions of your product in production, keeping any end-to-end testing suites healthy can become messy. It's not impossible. It's just something you need to be prepared for.
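The reason it gets messy is arithmetic: every independent on/off flag doubles the number of versions of the product that can exist in production. The sketch below enumerates the combinations a test suite would, in principle, have to cover. The helper name and flag keys are made up for illustration.

```typescript
// Enumerate every on/off combination for a list of flag keys.
function allFlagCombinations(flagKeys: string[]): Array<Record<string, boolean>> {
  let combos: Array<Record<string, boolean>> = [{}];
  for (const key of flagKeys) {
    const next: Array<Record<string, boolean>> = [];
    for (const combo of combos) {
      next.push({ ...combo, [key]: false });
      next.push({ ...combo, [key]: true });
    }
    combos = next;
  }
  return combos;
}

const combos = allFlagCombinations(["new-export", "new-empty-state-copy"]);
console.log(combos.length); // 4
for (const combo of combos) {
  // In a real suite, each combination would be passed to an end-to-end run.
  console.log(JSON.stringify(combo));
}
```

Two flags give 4 combinations and three give 8, which is a good argument for removing a flag as soon as its experiment or rollout is finished.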

There are quite a few feature flag managers on the market for under $100. The only one I have used extensively is Optimizely's Rollouts. I have heard great things about LaunchDarkly, but they come in slightly above $100 when you add on the experiment features, and I haven't tried them out yet. I should also point out that Firebase has a Remote Config feature that lets you do exactly the same things. Some other products in the space that I would like to try out at some point are Split, ConfigCat, CloudBees, Flagsmith and Featureflow. I am not affiliated with any of these products in any way.

If you know of others that are worth trying out please let me know.



These Hey posts are thoughts-in-progress, and they're meant to be conversational. Let me know what you think. Replies to this email go straight to my inbox.
