4 Phases to Build an Evidence-Based Organization using Experiments

Hi, I’m Robin 👋 I’m a full-stack staff software engineer (and published author) with experience building in-house experimentation platforms and an experimentation culture. Let’s talk evidence-based decision making on LinkedIn or by email.

This is a practical guide to evolving an opinion-based organization into one that uses experimentation & validated learnings for decision making. It is based on practical experience, 25+ research papers and decades of learnings published by big tech (Facebook, Google, Netflix etc.). From manually running the first few experiments to operational excellence, this is the natural next step toward growth for many small- to mid-scale software organizations.

Phase 1 - Buy-in & core team (first 3-6 months)

Secure buy-in from execs & senior leadership
Build a core team with a mix of engineers & data scientists
Invite 1-2 starting teams as first customers
Run a few experiments with manual processes if needed
Research available 3rd-party tools
Study research papers (Kohavi, Tang, Dmitriev, Xu, Deng, Bosch, Fabijan, Holmström Olsson…)

Phase 2 - MVP platform & initial culture shift (~6 months)

Buy vs build consideration (important to get this right)
First MVP platform (cohorting + population filters + automated evaluation)
Power analysis tool
Help teams define standardized reusable metrics
Drive culture shift for psychological safety
Advocate for quick-win low-effort experiments (quantity over quality at this time)
Publicly celebrate any completed experiments
Offer metrics & experiment design support (hypothesis, cohorts, primary metrics, outlier handling)
Create reusable experiment templates
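
Two of the MVP pieces above, cohorting and the power analysis tool, can be sketched in a few lines. This is a minimal illustration and not a reference implementation: the function names `assign_cohort` and `sample_size_per_cohort`, the salt format and the default values are all assumptions made for this example. The sample-size formula is the common normal approximation for comparing two conversion rates.

```python
import hashlib
from math import ceil
from statistics import NormalDist


def assign_cohort(user_id: str, experiment_salt: str,
                  cohorts=("control", "treatment")) -> str:
    """Deterministically map a user to a cohort with a salted hash.

    The per-experiment salt (hypothetical naming) keeps assignments
    independent between experiments: the same user can land in
    different cohorts in different experiments.
    """
    key = f"{experiment_salt}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(cohorts)
    return cohorts[bucket]


def sample_size_per_cohort(baseline_rate: float,
                           min_detectable_effect: float,
                           alpha: float = 0.05,
                           power: float = 0.8) -> int:
    """Approximate users needed per cohort for a two-sided test
    comparing two conversion rates.

    min_detectable_effect is an absolute lift, e.g. 0.02 means
    going from a 10% to a 12% conversion rate.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    print(assign_cohort("user-42", "checkout-button-2024"))
    print(sample_size_per_cohort(baseline_rate=0.10, min_detectable_effect=0.02))
```

Because the assignment is a pure function of the user ID and the salt, no assignment table has to be stored for the MVP. Smaller detectable effects drive the required sample size up quickly, which is one reason to run a power analysis before an experiment starts.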

Phase 3 - Validated learnings (year 2-3)

[Data]

Evolve high-quality & standardized data-pipelines (e.g. offline metric store)
Collect 4 types of metrics: success, guardrail, data quality & debug metrics
Handle late-arriving & incomplete data
All product teams should define their standardized list of predefined metrics
Build LTV metrics
Pipeline for qualitative user insights to generate experiment hypotheses
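
One way to make the four metric types and the per-team predefined metric lists concrete is a small typed registry. The sketch below is only an assumption about how this could look: `MetricType`, `MetricDefinition`, `metrics_by_type` and the example metric names are hypothetical and not taken from any particular platform.

```python
from dataclasses import dataclass
from enum import Enum


class MetricType(Enum):
    SUCCESS = "success"            # what the experiment tries to move
    GUARDRAIL = "guardrail"        # what must not get worse
    DATA_QUALITY = "data_quality"  # is the experiment data trustworthy?
    DEBUG = "debug"                # helps explain why a metric moved


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    metric_type: MetricType
    owner_team: str
    description: str


def metrics_by_type(metrics):
    """Group metric names by type so gaps (e.g. no guardrails) are visible."""
    grouped = {metric_type: [] for metric_type in MetricType}
    for metric in metrics:
        grouped[metric.metric_type].append(metric.name)
    return grouped


# Hypothetical example list for a checkout team.
CHECKOUT_METRICS = [
    MetricDefinition("checkout_conversion", MetricType.SUCCESS,
                     "checkout", "Share of sessions that end in a purchase"),
    MetricDefinition("page_load_p95_ms", MetricType.GUARDRAIL,
                     "checkout", "95th percentile page load time"),
    MetricDefinition("sample_ratio", MetricType.DATA_QUALITY,
                     "platform", "Observed vs expected cohort sizes"),
    MetricDefinition("payment_form_errors", MetricType.DEBUG,
                     "checkout", "Validation errors shown in the payment form"),
]

if __name__ == "__main__":
    for metric_type, names in metrics_by_type(CHECKOUT_METRICS).items():
        print(metric_type.value, names)
```

Keeping the definitions in one shared structure makes it easier to reuse the same metric across experiments and to spot a team that has not yet defined any guardrail or data quality metrics.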

[QA & releases]

Ensure instant product deployments, fast releases & extensive monitoring
Work with QA to prevent the product combinatorics problem (too many combinations of concurrently active experiment variants to test)

[Tooling]

Democratise decision-making of experimentation methodology
Statistical framework: colliding experiments, carry-over effects, A/A tests, novelty effects, salted randomization, Bonferroni correction, sample ratio mismatch (SRM), …
User attribution targeting
Consider product holdout groups
Ensure consistent user experience across devices
Experiment group exposure logging
Built-in security/compliance/GDPR checks
Pre-experiment & post-experiment automated checklists
Regression-driven experimentation (gradual rollouts, canaries…)
Alert & automatic stop of harmful experiments
Global guardrail metrics
Experiment reporting with p-values & confidence intervals
Prevent the peeking problem (stopping early on an interim significant result inflates false positives)
Slice & dice of experiment results
Institutional memory of past experiments
Enable data-scientist auditing
Visualise ROI + cost
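
Several of the statistical items above (the SRM check, p-values with confidence intervals, and the Bonferroni correction) are small enough to sketch with the Python standard library. This is an illustrative sketch under simplifying assumptions: two cohorts, a binary conversion metric and normal approximations. The function names are hypothetical, and a production platform would typically rely on a vetted statistics library and review by data scientists.

```python
from math import erfc, sqrt
from statistics import NormalDist


def srm_p_value(control_users: int, treatment_users: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square test (1 degree of freedom) for sample ratio mismatch.

    A very small p-value suggests the observed split differs from the
    configured split, so the experiment results should not be trusted.
    """
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def two_proportion_test(conv_c: int, n_c: int, conv_t: int, n_t: int,
                        alpha: float = 0.05):
    """Two-sided z-test for a difference in conversion rates.

    Returns (absolute difference, p-value, confidence interval).
    """
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses the unpooled standard error of the difference.
    se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return diff, p_value, (diff - margin, diff + margin)


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance level when testing several metrics."""
    return alpha / num_comparisons


if __name__ == "__main__":
    print(srm_p_value(5000, 5500))
    print(two_proportion_test(500, 5000, 600, 5000))
    print(bonferroni_alpha(0.05, 5))
```

A reasonable order of operations is to run the SRM check first and only report the metric comparison if the split looks healthy. When an experiment is evaluated on several metrics at once, comparing each p-value against the Bonferroni-adjusted threshold keeps the overall false positive rate in check, at the cost of some statistical power.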

[Culture]

Survey state of experimentation in different teams
Company-wide education: Workshops, demos, onboarding, checklists, problem-solving guides, how-tos
Teach Twyman’s Law (a result that looks too good to be true usually is not true), statistical foundations, common pitfalls etc.
Advanced progress reporting: blogging, highlighted experiments, exec reports
Teach feature-slicing mentality, build-measure-learn blocks
Embed experiment experts into teams
Define clear decision-making frameworks
Forced wrap-up of expired experiments with decision reasoning & qualitative learnings
Sharing of all experiment results with global peer-review process

Phase 4 - Operational Excellence (year 3+)

Experiment-specific holdout groups to monitor long-term effects
Tie experiments to long-term business objectives
Reverse & negative experiments
Experiments everywhere: notifications, emails, landing pages, marketing etc.
Experiments for all: finance, marketing, leadership etc.
Fewer pre-planned backlogs; engineers are free to explore ideas that move metrics
Purposefully hire autonomous engineers with an experimentation mindset
Rewards & performance review process based on validated metrics movement
Metric trees & tiers, use experiment results to prove causality

Highlighted sources

Online Controlled Experiments at Large Scale - Kohavi et al - link
The Evolution of Continuous Experimentation in Software Product Development: From Data to a Data-Driven Organization at Scale - Fabijan et al - link
The Anatomy of a Large-Scale Experimentation Platform - Gupta et al - link
Introducing Continuous Experimentation in Large Software-Intensive Product and Service Organisations - Yaman et al - link

All sources

Leaky Abstraction In Online Experimentation Platforms: A Conceptual Framework To Categorize Common Challenges - Kluck, Vermeer - link
Experiment reporting at Airbnb - Airbnb - link
Democratizing online controlled experiments at Booking.com - Kaufman, Pitchfork, Vermeer - link
Engineering for a Science-Centric Experimentation Platform - Diamantopoulos et al - link
How A/B Tests Could Go Wrong: Automatic Diagnosis of Invalid Online Experiments - Chen, Liu, Xu - link
From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks - Xu et al - link
It takes a Flywheel to Fly: Kickstarting and Keeping the A/B testing Momentum - Fabijan et al - link
Seven Rules of Thumb for Website Experimenters - Kohavi et al - link
Pitfalls of Long-Term Online Controlled Experiments - Dmitriev et al - link
The Evolution of Continuous Experimentation in Software Product Development: From Data to a Data-Driven Organization at Scale - Fabijan et al - link
Introducing Continuous Experimentation in Large Software-Intensive Product and Service Organisations - Yaman et al - link
The Benefits of Controlled Experimentation at Scale - Fabijan et al - link
The Anatomy of a Large-Scale Experimentation Platform - Gupta et al - link
Decision Making at Netflix - Netflix - link
Exp platform at Zalando - Huang - link
How We Reimagined A/B Testing at Squarespace - Absher et al - link
Why we use experimentation quality as the main KPI for our experimentation platform - Perrin et al - link
Online Controlled Experiments at Large Scale - Kohavi et al - link
Designing and Deploying Online Field Experiments - Bakshy et al - link
Overlapping Experiment Infrastructure: More, Better, Faster Experimentation - Tang et al - link
Building Blocks for Continuous Experimentation - Fagerholm et al - link
The RIGHT Model for Continuous Experimentation - Fagerholm et al - link
Transitioning Towards Continuous Experimentation in a Large Software Product and Service Development Organisation – A Case Study - Yaman et al - link
Success stories from a democratized exp platform - Forsell et al - link
Using data to build better products - Bosch - link
Effective online controlled experiment analysis - Fabijan et al - link