A practical, research-backed roadmap (25+ papers analyzed) for transforming organizations into evidence-based, experiment-driven cultures. Follow four phases (Buy-in & Core Team, MVP Platform & Culture Shift, Validated Learnings, and Operational Excellence) to scale experimentation, standardize metrics, and embed data-driven decision-making across every level of the business.
Hi, I’m Robin 👋 I’m a full-stack staff software engineer (and published author) with experience building in-house experimentation platforms and experimentation culture. Let’s talk evidence-based decision making on LinkedIn or by email.
This is a practical guide to evolving an opinion-based organization into one that uses experimentation & validated learnings for decision making. It is based on practical experience, 25+ research papers and many decades of learnings published by big tech (Facebook, Google, Netflix etc.). From manually running the first few experiments to operational excellence, this is the natural next step toward growth for many small- to mid-scale software organizations.
Phase 1 - Buy-in & core team (first 3-6 months)
- Secure buy-in from execs & senior leadership
- Build core team with mix of engineers & data-scientists
- Invite 1-2 starting teams as first customers
- Run a few experiments with manual processes if needed
- Research available 3rd-party tools
- Study research papers (Kohavi, Tang, Dmitriev, Xu, Deng, Bosch, Fabijan, Holmström Olsson…)
- Buy vs build consideration (important to get this right)
Phase 2 - MVP platform & culture shift
- First MVP platform (cohorting + population filters + automated evaluation)
- Power analysis tool
- Help teams define standardized reusable metrics
- Drive culture shift for psychological safety
- Advocate for quick-win, low-effort experiments (quantity over quality at this stage)
- Publicly celebrate any completed experiments
- Offer metrics & experiment design support (hypothesis, cohorts, primary metrics, outlier handling)
- Create reusable experiment templates
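The power analysis tool listed above can start very small. Here is a minimal sketch, assuming a binary conversion metric, a two-sided two-proportion z-test and equally sized groups. The function name `sample_size_per_group` and its parameters are illustrative and not taken from any specific platform or paper:

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_detectable_effect,
                          alpha=0.05, power=0.8):
    """Approximate users needed per group to detect an absolute lift.

    baseline_rate: current conversion rate, e.g. 0.10 for 10%.
    min_detectable_effect: absolute change to detect, e.g. 0.02 for +2 points.
    alpha: two-sided significance level.
    power: probability of detecting the effect if it really exists.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    if not 0 < p1 < 1 or not 0 < p2 < 1:
        raise ValueError("rates must stay strictly between 0 and 1")
    if min_detectable_effect == 0:
        raise ValueError("min_detectable_effect must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # average rate under H0

    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # 10% baseline, looking for a 2 percentage point absolute lift.
    print(sample_size_per_group(0.10, 0.02))
```

With these inputs the result is a bit under 4,000 users per group. Halving the detectable effect roughly quadruples the required sample, which is why quick-win experiments on high-traffic surfaces are a good place to start.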
Phase 3 - Validated learnings (year 2-3)
[Data]
- Evolve high-quality & standardized data-pipelines (e.g. offline metric store)
- Collect 4 types of metrics: success, guardrail, data quality & debug metrics
- Handle late-arriving & incomplete data
- All product teams should define their standardized list of predefined metrics
- Build LTV metrics
- Pipeline for qualitative user insights to generate experiment hypotheses
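One way to make the four metric types and the per-team predefined metric lists concrete is a small shared registry. This is only a sketch: `MetricKind`, `MetricDefinition`, `MetricRegistry` and the example metric names are hypothetical, and a real metric store would also carry the computation logic and data sources:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class MetricKind(Enum):
    """The four metric types listed above."""
    SUCCESS = "success"
    GUARDRAIL = "guardrail"
    DATA_QUALITY = "data_quality"
    DEBUG = "debug"


@dataclass(frozen=True)
class MetricDefinition:
    name: str          # unique, reusable identifier
    kind: MetricKind
    owner_team: str    # team responsible for the definition
    description: str


class MetricRegistry:
    """In-memory catalogue of standardized metric definitions."""

    def __init__(self) -> None:
        self._metrics: Dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        # Reject duplicates so every name has exactly one definition.
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric

    def by_kind(self, kind: MetricKind) -> List[MetricDefinition]:
        return [m for m in self._metrics.values() if m.kind == kind]


if __name__ == "__main__":
    registry = MetricRegistry()
    registry.register(MetricDefinition(
        "checkout_conversion", MetricKind.SUCCESS, "checkout",
        "Share of sessions that end in a completed order"))
    registry.register(MetricDefinition(
        "page_load_p95_ms", MetricKind.GUARDRAIL, "platform",
        "95th percentile page load time in milliseconds"))
    print([m.name for m in registry.by_kind(MetricKind.GUARDRAIL)])
```

Rejecting duplicate names is the design choice that matters here: it forces teams to reuse an existing definition instead of silently creating a second, slightly different one.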
[QA & releases]
- Ensure instant product deployments, fast releases & extensive monitoring
- Work with QA to prevent product combinatorics problem
[Tooling]
- Democratise decision-making of experimentation methodology
- Statistical framework: colliding experiments, carry-over effects, A/A, novelty effects, salted randomization, Bonferroni, SRM, …
- User attribution targeting
- Consider product holdout groups
- Ensure consistent user experience across devices
- Experiment group exposure logging
- Built-in security/compliance/GDPR checks
- Pre-experiment & post-experiment automated checklists
- Regression-driven experimentation (gradual rollouts, canaries…)
- Alert & automatic stop of harmful experiments
- Global guardrail metrics
- Experiment reporting with p-value & confidence intervals
- Prevent peeking problem
- Slice & dice of experiment results
- Institutional memory of past experiments
- Enable data-scientist auditing
- Visualise ROI + cost
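Three of the tooling items above (salted randomization, the SRM check, and reporting with p-value & confidence intervals) fit into a short sketch. It assumes two equally weighted variants and a binary conversion metric. All function names are illustrative, and the statistics are the textbook chi-square and two-proportion z-test rather than the method of any particular platform:

```python
import hashlib
from math import erfc, sqrt
from statistics import NormalDist


def assign_variant(user_id, experiment_salt, variants=("control", "treatment")):
    """Deterministic assignment: hash the user id with a per-experiment salt.

    The salt keeps assignments independent between experiments, while the
    same user always lands in the same variant of a given experiment.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8"))
    return variants[int(digest.hexdigest(), 16) % len(variants)]


def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """Sample ratio mismatch check: chi-square goodness of fit, 1 dof."""
    total = control_count + treatment_count
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_count - expected_control) ** 2 / expected_control
            + (treatment_count - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def two_proportion_report(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Return (absolute difference, confidence interval, two-sided p-value)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    if se_pooled == 0:
        raise ValueError("no variance: all users converted or none did")
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))
    # The interval uses the unpooled standard error of the difference.
    se_diff = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    margin = NormalDist().inv_cdf(0.5 + confidence / 2) * se_diff
    return diff, (diff - margin, diff + margin), p_value


if __name__ == "__main__":
    print(assign_variant("user-42", "checkout-button-2024"))
    print(srm_p_value(5000, 5500))   # very small p-value: investigate first
    print(two_proportion_report(500, 10000, 600, 10000))
```

A common convention is to treat a very small SRM p-value (for example below 0.001) as a reason to distrust the experiment results altogether. When several metrics or variants are compared at once, a Bonferroni correction divides the significance level by the number of comparisons before judging each p-value.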
[Culture]
- Survey state of experimentation in different teams
- Company-wide education: Workshops, demos, onboarding, checklists, problem-solving guides, how-tos
- Teach Twyman’s Law (too good to be true = not true), statistical foundations, common pitfalls etc.
- Advanced progress reporting: blogging, highlighted experiments, exec reports
- Teach feature-slicing mentality, build-measure-learn blocks
- Embed experiment experts into teams
- Define clear decision-making frameworks
- Forced wrap-up of expired experiments with decision reasoning & qualitative learnings
- Sharing of all experiment results with global peer-review process
Phase 4 - Operational Excellence (year 3+)
- Experiment-specific holdout groups to monitor long-term effects
- Tie experiments to long-term business objectives
- Reverse & negative experiments
- Experiments everywhere: Notifications, emails, landing pages, marketing etc
- Experiments for all: finance, marketing, leadership etc
- Less pre-planned backlogs, engineers work freely to explore ideas to move metrics
- Purposefully hire autonomous engineers with experimentation mindset
- Rewards & performance review process based on validated metrics movement
- Metric trees & tiers, use experiment results to prove causality
Highlighted sources
- Online Controlled Experiments at Large Scale - Kohavi et al - link
- The Evolution of Continuous Experimentation in Software Product Development: From Data to a Data-Driven Organization at Scale - Fabijan et al - link
- The Anatomy of a Large-Scale Experimentation Platform - Gupta et al - link
- Introducing Continuous Experimentation in Large Software-Intensive Product and Service Organisations - Yaman et al - link
All sources
- Leaky Abstraction In Online Experimentation Platforms: A Conceptual Framework To Categorize Common Challenges - Kluck, Vermeer - link
- Experiment reporting at Airbnb - Airbnb - link
- Democratizing online controlled experiments at Booking.com - Kaufman, Pitchfork, Vermeer - link
- Engineering for a Science-Centric Experimentation Platform - Diamantopoulos et al - link
- How A/B Tests Could Go Wrong: Automatic Diagnosis of Invalid Online Experiments - Chen, Liu, Xu - link
- From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks - Xu et al - link
- It takes a Flywheel to Fly: Kickstarting and Keeping the A/B testing Momentum - Fabijan et al - link
- Seven Rules of Thumb for Website Experimenters - Kohavi et al - link
- Pitfalls of Long-Term Online Controlled Experiments - Dmitriev et al - link
- The Evolution of Continuous Experimentation in Software Product Development: From Data to a Data-Driven Organization at Scale - Fabijan et al - link
- Introducing Continuous Experimentation in Large Software-Intensive Product and Service Organisations - Yaman et al - link
- The Benefits of Controlled Experimentation at Scale - Fabijan et al - link
- The Anatomy of a Large-Scale Experimentation Platform - Gupta et al - link
- Decision Making at Netflix - Netflix - link
- Exp platform at Zalando - Huang - link
- How We Reimagined A/B Testing at Squarespace - Absher et al - link
- Why we use experimentation quality as the main KPI for our experimentation platform - Perrin et al - link
- Online Controlled Experiments at Large Scale - Kohavi et al - link
- Designing and Deploying Online Field Experiments - Bakshy et al - link
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation - Tang et al - link
- Building Blocks for Continuous Experimentation - Fagerholm et al - link
- The RIGHT Model for Continuous Experimentation - Fagerholm et al - link
- Transitioning Towards Continuous Experimentation in a Large Software Product and Service Development Organisation – A Case Study - Yaman et al - link
- Success stories from a democratized exp platform - Forsell et al - link
- Using data to build better products - Bosch - link
- Effective online controlled experiment analysis - Fabijan et al - link