Always Valid Inference: Continuous Monitoring of A/B Tests

February 24, 2017, Webb 1100

Ramesh Johari

Stanford University, ECE


Web applications typically optimize their product offerings using randomized controlled trials (RCTs), commonly called A/B testing. These tests are usually analyzed via p-values and confidence intervals presented through an online platform. Used properly, these measures both control Type I error (false positives) and deliver nearly optimal Type II error (false negatives). Unfortunately, inferences based on these measures are wholly unreliable if users make decisions while continuously monitoring their tests. On the other hand, users have good reason to continuously monitor: there are often significant opportunity costs to letting experiments continue, and thus optimal inference depends on the relative preference of the user for faster run-time vs. greater detection. Furthermore, the platform does not know these preferences in advance; indeed, they can evolve depending on the data observed during the test itself. This sets the challenge we address in our work: can we deliver valid and essentially optimal inference, but in an environment where users continuously monitor experiments, despite not knowing the user's risk preferences in advance? We provide a solution leveraging methods from sequential hypothesis testing, and refer to our measures as *always valid* p-values and confidence intervals. Our solution led to a practical implementation in a commercial A/B testing platform, serving thousands of customers since 2015. Joint work with Leo Pekelis and David Walsh. This work was carried out with Optimizely, a leading commercial A/B testing platform.
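To make the monitoring problem concrete, the following is a minimal simulation sketch, not the implementation described in the talk. It assumes a simplified one-sample setting with unit-variance Gaussian observations and no true effect, and compares two stopping rules: rejecting the first time a fixed-horizon z-test p-value drops below alpha, and rejecting when a sequential p-value built from a mixture sequential probability ratio test (one standard construction from sequential hypothesis testing; the abstract does not say which method the authors use) drops below alpha. The function name `simulate_peeking` and the mixing variance `tau_sq` are illustrative assumptions.

```python
import math
import random


def simulate_peeking(num_tests=1000, horizon=500, alpha=0.05, tau_sq=1.0, seed=0):
    """Estimate false positive rates under continuous monitoring.

    Assumptions (illustrative only): observations are i.i.d. N(0, 1), so the
    null hypothesis of zero mean is true in every simulated test, and the
    mixture SPRT uses a N(0, tau_sq) mixing distribution over the mean.
    """
    rng = random.Random(seed)
    naive_hits = 0
    always_valid_hits = 0

    for _ in range(num_tests):
        total = 0.0            # running sum of observations
        naive_rejected = False
        av_p = 1.0             # sequential p-value, non-increasing over time

        for n in range(1, horizon + 1):
            total += rng.gauss(0.0, 1.0)

            # Fixed-horizon two-sided z-test p-value, recomputed at every look.
            z = total / math.sqrt(n)
            naive_p = math.erfc(abs(z) / math.sqrt(2.0))
            if naive_p < alpha:
                naive_rejected = True

            # Log of the mixture likelihood ratio for unit-variance data.
            log_lr = (-0.5 * math.log(1.0 + n * tau_sq)
                      + tau_sq * total ** 2 / (2.0 * (1.0 + n * tau_sq)))
            # Running minimum of 1 / likelihood ratio, capped at 1.
            av_p = min(av_p, math.exp(-log_lr))

        naive_hits += naive_rejected
        always_valid_hits += av_p <= alpha

    return naive_hits / num_tests, always_valid_hits / num_tests


if __name__ == "__main__":
    naive_rate, av_rate = simulate_peeking()
    print(f"naive peeking false positive rate: {naive_rate:.3f}")
    print(f"sequential p-value false positive rate: {av_rate:.3f}")
```

In this sketch, stopping at the first nominally significant look should produce a false positive rate far above alpha, while the sequential p-value is designed to keep it at or below alpha no matter when the user stops. A real A/B test compares two arms with unknown variance, which this example deliberately leaves out.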

Speaker's Bio

Ramesh Johari is an Associate Professor at Stanford University, with a full-time appointment in the Department of Management Science and Engineering (MS&E), and courtesy appointments in the Departments of Computer Science (CS) and Electrical Engineering (EE). He is a member of the Operations Research group and the Social Algorithms Lab (SOAL) in MS&E, the Information Systems Laboratory in EE, the Institute for Computational and Mathematical Engineering, the steering committee of the Stanford Cyber Initiative, and the Stanford Bits and Watts Initiative. He received an A.B. in Mathematics from Harvard, a Certificate of Advanced Study in Mathematics from Cambridge, and a Ph.D. in Electrical Engineering and Computer Science from MIT.
He is the recipient of a British Marshall Scholarship, First Place in the INFORMS George E. Nicholson Student Paper Competition, the George M. Sprowls Award for the best doctoral thesis in computer science at MIT, Honorable Mention for the ACM Doctoral Dissertation Award, the Okawa Foundation Research Grant, the MS&E Graduate Teaching Award, the INFORMS Telecommunications Section Doctoral Dissertation Award, the NSF CAREER Award, and the Cisco Faculty Scholarship. He has served on the program committees of ACM Economics and Computation, ACM SIGCOMM, IEEE Infocom, and ACM SIGMETRICS, as the track chair for the Internet Economics and Monetization Track at WWW, and as a co-organizer of the Marketplace Innovation Workshop.