Multi-Armed Bandits with Inference Considerations

Published in Working Paper, 2019

Multi-armed bandits (MABs) are sequential experimentation procedures that combine exploration and exploitation to reduce allocations to interventions with sub-optimal outcomes. MABs are very effective at reducing the regret of the experimentation process compared to A/B testing, especially in the presence of multiple policy levers. However, unlike A/B testing, MABs may fail to accurately estimate the parameters of the treatment effect distributions of the interventions. In many marketing, clinical trial, and public policy settings, estimating the parameters of treatment effect distributions is as crucial as identifying the best intervention, e.g., to provide feedback to intervention designers. In this paper, we propose a new MAB algorithm, UCB-INF, that addresses this problem. We show that UCB-INF achieves regret comparable to the best MAB algorithms while retaining the parameter estimation properties of A/B testing.

Download paper here
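
The abstract does not spell out UCB-INF's mechanics, so the sketch below is only a guess at the general flavor of the approach: a standard UCB1 index policy augmented with a forced uniform-exploration floor, so that every arm keeps accruing samples and its mean can be estimated, not just the best arm identified. All names and parameters here (`ucb_with_forced_exploration`, `alpha`, `floor`) are hypothetical illustrations, not the paper's specification; note also that naive sample means on adaptively collected data can still be biased, which is precisely the inference problem the paper studies.

```python
import numpy as np

def ucb_with_forced_exploration(pull_arm, n_arms, horizon,
                                alpha=2.0, floor=0.05, seed=0):
    """Hypothetical sketch (not the paper's UCB-INF spec): UCB1 with a
    forced uniform-exploration floor so every arm keeps accruing samples,
    supporting per-arm parameter estimates alongside low regret."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)   # number of pulls per arm
    sums = np.zeros(n_arms)     # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:                      # initialization: pull each arm once
            arm = t - 1
        elif rng.random() < floor:           # forced exploration for inference
            arm = int(rng.integers(n_arms))
        else:                                # otherwise exploit the UCB index
            ucb = sums / counts + np.sqrt(alpha * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        counts[arm] += 1
        sums[arm] += pull_arm(arm)

    return sums / counts, counts             # per-arm mean estimates, sample sizes

if __name__ == "__main__":
    true_means = [0.10, 0.15, 0.30]          # three simulated Bernoulli arms
    reward_rng = np.random.default_rng(1)
    est, n = ucb_with_forced_exploration(
        lambda a: reward_rng.binomial(1, true_means[a]),
        n_arms=3, horizon=20_000)
    print("estimated means:", est.round(3), "pulls per arm:", n)
```

With `floor > 0`, every arm's sample size grows linearly in the horizon, which is the property that lets A/B-style estimation coexist with bandit allocation; pure UCB1 (`floor = 0`) would starve sub-optimal arms of samples.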