<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>MAB on Sandeep Gangarapu</title>
    <link>https://sandeepgangarapu.com/tags/mab/</link>
    <description>Recent content in MAB on Sandeep Gangarapu</description>
    <image>
      <title>Sandeep Gangarapu</title>
      <url>https://sandeepgangarapu.com/images/profile/vivekkrishnanphotography-6_websize.jpg</url>
      <link>https://sandeepgangarapu.com/images/profile/vivekkrishnanphotography-6_websize.jpg</link>
    </image>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 11 Oct 2020 23:13:00 -0600</lastBuildDate>
    <atom:link href="https://sandeepgangarapu.com/tags/mab/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Who and when should one use Multi-Armed Bandits? A scenario.</title>
      <link>https://sandeepgangarapu.com/blog/technical/who-and-when-should-one-use-multi-armed-bandits-a-scenario/</link>
      <pubDate>Sun, 11 Oct 2020 23:13:00 -0600</pubDate>
      <guid>https://sandeepgangarapu.com/blog/technical/who-and-when-should-one-use-multi-armed-bandits-a-scenario/</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡 This is a humor piece. I am using the scenario to illustrate the use of MABs. Please take it with a pinch of salt.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A manager at a tech company that sells Alphonso mangoes online faces one of her most difficult decisions: what should the color of the &amp;ldquo;Buy Now&amp;rdquo; button be? She recently learned that data-driven decision making is the new fad and wants to follow suit, so she calls a meeting with her team of data scientists and marketing folks.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Thompson Sampling Algorithm for Normal outcome distribution</title>
      <link>https://sandeepgangarapu.com/blog/technical/thompson-sampling-algorithm-for-normal-outcome-distribution/</link>
      <pubDate>Thu, 10 Sep 2020 21:46:00 -0600</pubDate>
      <guid>https://sandeepgangarapu.com/blog/technical/thompson-sampling-algorithm-for-normal-outcome-distribution/</guid>
      <description>&lt;p&gt;Thompson Sampling is one of the most popular Multi-Armed bandit (MAB) algorithms - the main reason being its explainability (imagine explaining upper confidence bound to your manager) and decent performance in practice [1].&lt;/p&gt;
&lt;p&gt;Many blog posts on the Internet show how to implement Thompson Sampling (&lt;a href=&#34;https://visualstudiomagazine.com/articles/2019/06/01/thompson-sampling.aspx&#34;&gt;here&lt;/a&gt;, &lt;a href=&#34;https://towardsdatascience.com/hompson-sampling-for-multi-armed-bandit-problems-part-1-b750cbbdad34&#34;&gt;here&lt;/a&gt;, &lt;a href=&#34;https://peterroelants.github.io/posts/multi-armed-bandit-implementation/&#34;&gt;here&lt;/a&gt; and &lt;a href=&#34;https://medium.com/analytics-vidhya/multi-armed-bandit-analysis-of-thompson-sampling-algorithm-6375271f40d1&#34;&gt;here&lt;/a&gt;). Almost all of them consider a Bernoulli outcome distribution (e.g. click or no click, purchase or no purchase), use the Beta-Bernoulli Bayesian update procedure for simulations, and usually compare the performance (regret) to that of other MAB algorithms like UCB, or sometimes to A/B testing. However, none of them consider a Gaussian outcome distribution, especially the case where both the mean and the variance of the distribution are unknown (which is usually the case when you are conducting an experiment whose outcome is continuous, e.g. dollars spent or time spent).&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
