"Exploring Thompson Sampling in Reinforcement Learning: A Deep Dive into the Bernoulli Bandit Problem"

Introduction to Thompson Sampling: the Bernoulli bandit

Thompson Sampling is a simple yet effective method for addressing the exploration-exploitation dilemma in reinforcement/online learning. In this series of posts, I’ll introduce some applications of Thompson Sampling in simple examples, trying to show some cool visuals along the way. All the code can be found on my GitHub page here.

In this post, we explore the simplest setting of online learning: the Bernoulli bandit.

Problem: The Bernoulli Bandit

The Multi-Armed Bandit problem is the simplest ...