MS Thesis

General Technical Committee

General Technical Committee	Names
Chairperson:	Dr. P Sreenivasa Kumar (HoD’s Nominee)
	Professor
	Computer Science & Engineering
	Indian Institute of Technology Madras
Research Advisor:	Dr. Balaraman Ravindran
	Professor
	Computer Science & Engineering
	Indian Institute of Technology Madras
Research Co-Advisor:	Dr. Nandan Sudarsanam
	Assistant Professor
	Department of Management Studies
	Indian Institute of Technology Madras
Member 1:	Dr. Sutanu Chakraborti
	Associate Professor
	Computer Science & Engineering
	Indian Institute of Technology Madras
Member 2:	Dr. Nirav P. Bhatt
	DST-INSPIRE Faculty
	Department of Chemical Engineering
	Indian Institute of Technology Madras
Thesis Reviewer:	Dr. Krishna Jagannathan
	Assistant Professor
	Department of Electrical Engineering
	Indian Institute of Technology Madras

Finite-time Analysis of Frequentist Strategies for Multi-armed Bandits

Abstract

KEYWORDS: Reinforcement Learning, Stochastic Bandits, UCB, UCBV, EUCBV, Thresholding Bandits, APT, AugUCB

Show details

This thesis studies the following topics in the area of Reinforcement Learning: classic Multi-armed bandits in stationary distribution with the goal of cumulative regret mini- mization using variance estimates and Thresholding bandits in pure exploration fixed- budget setting with the goal of instantaneous regret minimization also using variance estimates. The common underlying theme is the study of bandit theory and its applica- tion in various types of environments. In the first part of the thesis, we study the classic multi-armed bandit problem with a stationary distribution, one of the first settings stud- ied by the bandit community and which successively gave rise to several new directions in bandit theory. We propose a novel algorithm in this setting and compare both theoret- ically and empirically its performance against the available algorithms. Our proposed algorithm termed as Efficient-UCB-Variance (EUCBV) is the first arm-elimination al- gorithm which uses variance estimation to eliminate arms as well as achieve an order optimal regret bound. Empirically, we show that EUCBV outperforms most of the state-of-the-art algorithms in the considered environments. In the next part, we study a specific type of stochastic multi-armed bandit setup called the thresholding bandit problem and discuss its usage, available state-of-the-art algorithms on this setting and our solution to this problem. We propose the Augmented-UCB (AugUCB) algorithm which again uses variance and mean estimation along with arm elimination technique to conduct exploration. We give theoretical guarantees on the expected loss of our algo- rithm and also analyze its performance against state-of-the-art algorithms in numerical simulations in multiple synthetic environments.

MS Thesis

You can find my MS thesis here. Download M.S thesis

Subhojyoti Mukherjee

MS Thesis

General Technical Committee

Finite-time Analysis of Frequentist Strategies for Multi-armed Bandits

Abstract

MS Thesis