| General Technical Committee | Name & Affiliation |
|---|---|
| Chairperson (HoD's Nominee): | Dr. P. Sreenivasa Kumar, Professor, Computer Science & Engineering, Indian Institute of Technology Madras |
| Research Advisor: | Dr. Balaraman Ravindran, Professor, Computer Science & Engineering, Indian Institute of Technology Madras |
| Research Co-Advisor: | Dr. Nandan Sudarsanam, Assistant Professor, Department of Management Studies, Indian Institute of Technology Madras |
| Member 1: | Dr. Sutanu Chakraborti, Associate Professor, Computer Science & Engineering, Indian Institute of Technology Madras |
| Member 2: | Dr. Nirav P. Bhatt, DST-INSPIRE Faculty, Department of Chemical Engineering, Indian Institute of Technology Madras |
| Thesis Reviewer: | Dr. Krishna Jagannathan, Assistant Professor, Department of Electrical Engineering, Indian Institute of Technology Madras |
KEYWORDS: Reinforcement Learning, Stochastic Bandits, UCB, UCBV, EUCBV, Thresholding Bandits, APT, AugUCB
This thesis studies the following topics in reinforcement learning: the classic multi-armed bandit problem with stationary reward distributions, with the goal of cumulative regret minimization using variance estimates, and the thresholding bandit problem in the pure-exploration, fixed-budget setting, with the goal of instantaneous regret minimization, again using variance estimates. The common underlying theme is bandit theory and its application in various types of environments. In the first part of the thesis, we study the classic multi-armed bandit problem with a stationary distribution, one of the earliest settings studied by the bandit community and one that has since given rise to several new directions in bandit theory. We propose a novel algorithm in this setting and compare its performance, both theoretically and empirically, against existing algorithms. Our proposed algorithm, termed Efficient-UCB-Variance (EUCBV), is the first arm-elimination algorithm that uses variance estimates to eliminate arms while also achieving an order-optimal regret bound. Empirically, we show that EUCBV outperforms most state-of-the-art algorithms in the environments considered. In the second part, we study a specific stochastic multi-armed bandit setup called the thresholding bandit problem, and discuss its applications, the available state-of-the-art algorithms for this setting, and our solution to the problem. We propose the Augmented-UCB (AugUCB) algorithm, which again uses variance and mean estimates along with an arm-elimination technique to conduct exploration. We give theoretical guarantees on the expected loss of our algorithm and also analyze its performance against state-of-the-art algorithms in numerical simulations on multiple synthetic environments.
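To give a flavour of the variance-aware confidence bounds discussed above, here is a minimal illustrative sketch of a UCB-V-style bandit, where each arm's index adds a bonus that scales with the empirical variance. This is a generic sketch, not the EUCBV or AugUCB algorithm from the thesis; the function names, the Bernoulli reward model, and the parameter `zeta` are my own assumptions for illustration.

```python
import math
import random

def ucbv_index(mean, var, n, t, zeta=1.0):
    # Variance-aware upper confidence bound: empirical mean plus an
    # exploration bonus that shrinks when the empirical variance is small.
    log_t = math.log(t)
    return mean + math.sqrt(2.0 * var * zeta * log_t / n) + 3.0 * zeta * log_t / n

def run_ucbv(arm_means, horizon, seed=0):
    """Simulate a UCB-V-style policy on Bernoulli arms; returns pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of rewards per arm
    sq_sums = [0.0] * k   # sum of squared rewards per arm

    def pull(i):
        r = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        sums[i] += r
        sq_sums[i] += r * r

    # Initialisation: pull each arm once.
    for i in range(k):
        pull(i)

    # Main loop: pull the arm with the highest variance-aware index.
    for t in range(k + 1, horizon + 1):
        best, best_idx = 0, -float("inf")
        for i in range(k):
            m = sums[i] / counts[i]
            v = max(sq_sums[i] / counts[i] - m * m, 0.0)  # empirical variance
            idx = ucbv_index(m, v, counts[i], t)
            if idx > best_idx:
                best, best_idx = i, idx
        pull(best)

    return counts
```

With a large gap between arms, the low-variance suboptimal arm is eliminated from play quickly, which is the intuition behind using variance estimates for both cumulative-regret minimization and thresholding-style exploration.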
You can find my MS thesis here: Download MS thesis.