Subhojyoti Mukherjee

I am a research scientist at Adobe Research. My expertise ranges from research and developing algorithms to training machine learning models, Reinforcement Learning, fine-tuning and alignment for LLMs.

Download CV
Email: subhomuk [at] adobe [dot] com

Work Experience

Adobe Research (San Jose)
Research Scientist/Engineer
(Mar 2025 - Present)

Product:
1) Pre-training and fine-tuning small LLMs for Adobe Document Cloud
2) Training LLMs and VLMs for agentic systems in Adobe Express

Research Areas: Understanding role of reasoning during pre-training of LLMs and scaling laws, alignment of LLMs during post-training, developing agentic systems for image editing and perturbation, fine-grained captioning with VLMs

Mentoring: Mentoring interns on projects spanning MLLMs for video understanding, LLMs for personalization, and learning to clarify with LLMs with LLM/VLM-as-a-judge.

Education

Ph.D. (Fall 2019 to Feb 2025)	at ECE, University of Wisconsin Madison advised by Dr. Robert Nowak, Dr. Josiah Hanna, and Dr. Qiaomin Xie Areas of Research: Reinforcement Learning, Active Learning, incorporating deep active learning strategies for Large Language Models (LLMs), aligning Large Language Models with human feedback (RLHF), and understanding sequential decision-making using transformers (DT). PhD Thesis: Adaptive Data Collection for Policy Evaluation, Multi-task Learning and LLM Alignment pdf (Joint) Masters Thesis: Active Sequential Hypothesis Testing with Extension to Active Regression and Multi-armed Bandits pdf
M.S by Research (2015 to 2018)	at CSE, Indian Institute of Technology (IIT) Madras advised by Dr. Balaraman Ravindran, and Dr. Nandan Sudarsanam RISE Lab Areas of Research: Reinforcement learning, Stochastic and non-stochastic Multi-Armed Bandit settings. Masters Thesis: Finite-time Analysis of Frequentist Strategies for Multi-armed Bandits pdf
Bachelor of Technology (2009 to 2013)	at Dept. of Computer Science and Engineering Meghnad Saha Institute of Technology, Kolkata under West Bengal University of Technology, India

Research Internships

Amazon AWS AI, Santa Clara, USA Summer 2024 (full-time)	hosted by Branislav Kveton, Anusha Lalitha and: Sailik Sengupta, Yifei Ma, Aniket Deshmukh, Gaurush Hiranandani. Area of Research: Multi-objective alignment for LLMs.
Amazon AWS AI, Santa Clara, USA Fall 2023 (Part-time)	hosted by Branislav Kveton and: Yifei Ma, Anusha Lalitha, Kousha Kalantiri, Ge Liu, Aniket Deshmukh, Anoop Deoras. Area of Research: RLHF with LLMs.
Amazon AWS AI, Santa Clara, USA Summer 2023 (Full-time)	hosted by Branislav Kveton and: Yifei Ma, Anusha Lalitha, Ge Liu, Aniket Deshmukh, Anoop Deoras. Area of Research: Active In-Context Learning with LLMs.
CMU, ECE Dept., Pittsburgh, USA Summer 2019	hosted by Prof. Gauri Joshi Area of Research: Structured Bandits.
Adobe Research, San Jose, USA Spring 2018	hosted by Branislav Kveton Area of Research: Item recommendation with Ranking and Bandits.
INRIA, SequeL Lab, Lille, France Fall 2017	hosted by Odalric Maillard Area of Research: Non-stationary Bandits.

News

2025

Our paper From Selection to Generation: A Survey of LLM-based Active Learning is accepted in ACL 2025 (main conference).
Our paper Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning is accepted in RLC 2025 (main conference).
Our paper Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits is accepted in RLC 2025 (main conference).
Our paper Logits are All We Need to Adapt Closed Models is accepted in ICML 2025 (main conference).
I joined Adobe Research as a Research Scientist in the Natural Language Group.
I succesfully defended my PhD dissertation at UW Madison.
Our paper From Selection to Generation: A Survey of LLM-based Active Learning is up in arxiv.
Our paper Logits are All We Need to Adapt Closed Models is up in arxiv.
Our paper Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization is up in arxiv.

2024

Our paper Optimal Design for Human Feedback was accepted at NeurIPS 2024 (main conference).
Our paper Optimal Design for Human Feedback was accepted at Models of Human Feedback for AI Alignment workshop in ICML 2024.
Our paper Off-Policy Evaluation from Logged Human Feedback was accepted at Models of Human Feedback for AI Alignment workshop in ICML 2024.
Our paper SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP was accepted at ICML 2024 (main conference).
I will be returning as an Applied Scientist intern to Amazon AWS AI in the summer of 2024.
Our paper SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits was accepted at AISTATS 2024.
My internship at Amazon AWS AI has been extended (as part-time) till February 2024.

2023

I won the Neural Information Processing Systems (Neurips) 2023 top reviewer award.
I passed my prelim exam for the Doctoral degree.
Our paper Multi-task Representation Learning for Pure Exploration in Bilinear Bandits was accepted at Neurips 2023.
Our paper SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits was accepted at ICML 2023 Workshop The Many Facets of Preference-Based Learning.
I won the top reviewer award at Uncertainty in Artificial Intelligence (UAI) 2023.
I worked at the intersection of Active Learning and Large Language Models (LLMs) in my internship at Amazon AWS AI in the summer 2023. My internship has been extended (as part-time) till the end of Fall 2023.

2022

Our paper ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling was accepted at Uncertainty in Artificial Intelligence (UAI) 2022.
Our paper Safety Aware Changepoint Detection for Piecewise i.i.d. Bandits was accepted at Uncertainty in Artificial Intelligence (UAI) 2022.
I passed my Qualification Exam for the Doctoral degree.
Our paper Chernoff Sampling for Active Testing and Extension to Active Regression was accepted at Artificial Intelligence and Statistics (AISTATS) 2022.
Our paper Nearly Optimal Algorithms for Level Set Estimation was accepted at Artificial Intelligence and Statistics (AISTATS) 2022.

2021

I got Master’s Degree in Electrical Engineering from UW-Madison. Now moving on to finish my doctoral degree.
Our paper A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting was accepted in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 21).

2020

Our paper Generalized Chernoff Sampling: A New Perspective on Structured Bandit Algorithms”, was accepted at Theoretical Foundations Of Reinforcement Learning ICML 2020 Workshop.
Our paper A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting was accepted in IEEE Journal on Selected Areas in Information Theory (2020).

2019

Our paper Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits was accepted at Reinforcement Learning for Real Life ICML 2019 Workshop.
Going to spend the summer of 2019 as a Research Associate in the Department of Electrical and Computer Engineering (ECE) at Carnegie Mellon University (CMU) working with Professor Gauri Joshi and Osman Yagan.
I received the 2019 Chancellor’s Opportunity Fellowship award at the University of Wisconsin-Madison.