Soumya Rani Samineni

I am an ML Research Engineer at Quantiphi, Bangalore, where I work on reinforcement learning for workforce optimisation. Prior to this role, I was a Research Fellow at Microsoft Research, India, focusing on designing reinforcement learning algorithms for energy grids. I have also worked as an AI Engineer at AI Labs, Hyderabad, on the development of a quadrupedal controller inspired by the MIT Cheetah's impedance control, as well as on object detection models.

I did my Master's in Computer Science and Engineering at the Department of Computer Science and Automation, IISc Bangalore, where I was advised by Prof. Shishir Kolathaya and Prof. Shalabh Bhatnagar. My Master's thesis, titled "Policy Search using Dynamic Mirror Descent for Off-Policy RL", received funding from the Robert Bosch Centre for Cyber Physical Systems (RBCCPS). As part of both the Stochastic Systems Lab and the Stochastic Robotics Lab, I explored reinforcement learning for robotics and stochastic approximation.

Before joining IISc, I worked as an Assistant Executive Engineer (Civil) for the Government of Telangana. I did my Bachelor's in Civil Engineering at the National Institute of Technology, Warangal.

Email  /  CV  /  Research_Statement  /  Google Scholar  /  Twitter  /  Github

Research

I'm interested in reinforcement learning, machine learning, stochastic approximation, optimization, and deep learning. Much of my research focuses on developing novel RL algorithms that are optimal and sample-efficient.

1. Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning
Soumya R Samineni*, Utkarsh Mishra*, P Goel, C Kunjeti, H Lodha, A Singh, A Sagi, Shalabh Bhatnagar, Shishir Kolathaya
(*equal contribution)
International Conference on Robotics and Automation (ICRA), 2022
NeurIPS Deep RL Workshop, 2021 (Poster)
NeurIPS Offline RL Workshop, 2021 (Poster)

project page / arXiv / video

Summary: Dynamic Mirror Descent is applied to H-step lookahead policy optimisation to augment the dataset used to train an off-policy RL agent, significantly improving the sample efficiency of Soft Actor-Critic (SAC), a widely used off-policy RL algorithm. The proposed framework, DeMoRL, further generalises existing Model-Based/Model-Free (Mb-Mf) approaches and achieves state-of-the-art performance on benchmark MuJoCo tasks.
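To make the receding-horizon loop concrete, here is a minimal numpy sketch of one dynamic-mirror-descent MPC step of the kind the summary describes. This is not the paper's implementation: model_step, cost, the toy dynamics, and all hyperparameters are hypothetical stand-ins, and the exponentiated-weight update shown is one standard instance of a mirror-descent step (the MPPI-style, KL-regularised case).

```python
import numpy as np

# Hypothetical stand-ins for the learned dynamics model and task cost.
def model_step(state, action):
    return state + 0.1 * action              # toy linear dynamics

def cost(state, action):
    return np.sum(state**2) + 0.01 * np.sum(action**2)

def rollout_cost(state, plan, noise):
    """Total model-predicted cost of executing a perturbed H-step plan."""
    total = 0.0
    for a, eps in zip(plan, noise):
        act = a + eps
        total += cost(state, act)
        state = model_step(state, act)
    return total

def dmd_mpc_action(state, plan, samples=64, sigma=0.3, temp=1.0):
    """One mirror-descent update of the H-step action plan; returns the
    first action to execute and the plan shifted for the next step."""
    horizon, act_dim = plan.shape
    noise = sigma * np.random.randn(samples, horizon, act_dim)
    costs = np.array([rollout_cost(state, plan, n) for n in noise])
    # Mirror descent with a KL penalty yields exponentiated cost weights.
    w = np.exp(-(costs - costs.min()) / temp)
    w /= w.sum()
    plan = plan + np.einsum('k,khd->hd', w, noise)
    action = plan[0].copy()
    # Receding horizon: shift the plan forward, padding the last step.
    plan = np.roll(plan, -1, axis=0)
    plan[-1] = 0.0
    return action, plan

# Toy usage: plan over a 2-D action space from a 2-D state.
state, plan = np.ones(2), np.zeros((10, 2))
action, plan = dmd_mpc_action(state, plan)
```

In the full pipeline, each transition actually executed in the environment would also be pushed into the off-policy replay buffer, so that SAC trains on the MPC-generated data.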

2. Policy Search using Dynamic Mirror Descent MPC for Model-Free Off-Policy RL
Soumya R Samineni, Master's Thesis, 2021

arXiv / video / code