Vihang Patil

Hi, I am Vihang Patil, a Ph.D. student under Prof. Sepp Hochreiter at the Institute for Machine Learning, Johannes Kepler University Linz.

My research revolves around long-term credit assignment in Reinforcement Learning and how we can design algorithms that build abstractions to learn faster and generalize to unseen parts of the environment.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github

profile photo
News/Updates
  • Paper accepted at the Generalisation in Planning workshop @ NeurIPS 2023.
  • Interning at Amazon, Seattle (September 11th, 2023).
  • We won the MyoChallenge @ NeurIPS 2022. Joint work with Rahul Siripurapa, Luis Ferro, and others at IARAI and JKU.
  • Paper accepted at the Deep RL workshop @ NeurIPS 2022.
  • Our paper on Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
  • Our paper on A Dataset Perspective on Offline Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
  • Our paper on History Compression via Language Models in Reinforcement Learning has been accepted at ICML 2022. (15th May 2022)
  • Our paper on Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution has been accepted at ICML 2022 as a *long presentation* (< 2% of submissions). (15th May 2022)
  • Our paper on A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization has been accepted at AISTATS 2022.
  • Worked at Amazon as an Applied Science Intern. (Seattle, January - May 2022)
Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution
Vihang Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / video / code

We present Align-RUDDER, an algorithm that learns from as few as two demonstrations. It aligns the demonstrations to redistribute reward and thereby speeds up learning by reducing the delay of the reward. (Long presentation at ICML, < 2% of submissions)

History Compression via Language Models in Reinforcement Learning
Fabian Paischer, Thomas Adler, Vihang Patil, Markus Holzleitner, Angela Bitto-Nemling, Sebastian Lehner, Hamid Eghbal-Zadeh, Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / code

HELM (History comprEssion via Language Models) is a novel framework for Reinforcement Learning (RL) in partially observable environments. Language is inherently well suited for abstraction and for passing on experience from one human to another. We therefore leverage a frozen pretrained language Transformer (PLT) to create abstract history representations for RL. (Spotlight presentation at ICML)

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Christian Steinparz, Thomas Schmied, Fabian Paischer, Marius-Constantin Dinu, Vihang Patil, Angela Bitto-Nemling, Hamid Eghbal-Zadeh, Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code

We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy accordingly.

A Dataset Perspective on Offline Reinforcement Learning
Kajetan Schweighofer, Andreas Radler, Marius-Constantin Dinu, Markus Hofmarcher, Vihang Patil, Angela Bitto-Nemling, Hamid Eghbal-Zadeh, Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code

We conduct a comprehensive empirical analysis of how dataset characteristics affect the performance of offline RL algorithms in discrete action environments.

A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning
Youssef Diouane*, Aurelien Lucchi*, Vihang Patil*
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
arXiv

In this work, we design a novel optimization algorithm with a sufficient decrease mechanism that ensures convergence and that is based only on estimates of the functions. We demonstrate the applicability of this algorithm on two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints. (*equal contribution)

Modern Hopfield Networks for Sample-Efficient Return Decomposition from Demonstrations
Michael Widrich, Markus Hofmarcher, Vihang Patil, Angela Bitto-Nemling, Sepp Hochreiter
Offline RL Workshop, NeurIPS, 2021
video

We introduce modern Hopfield networks for return decomposition for delayed rewards (Hopfield-RUDDER). We show experimentally that Hopfield-RUDDER outperforms LSTM-based RUDDER on various 1D environments with a small number of episodes.

Guided Search for Maximum Entropy Reinforcement Learning
Vihang Patil
2019

We propose a new convergent hybrid method, Guided Evolution Strategies with Sufficient Increase, which uses policy gradient directions to search in a smaller subspace.

Modified from Jon Barron's website.