Vihang Patil

Hi, I am Vihang Patil, a Ph.D. student supervised by Prof. Sepp Hochreiter at the Institute for Machine Learning, Linz.

My research revolves around long-term credit assignment in Reinforcement Learning and how we can design algorithms that form abstractions to learn faster and generalize to unseen parts of the environment.

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub

News/Updates
  • Paper accepted at the Generalisation in Planning Workshop @ NeurIPS 2023.
  • Interning at Amazon, Seattle (September 11th, 2023).
  • We just won the MyoChallenge @ NeurIPS 2022. Joint work with Rahul Siripurapu, Luis Ferro, and others at IARAI and JKU.
  • Paper accepted at the Deep RL Workshop @ NeurIPS 2022.
  • Our paper on Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
  • Our paper on A Dataset Perspective on Offline Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
  • Our paper on History Compression via Language Models in Reinforcement Learning has been accepted at ICML 2022. (15th May 2022)
  • Our paper on Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution has been accepted at ICML 2022 as a long presentation (< 2%). (15th May 2022)
  • Our paper on A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization has been accepted at AISTATS 2022.
  • Worked at Amazon as an Applied Science Intern. (Seattle, January - May 2022)
Contrastive Abstraction for Reinforcement Learning
Vihang Patil, Markus Hofmarcher, Elisabeth Rumetshofer, Sepp Hochreiter
Generalisation in Planning Workshop @ NeurIPS, 2023
pdf

We propose contrastive abstraction learning to find abstract states, where we assume that successive states in a trajectory belong to the same abstract state. Such abstract states may be basic locations, achieved subgoals, inventory, or health conditions. Contrastive abstraction learning first constructs clusters of state representations by contrastive learning and then applies modern Hopfield networks to determine the abstract states.
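
As a toy illustration of the retrieval step (random encodings and centers stand in for the learned representations; none of this is the paper's code), a state can be assigned to an abstract state by a softmax lookup over stored patterns, as in a modern Hopfield network:

```python
# Toy sketch: assign states to abstract states via a modern-Hopfield-style
# softmax lookup over stored patterns (cluster centers). Random data stands in
# for the learned contrastive representations.
import numpy as np

rng = np.random.default_rng(0)
n_states, d, n_abstract = 100, 8, 4
states = rng.normal(size=(n_states, d))     # stand-in state representations
centers = rng.normal(size=(n_abstract, d))  # stored patterns / cluster centers
beta = 2.0                                  # inverse temperature

def abstract_state(s):
    """Return the abstract-state id and the soft assignment for one state."""
    logits = beta * (centers @ s)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(p.argmax()), p

ids = [abstract_state(s)[0] for s in states]
print(ids[:10])
```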

Simplified Priors for Object-Centric Learning
Vihang Patil, Andreas Radler, Daniel Klotz, Sepp Hochreiter
Under review @ CoLLAs, 2024

We propose a non-iterative object-centric learning method built from common building blocks, namely a CNN, MaxPool, and a modified cross-attention layer. A substantially revised version of this work is under review at CoLLAs.
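
A minimal sketch of the non-iterative idea, with hypothetical layer sizes and a standard attention layer in place of the paper's modified cross-attention: learned slot queries attend over CNN features in a single pass.

```python
# Toy sketch: one cross-attention pass from learned slot queries onto CNN
# features (no iterative refinement). Standard nn.MultiheadAttention stands in
# for the paper's modified cross-attention layer.
import torch
import torch.nn as nn

class ToySlotExtractor(nn.Module):
    def __init__(self, n_slots=4, dim=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.queries = nn.Parameter(torch.randn(n_slots, dim))  # learned slot queries
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, x):
        feats = self.cnn(x)                         # (B, dim, H', W')
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H'*W', dim)
        q = self.queries.expand(x.size(0), -1, -1)  # (B, n_slots, dim)
        slots, _ = self.attn(q, tokens, tokens)     # single pass, no iteration
        return slots

print(ToySlotExtractor()(torch.randn(2, 3, 32, 32)).shape)  # (2, 4, 32)
```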

MyoChallenge 2022: Learning contact-rich manipulation using a musculoskeletal hand
Vittorio Caggiano, Guillaume Durandau, Huawei Wang, Alberto Chiappa, Alexander Mathis, Pablo Tano, Nisheet Patel, Alexandre Pouget, Pierre Schumacher, Georg Martius, Daniel Haeufle, Yiran Geng, Boshi An, Yifan Zhong, Jiaming Ji, Yuanpei Chen, Hao Dong, Yaodong Yang, Rahul Siripurapu, Luis Eduardo Ferro Diez, Michael Kopp, Vihang Patil, Sepp Hochreiter, Yuval Tassa, Josh Merel, Randy Schultheis, Seungmoon Song, Massimo Sartori, Vikash Kumar
Neural Information Processing Systems (NeurIPS), 2022
pdf

In the MyoChallenge at the NeurIPS 2022 competition track, the task was to develop controllers for a realistic musculoskeletal hand to solve a series of dexterous manipulation tasks. This paper describes the challenge and its solutions. Our method was a co-winner of the Baoding Balls task.

InfODist: Online distillation with Informative rewards improves generalization in Curriculum Learning
Rahul Siripurapu, Vihang Patil, Kajetan Schweighofer, Marius-Constantin Dinu, Thomas Schmied, Luis Eduardo Ferro Diez, Markus Holzleitner, Hamid Eghbal-zadeh, Michael K. Kopp, Sepp Hochreiter
Deep Reinforcement Learning Workshop, NeurIPS, 2022
openreview

We present a method to improve generalization in curriculum learning, together with an analysis of the various factors that affect generalization.

Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution
Vihang Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / video / code

We present Align-RUDDER, an algorithm that learns from as few as two demonstrations. It does so by aligning demonstrations and speeds up learning by reducing the delay in reward. (Long presentation at ICML, < 2%)
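
To make the redistribution idea concrete, here is a toy sketch (the scores are hypothetical stand-ins for Align-RUDDER's alignment-based scoring): a delayed episodic return is spread over the episode's steps so that credit arrives where the key events happen.

```python
# Toy sketch of reward redistribution: spread a delayed episodic return over
# the steps in proportion to per-step scores. In Align-RUDDER the scores come
# from aligning a trajectory against demonstrations; here they are made up
# for illustration.
import numpy as np

def redistribute(episodic_return, scores):
    """Per-step rewards that sum back to the original episodic return."""
    w = np.asarray(scores, dtype=float)
    return episodic_return * w / w.sum()

scores = [0.1, 0.1, 3.0, 0.1, 3.0, 0.1, 0.1, 3.0]  # hypothetical step scores
per_step = redistribute(10.0, scores)
print(per_step.round(2), per_step.sum())           # sums back to 10.0
```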

History Compression via Language Models in Reinforcement Learning
Fabian Paischer, Thomas Adler, Vihang Patil, Markus Holzleitner, Angela Bitto-Nemling, Sebastian Lehner, Hamid Eghbal-Zadeh, Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / code

HELM (History comprEssion via Language Models) is a novel framework for Reinforcement Learning (RL) in partially observable environments. Language is inherently well suited for abstraction and passing on experiences from one human to another. Therefore, we leverage a frozen pretrained language Transformer (PLT) to create abstract history representations for RL. (Spotlight presentation at ICML)
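
A minimal sketch of the framework's shape, under simplifying assumptions: a generic frozen Transformer stands in for the pretrained language model, and a plain linear map stands in for HELM's projection of observations into the token space.

```python
# Toy sketch: compress an observation history with a *frozen* Transformer and
# feed the compressed representation (plus the current observation) to the
# policy. Layer sizes and the projection are illustrative, not HELM's.
import torch
import torch.nn as nn

obs_dim, dim = 16, 64
project = nn.Linear(obs_dim, dim)  # observation -> "token" embedding
lm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
for p in lm.parameters():
    p.requires_grad = False        # the history compressor stays frozen

history = torch.randn(1, 20, obs_dim)     # 20 past observations (toy data)
compressed = lm(project(history))[:, -1]  # last hidden state summarizes history
policy_input = torch.cat([compressed, project(history[:, -1])], dim=-1)
print(policy_input.shape)                 # (1, 128)
```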

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Christian Steinparz, Thomas Schmied, Fabian Paischer, Marius-Constantin Dinu, Vihang Patil, Angela Bitto-Nemling, Hamid Eghbal-Zadeh, Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code

We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly.

A Dataset Perspective on Offline Reinforcement Learning
Kajetan Schweighofer, Andreas Radler, Marius-Constantin Dinu, Markus Hofmarcher, Vihang Patil, Angela Bitto-Nemling, Hamid Eghbal-Zadeh, Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code

We conduct a comprehensive empirical analysis of how dataset characteristics affect the performance of offline RL algorithms in discrete-action environments.

A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning
Youssef Diouane*, Aurelien Lucchi*, Vihang Patil*
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
arXiv

In this work, we design a novel optimization algorithm with a sufficient decrease mechanism that ensures convergence and that is based only on estimates of the functions. We demonstrate the applicability of this algorithm on two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints. (*equal contribution)
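
A toy, unconstrained sketch of the sufficient decrease mechanism (the constants and the forcing term are illustrative): a candidate step is accepted only if the noisy function estimate improves by an amount tied to the step size, which is what drives the convergence argument.

```python
# Toy sketch of an evolution strategy with a sufficient decrease condition:
# accept a candidate only if the *estimated* objective drops by at least a
# forcing term proportional to sigma^2; otherwise shrink the step size.
import numpy as np

rng = np.random.default_rng(1)

def f_estimate(x, noise=0.01):
    """Noisy estimate of the objective (toy quadratic)."""
    return float(np.sum(x**2) + noise * rng.normal())

x, sigma = rng.normal(size=5), 1.0
fx = f_estimate(x)
for _ in range(200):
    cand = x + sigma * rng.normal(size=x.shape)
    fc = f_estimate(cand)
    if fc <= fx - 0.1 * sigma**2:              # sufficient decrease condition
        x, fx, sigma = cand, fc, sigma * 1.5   # accept, expand step size
    else:
        sigma *= 0.7                           # reject, contract step size
print(round(fx, 4), round(float(np.linalg.norm(x)), 4))
```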

Modern Hopfield Networks for Sample-Efficient Return Decomposition from Demonstrations
Michael Widrich, Markus Hofmarcher, Vihang Patil, Angela Bitto-Nemling, Sepp Hochreiter
Offline RL Workshop, NeurIPS, 2021
video

We introduce modern Hopfield networks for return decomposition for delayed rewards (Hopfield-RUDDER). We show experimentally that Hopfield-RUDDER outperforms LSTM-based RUDDER on various 1D environments with a small number of episodes.
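
As a toy illustration of return decomposition (a random linear model stands in for the Hopfield-based predictor): per-step redistributed rewards are the differences of successive return predictions, so they sum back to the predicted return.

```python
# Toy sketch of RUDDER-style return decomposition: predict the return at every
# prefix of the trajectory and redistribute r_t = g_t - g_{t-1}. A random
# linear model stands in for the Hopfield-based return predictor.
import numpy as np

rng = np.random.default_rng(3)
T = 8
states = rng.normal(size=(T, 4))
w = rng.normal(size=4)

g = np.cumsum(states @ w)    # toy return predictions g(s_0..s_t)
r = np.diff(g, prepend=0.0)  # redistributed reward r_t = g_t - g_{t-1}
print(r.round(2), round(r.sum(), 4), round(g[-1], 4))  # r sums to g_T
```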

Guided Search for Maximum Entropy Reinforcement Learning
Vihang Patil
2019

We propose Guided Evolution Strategies with Sufficient Increase, a new convergent hybrid method that uses policy gradient directions to search in a smaller subspace.
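
A toy sketch of the guided part (dimensions and mixing weight are illustrative; the sufficient increase test would mirror the sufficient decrease check sketched above): perturbations are drawn mostly from the low-dimensional subspace spanned by recent policy gradient directions.

```python
# Toy sketch of guided perturbations: mix isotropic ES noise with noise from
# the subspace spanned by (surrogate) gradient directions, here a random
# orthonormal basis for illustration.
import numpy as np

rng = np.random.default_rng(2)
d, k, alpha = 10, 2, 0.7                      # dims, subspace size, subspace weight

U, _ = np.linalg.qr(rng.normal(size=(d, k)))  # orthonormal basis of the guide subspace

def guided_noise():
    full = rng.normal(size=d)                 # exploration in the full space
    sub = U @ rng.normal(size=k)              # exploration along gradient directions
    return np.sqrt(alpha) * sub + np.sqrt(1.0 - alpha) * full

print(guided_noise().round(2))
```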

Modified from Jon Barron's website.