News/Updates
- We just won the MyoChallenge @ NeurIPS 2022. Joint work with Rahul Siripurapa, Luis Ferro, and others at IARAI and JKU.
- Paper accepted at the Deep RL Workshop @ NeurIPS 2022.
- Our paper on Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
- Our paper on A Dataset Perspective on Offline Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
- Our paper on History Compression via Language Models in Reinforcement Learning has been accepted at ICML 2022. (15th May 2022)
- Our paper on Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution has been accepted at ICML 2022 for a *long presentation* (<2% of submissions). (15th May 2022)
- Our paper on A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization has been accepted at AISTATS 2022.
- Worked at Amazon as an Applied Scientist Intern. (Seattle, January–May 2022)
Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution
Vihang Patil,
Markus Hofmarcher,
Marius-Constantin Dinu,
Matthias Dorfer,
Patrick M. Blies,
Johannes Brandstetter,
Jose A. Arjona-Medina,
Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / video / code
We present Align-RUDDER, an algorithm that learns from as few as two demonstrations. It aligns the demonstrations to redistribute reward, which speeds up learning by reducing the delay in reward. (Long presentation at ICML, <2% of submissions)
History Compression via Language Models in Reinforcement Learning
Fabian Paischer,
Thomas Adler,
Vihang Patil,
Markus Holzleitner,
Angela Bitto-Nemling,
Sebastian Lehner,
Hamid Eghbal-Zadeh,
Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / code
HELM (History comprEssion via Language Models) is a novel framework for Reinforcement Learning (RL) in partially observable environments. Language is inherently well suited for abstraction and for passing on experiences from one human to another. We therefore leverage a frozen pretrained language Transformer (PLT) to create abstract history representations for RL. (Spotlight presentation at ICML)
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Christian Steinparz,
Thomas Schmied,
Fabian Paischer,
Marius-Constantin Dinu,
Vihang Patil,
Angela Bitto-Nemling,
Hamid Eghbal-Zadeh,
Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code
We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy accordingly.
A Dataset Perspective on Offline Reinforcement Learning
Kajetan Schweighofer,
Andreas Radler,
Marius-Constantin Dinu,
Markus Hofmarcher,
Vihang Patil,
Angela Bitto-Nemling,
Hamid Eghbal-Zadeh,
Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code
We conduct a comprehensive empirical analysis of how dataset characteristics affect the performance of Offline RL algorithms in discrete action environments.
A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning
Youssef Diouane*,
Aurelien Lucchi*,
Vihang Patil*
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
arXiv
In this work, we design a novel optimization algorithm with a sufficient-decrease mechanism that ensures convergence and relies only on estimates of the function values. We demonstrate the applicability of this algorithm in two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints. (*equal contribution)
Modern Hopfield Networks for Sample-Efficient Return Decomposition from Demonstrations
Michael Widrich,
Markus Hofmarcher,
Vihang Patil,
Angela Bitto-Nemling,
Sepp Hochreiter
Offline RL Workshop, NeurIPS, 2021
video
We introduce modern Hopfield networks for return decomposition for delayed rewards (Hopfield-RUDDER). We show experimentally that Hopfield-RUDDER outperforms LSTM-based RUDDER on various 1D environments with small numbers of episodes.
Guided Search for Maximum Entropy Reinforcement Learning
Vihang Patil
2019
We propose Guided Evolution Strategies with Sufficient Increase, a new convergent hybrid method that uses policy-gradient directions to search in a smaller subspace.