News/Updates
- We just won the MyoChallenge @ NeurIPS 2022. Joint work with Rahul Siripurapa, Luis Ferro, and others at IARAI and JKU.
- Paper accepted at the Deep RL Workshop @ NeurIPS 2022.
- Our paper on Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
- Our paper on A Dataset Perspective on Offline Reinforcement Learning has been accepted at CoLLAs 2022. (20th May 2022)
- Our paper on History Compression via Language Models in Reinforcement Learning has been accepted at ICML 2022. (15th May 2022)
- Our paper on Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution has been accepted at ICML 2022 for a *long presentation* (<2% of submissions). (15th May 2022)
- Our paper on A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization has been accepted at AISTATS 2022.
- Worked at Amazon as an Applied Scientist Intern. (Seattle, January–May 2022)
Align-RUDDER: Learning from Few Demonstrations by Reward Redistribution
Vihang Patil,
Markus Hofmarcher,
Marius-Constantin Dinu,
Matthias Dorfer,
Patrick M. Blies,
Johannes Brandstetter,
Jose A. Arjona-Medina,
Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / video / code
We present Align-RUDDER, an algorithm that learns from as few as two demonstrations. It aligns the demonstrations to redistribute reward, which speeds up learning by reducing the delay in reward. (Long presentation at ICML, <2% of submissions)
History Compression via Language Models in Reinforcement Learning
Fabian Paischer,
Thomas Adler,
Vihang Patil,
Markus Holzleitner,
Angela Bitto-Nemling,
Sebastian Lehner,
Hamid Eghbal-Zadeh,
Sepp Hochreiter
International Conference on Machine Learning (ICML), 2022
blog / arXiv / code
HELM (History comprEssion via Language Models) is a novel framework for Reinforcement Learning (RL) in partially observable environments. Language is inherently well suited for abstraction and for passing on experiences from one human to another. We therefore leverage a frozen pretrained language Transformer (PLT) to create abstract history representations for RL. (Spotlight presentation at ICML)
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Christian Steinparz,
Thomas Schmied,
Fabian Paischer,
Marius-Constantin Dinu,
Vihang Patil,
Angela Bitto-Nemling,
Hamid Eghbal-Zadeh,
Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code
We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy accordingly.
A Dataset Perspective on Offline Reinforcement Learning
Kajetan Schweighofer,
Andreas Radler,
Marius-Constantin Dinu,
Markus Hofmarcher,
Vihang Patil,
Angela Bitto-Nemling,
Hamid Eghbal-Zadeh,
Sepp Hochreiter
Conference on Lifelong Learning Agents (CoLLAs), 2022
arXiv / code
We conduct a comprehensive empirical analysis of how dataset characteristics affect the performance of Offline RL algorithms in discrete action environments.
A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning
Youssef Diouane*,
Aurelien Lucchi*,
Vihang Patil*
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
arXiv
In this work, we design a novel optimization algorithm with a sufficient-decrease mechanism that ensures convergence and relies only on estimates of the function values. We demonstrate the applicability of this algorithm in two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints. (*equal contribution)
Modern Hopfield Networks for Sample-Efficient Return Decomposition from Demonstrations
Michael Widrich,
Markus Hofmarcher,
Vihang Patil,
Angela Bitto-Nemling,
Sepp Hochreiter
Offline RL Workshop, NeurIPS, 2021
video
We introduce modern Hopfield networks for return decomposition for delayed rewards (Hopfield-RUDDER). We show experimentally that Hopfield-RUDDER outperforms LSTM-based RUDDER on various 1D environments with small numbers of episodes.
Guided Search for Maximum Entropy Reinforcement Learning
Vihang Patil
2019
We propose Guided Evolution Strategies with Sufficient Increase, a new convergent hybrid method that uses policy-gradient directions to search in a smaller subspace.