AI Synergies


Denis Steckelmacher

PhD student

Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

We present Bootstrapped Dual Policy Iteration (BDPI), an actor-critic reinforcement learning algorithm designed for very high sample-efficiency and exploration quality. Unlike conventional actor-critic algorithms, BDPI's actor is robust to off-policy critics, which allows state-of-the-art critics to be used. Combining such critics with a good actor is what gives BDPI its high sample-efficiency.


PhD student specializing in sample-efficient Reinforcement Learning and applications of Reinforcement Learning to real-world tasks. Denis holds a master's degree in Computer Science (Artificial Intelligence option) and started his PhD in Reinforcement Learning in 2016.


Halle aux Fûts
Day 3 - Nov 8th

Brewery of Ideas

AI Synergies is organized by VUB/ULB, BNVKI and Brewery of Ideas.
