Task Space Exploration in Robot Reinforcement Learning

Master's Thesis supervised by Prof. Jan Peters, Ph.D. at Intelligent Autonomous Systems Group, TU Darmstadt, 2023


Exploration is a key ingredient of Reinforcement Learning (RL). Most common RL algorithms for robot RL use Gaussian action noise to explore new behaviors, which does not make use of prior knowledge about the robot or the environment. This thesis studies how task space methods from classical robotics can improve exploration for robot RL. We approach the task space exploration problem from a local and a global perspective.
From the local perspective, we establish a maximum entropy formulation from which we derive an optimal sampling strategy. We define a new framework of Jacobian methods that generalizes the well-known Jacobian inverse and Jacobian transpose methods, and we interpret the optimal sampling strategy with the aid of the new framework. Based on the theoretical findings, we develop Task-Space-Aware Action Sampling (TAAS). Our empirical studies find that TAAS explores more efficiently and can increase the success rate of RL compared to common joint space exploration.
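As a rough illustration of the two classical Jacobian methods named above, the following sketch maps a task space perturbation to a joint space perturbation for a planar two-link arm. It is not the TAAS algorithm and not the generalized framework from the thesis. The arm model, the function names `planar_jacobian` and `joint_noise_from_task_noise`, the link lengths, and the damping term are all assumptions chosen for this example.

```python
import numpy as np


def planar_jacobian(q, link_lengths=(1.0, 1.0)):
    """Position Jacobian of a planar two-link arm (hypothetical example robot)."""
    l1, l2 = link_lengths
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def joint_noise_from_task_noise(J, task_noise, method="inverse", damping=1e-6):
    """Map a task space perturbation to joint space.

    method="inverse":   damped pseudoinverse, dq = J^T (J J^T + damping * I)^-1 dx
    method="transpose": Jacobian transpose,   dq = J^T dx
    The damping value is an assumption that keeps the solve stable near singularities.
    """
    if method == "inverse":
        JJt = J @ J.T
        return J.T @ np.linalg.solve(JJt + damping * np.eye(JJt.shape[0]), task_noise)
    if method == "transpose":
        return J.T @ task_noise
    raise ValueError(f"unknown method: {method}")


# Usage: draw Gaussian noise in task space and express it as a joint space action.
rng = np.random.default_rng(0)
q = np.array([0.3, 0.8])                # joint angles away from a singularity
J = planar_jacobian(q)
task_noise = rng.normal(scale=0.05, size=2)
dq_inverse = joint_noise_from_task_noise(J, task_noise, method="inverse")
dq_transpose = joint_noise_from_task_noise(J, task_noise, method="transpose")
```

To first order, the inverse variant reproduces the sampled task space displacement, whereas the transpose variant only guarantees a joint step whose task space effect has a non-negative projection onto the sampled direction.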
To improve global task space exploration directly, we couple the standard local action policy with an actively exploring global bias policy. By deriving an adapted surrogate loss function from the performance difference lemma, we integrate our approach, Task Space Biasing (TSB), into the state-of-the-art RL algorithm Proximal Policy Optimization (PPO). We compare TSB with pure PPO and Residual Reinforcement Learning (RRL) on a complex reaching task with sparse rewards and find that it outperforms both baselines by a large margin. We argue that the presented analysis and the developed methods enrich the possibilities for integrating prior knowledge into robot RL and, hence, have the potential to render more challenging robotics problems efficiently solvable in the future.
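For orientation, the two standard results that the adapted loss builds on can be stated as follows. This is only the textbook form of the performance difference lemma and of the clipped PPO surrogate, not the adapted TSB loss derived in the thesis. The notation ($J$, $d^{\pi'}$, $A^{\pi}$, $r_t$, $\hat{A}_t$, $\epsilon$) is the conventional one and is assumed here.

```latex
% Performance difference lemma (discounted setting, discount factor \gamma):
\[
  J(\pi') - J(\pi)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi'},\; a \sim \pi'(\cdot \mid s)}
    \bigl[ A^{\pi}(s, a) \bigr]
\]

% Standard clipped PPO surrogate, with probability ratio
% r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t):
\[
  L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\Bigl[
      \min\bigl(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\,\hat{A}_t
      \bigr)
    \Bigr]
\]
```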