Apprenticeship Learning for Helicopter Control

1. INTRODUCTION

In apprenticeship learning, we assume that an expert is available who is capable of performing the desired maneuvers. We then leverage these demonstrations to learn all of the necessary components for our control system. In particular, the demonstrations allow us to learn a model of the helicopter dynamics, as well as appropriate choices of target trajectories and reward parameters for input into a reinforcement learning or optimal control algorithm.

The remainder of this paper is organized as follows: Section 2 briefly overviews related work in the robotics literature that is similar in spirit to our approach. Section 3 describes our basic modeling approach, in which we develop a model of the helicopter dynamics from data collected under human control and subsequently improve this model using data from autonomous flights. Section 4 presents an apprenticeship-based trajectory learning algorithm that learns idealized trajectories for the maneuvers we wish to fly; this algorithm also provides a mechanism for improving our model of the helicopter dynamics along the desired trajectory. Section 5 describes our control algorithm, which is based on differential dynamic programming (DDP).15 Section 6 describes our helicopter platform and presents our experimental results. (Illustrative sketches of the modeling, trajectory-learning, and control steps appear at the end of Section 2.)

2. RELATED WORK

Although no prior work spans our entire setting of apprenticeship learning for control, separate lines of work relate to various components of our approach. Atkeson and Schaal,8 for instance, use multiple demonstrations to learn a model for a robot arm, and then find an optimal controller in their simulator, initializing their optimal control algorithm with one of the demonstrations. Calinon et al.11 considered learning trajectories and constraints from demonstrations for robotic tasks. Their work, however, does not consider the system's dynamics or provide a clear mechanism for the inclusion of prior knowledge, which is a key component of our approach, as detailed in Section 4. Our formulation presents a principled joint optimization that takes into account the multiple demonstrations as well as the (complex) system dynamics. Among others, An et al.6 and Abbeel et al.5 have exploited the idea of trajectory-specific model learning for control.
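To make the modeling step of Section 3 concrete, the sketch below fits a simple time-invariant linear dynamics model to logged flight data by ridge-regularized least squares. This is an illustrative stand-in only: the function name, signature, and the linear model class are assumptions made for this sketch, not the paper's actual (richer) helicopter model.

```python
import numpy as np

def fit_linear_dynamics(states, controls, reg=1e-3):
    """Fit x_{t+1} ~ A x_t + B u_t + c by ridge regression.

    states:   (T+1, n) array of observed helicopter states
    controls: (T, m) array of pilot stick inputs
    reg:      ridge penalty guarding against ill-conditioning

    Illustrative sketch only; the paper learns a more structured
    dynamics model from both piloted and autonomous flight data.
    """
    T, n = states.shape[0] - 1, states.shape[1]
    m = controls.shape[1]
    # Design matrix: [x_t, u_t, 1] for each time step.
    Z = np.hstack([states[:-1], controls, np.ones((T, 1))])
    Y = states[1:]
    # Ridge-regularized least squares: W = (Z'Z + reg*I)^{-1} Z'Y.
    W = np.linalg.solve(Z.T @ Z + reg * np.eye(Z.shape[1]), Z.T @ Y)
    A, B, c = W[:n].T, W[n:n + m].T, W[-1]
    return A, B, c
```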
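Section 4's trajectory learning algorithm infers an idealized target trajectory from several imperfect demonstrations. The following sketch conveys only the flavor of that idea with the simplest possible approach, time-normalizing each demonstration and averaging pointwise; the paper's actual algorithm performs a principled joint optimization with time alignment, which this deliberately does not reproduce.

```python
import numpy as np

def idealized_trajectory(demos, T=200):
    """Combine several noisy demonstrations of one maneuver into
    a single target trajectory.

    demos: list of (T_k, n) arrays, one per demonstration
    T:     number of time steps in the output trajectory

    Crude stand-in for the paper's trajectory learning: it shows
    that the ideal trajectory is inferred from all demonstrations
    rather than copied from any single one.
    """
    resampled = []
    for d in demos:
        # Linearly interpolate each state dimension onto a common
        # normalized time grid in [0, 1].
        src = np.linspace(0.0, 1.0, len(d))
        dst = np.linspace(0.0, 1.0, T)
        resampled.append(
            np.column_stack([np.interp(dst, src, d[:, j])
                             for j in range(d.shape[1])]))
    return np.mean(resampled, axis=0)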
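Finally, the DDP-based controller of Section 5 repeatedly linearizes the dynamics around a nominal trajectory and solves the resulting time-varying linear-quadratic problem by a backward Riccati recursion. Below is a minimal sketch of that backward pass, assuming the per-step linearizations A_t, B_t and quadratic cost matrices Q, R are given; all identifiers are illustrative rather than the authors' code.

```python
import numpy as np

def lqr_backward_pass(A_list, B_list, Q, R, Q_final):
    """Backward Riccati recursion for a time-varying linear model
    and quadratic cost -- the core step DDP repeats around each
    nominal trajectory. Returns feedback gains K_t such that
    u_t = -K_t (x_t - x*_t) tracks the target trajectory x*.
    """
    P = Q_final
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        # K_t = (R + B'PB)^{-1} B'PA, then update the cost-to-go.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()
    return gains
```

In a full DDP loop, one would re-linearize the learned dynamics model along the trajectory produced by these gains and iterate to convergence.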