Brainstormers Tribots: Self-Learning

Autonomous robots have to solve complex control tasks. To cover all situations they might face in a dynamically changing environment, they need to be programmed manually or their behavior must be taught by human developers. However, the quality of the control strategy depends heavily on the ability of the human instructor to analyze the task appropriately and to instruct the robot adequately.

To avoid the large amount of human expertise needed to program a robot, and to achieve optimal robot behavior, it is promising to use self-adapting machine learning techniques such as evolutionary algorithms and reinforcement learning. These approaches are based on the idea of letting the robot learn for itself what to do: it collects experience and evaluates each of its trials with respect to the degree to which the desired behavior or goal was achieved. By analyzing its own experience, the robot can optimize its control strategy without the help of a human instructor.
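The learn-from-experience loop described above can be sketched with tabular Q-learning on a toy task; the task, names, and parameters below are purely illustrative, not the Tribots' actual code:

```python
import random

random.seed(0)

# Toy task (illustrative only): a robot on a line of 6 cells must reach
# cell 5, the "goal".  Each trial starts at cell 0.
N_STATES, ACTIONS, GOAL = 6, (-1, +1), 5

def step(state, action):
    """Environment model: move one cell; reward is given only at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(200):                  # collect experience trial by trial
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # evaluate the outcome of this step and improve the value estimate
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# the greedy policy extracted from Q now heads straight for the goal
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)]
```

After training, the greedy policy moves right (+1) from every non-goal cell; no human ever specified the route, only the reward.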

The Tribots' Approach

Due to the major research focus of the Neuroinformatics group, the Brainstormers Tribots use the reinforcement learning framework to optimize robot behaviors. Some years before the Tribots were founded, reinforcement learning approaches had already been applied in the Brainstormers2D simulation league team. Starting from low-level skills like hard kicking and running to a position, reinforcement learning was also applied to higher-level strategy and multi-agent tasks like learning the team's defense and offense strategy.

While the conditions in the simulation league are ideal for applying reinforcement learning directly, the real robots of our Middle Size League team required a lot of development before the prerequisites for learning were fulfilled. Among others, the open questions were:

  • how can we create compact state descriptions?
  • how can we deal with the non-Markovian environment?
  • how can we learn on the real robots, which allow only a very limited number of trials for acquiring training examples?
  • how can we represent value functions over real-valued state spaces?

Starting in 2003, our first approaches to self-learning robots were based on the idea of learning the robot behavior in a simulator, where trials can be repeated many times without the physical limitations of a real robot. As function approximators we used multi-layer perceptrons and lattice maps. The first applications were driving to a given position as quickly as possible and intercepting a rolling ball. The control strategy learned in the latter task was integrated into our team's competition code and was applied on real robots for the first time in a tournament at the World Cup 2006.
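As a sketch of how a multi-layer perceptron can represent a value function over a real-valued state space, the toy example below fits a tiny network to value targets with plain gradient descent. The one-dimensional state, the target function, and all sizes are assumptions chosen for illustration; the team's actual networks, inputs, and training procedure are not shown here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: approximate a value function V(s) over a continuous
# one-dimensional state (say, distance to a target position) with a small
# multi-layer perceptron.  The target V(s) = -s is a toy choice.
states = rng.uniform(0.0, 5.0, size=(256, 1))
targets = -states

# one hidden layer of 16 tanh units, linear output
W1 = rng.normal(0.0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def mse():
    return float(np.mean((forward(states)[1] - targets) ** 2))

mse_before = mse()
lr = 0.01
for _ in range(5000):                    # full-batch gradient descent
    h, v = forward(states)
    err = (v - targets) / len(states)    # gradient of the mean squared error
    gW2 = h.T @ err;      gb2 = err.sum(0)
    dh = (err @ W2.T) * (1 - h ** 2)     # backpropagate through tanh
    gW1 = states.T @ dh;  gb1 = dh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse_after = mse()
```

The network compresses the continuous state-value mapping into a handful of weights, which is what makes value functions over real-valued state spaces representable at all.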

After developing a number of techniques to solve the problems mentioned above, especially the NFQ (Neural Fitted Q iteration) learning algorithm, we were able to apply reinforcement learning on the real robots. By the World Cup 2007 we had replaced the manually coded dribbling behavior with a learned one. Learning was done on the real robot within 20 minutes, plus three hours of offline computation. The learned behavior was used during the tournament and was also the subject of our entry in the research challenge at RoboCup 2007.
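The core structure of NFQ is data-efficient because it collects a fixed batch of transitions once and then alternates between computing Q-targets and a batch-mode supervised fit. To keep the sketch below self-contained, a linear least-squares fit over one-hot features stands in for the neural network, and a toy chain task replaces the real dribbling problem; both are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy chain: reach cell 5 (the goal).  Illustrative, not the robot's dynamics.
N, GOAL, ACTIONS, gamma = 6, 5, (-1, +1), 0.9

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# 1) collect a fixed batch of experience once, with a random policy --
#    only a limited number of real-robot trials is needed
batch = []
for _ in range(500):
    s = int(rng.integers(0, N))
    a = ACTIONS[rng.integers(2)]
    s2, r, done = step(s, a)
    batch.append((s, a, r, s2, done))

def features(s, a):
    """One-hot over (state, action), i.e. an exact tabular representation."""
    f = np.zeros(N * 2)
    f[s * 2 + ACTIONS.index(a)] = 1.0
    return f

X = np.array([features(s, a) for s, a, *_ in batch])
w = np.zeros(N * 2)

# 2) iterate: Q-targets from the current fit, then one batch supervised fit
for _ in range(50):
    y = np.array([
        r + (0.0 if done else gamma * max(features(s2, b) @ w for b in ACTIONS))
        for s, a, r, s2, done in batch])
    w = np.linalg.lstsq(X, y, rcond=None)[0]

policy = [max(ACTIONS, key=lambda a: features(s, a) @ w) for s in range(N)]
```

The expensive part, the repeated supervised fits, runs offline on the stored batch, which matches the pattern of 20 minutes of robot time plus hours of offline computation.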

An additional field in which we have started to apply reinforcement learning techniques is direct motor control. We replace the standard PID motor controllers, which control the voltage at the motors, with learned controllers. Again, we use NFQ as the learning algorithm, combined with multi-layer perceptrons as function approximators.
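For reference, a minimal discrete PID controller of the kind being replaced might look like the sketch below; the gains, the voltage limit, and the crude first-order motor model are illustrative assumptions, not the robots' actual parameters:

```python
class PID:
    """Discrete PID controller computing a motor voltage from a speed error.
    Gains and limits are illustrative, not the Tribots' actual tuning."""

    def __init__(self, kp, ki, kd, dt, u_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt, self.u_max = dt, u_max
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        u = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(-self.u_max, min(self.u_max, u))  # clamp to voltage range

# crude first-order motor model: speed relaxes toward the applied voltage
pid = PID(kp=1.5, ki=2.0, kd=0.01, dt=0.01, u_max=24.0)
speed = 0.0
for _ in range(500):                   # 5 s of simulated control at 100 Hz
    u = pid.update(10.0, speed)        # track a speed setpoint of 10
    speed += 0.01 * 5.0 * (u - speed)  # toy motor dynamics
```

A learned controller replaces the fixed gain computation in `update` with a trained mapping from the observed error signals to a voltage, so it can adapt to motor nonlinearities that a hand-tuned PID handles poorly.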