Memory Consolidation in Learning Robots

The goal of this project is to develop memory-based reinforcement learning (RL) further into a more elaborate architecture for agents that can solve complex tasks of broad practical and industrial relevance. We focus mainly on two aspects of such an architecture:

  • Using the 'long-term memory' more efficiently: we have developed offline memory methods that reuse previous experiences to make RL training more effective. This can be improved further by focusing on relevant information and discarding unhelpful experiences. We are therefore developing methods for filtering and storing incoming state-transition information during episodes of interaction with the environment.
  • New methods of task decomposition using a memory-based approach: we are studying the automatic generation of sub-goals by identifying 'important' states from the contents of a newly introduced 'short-term memory'.

[Figure: learning system architecture]

First, memorizing and reusing every single transition in practice restricts memory-based RL to problems that can be solved after only a very limited number of interactions. Memory efficiency and, directly connected to it here, the computing time of the offline updates are particularly important for industrial applications. What is needed is a mechanism that draws the learner's attention only to those parts of the continuous stream of state-action pairs that are in some sense noteworthy and potentially helpful for improving the strategy later. Once transitions are in the long-term memory, a second, independent mechanism is needed that focuses the actual learning process during the consolidation phase on exactly those memorized transitions that promise the biggest improvement. Depending on the current stage of development of the learner's strategy, the definition of 'the most useful transitions' in memory may change over time.
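The two mechanisms described above can be illustrated with a minimal sketch. It is not the project's actual method: it assumes a tabular Q-function, uses the absolute temporal-difference (TD) error as a stand-in for 'noteworthiness', and all names (SelectiveMemory, store_threshold, observe, consolidate) are hypothetical.

```python
from collections import defaultdict


class SelectiveMemory:
    """Hypothetical long-term memory: filters on the way in, prioritizes on the way out."""

    def __init__(self, n_actions, gamma=0.9, alpha=0.5, store_threshold=0.1):
        self.n_actions = n_actions
        self.gamma = gamma                      # discount factor
        self.alpha = alpha                      # learning rate
        self.store_threshold = store_threshold  # assumed cut-off for 'noteworthy'
        self.q = defaultdict(float)             # tabular Q-values keyed by (state, action)
        self.transitions = []                   # stored (s, a, r, s_next, done) tuples

    def td_error(self, s, a, r, s_next, done):
        """TD error of one transition under the current Q-values."""
        if done:
            best_next = 0.0
        else:
            best_next = max(self.q[(s_next, b)] for b in range(self.n_actions))
        return r + self.gamma * best_next - self.q[(s, a)]

    def observe(self, s, a, r, s_next, done):
        """Online filter: store the transition only if it currently looks surprising."""
        transition = (s, a, r, s_next, done)
        if abs(self.td_error(*transition)) >= self.store_threshold:
            self.transitions.append(transition)
            return True
        return False

    def consolidate(self, n_updates):
        """Offline phase: each update uses the stored transition with the largest |TD error|."""
        for _ in range(n_updates):
            if not self.transitions:
                return
            best = max(self.transitions, key=lambda t: abs(self.td_error(*t)))
            s, a = best[0], best[1]
            self.q[(s, a)] += self.alpha * self.td_error(*best)
```

Because both the filter and the priority are recomputed from the current Q-values, a transition that is ignored early on can become noteworthy after some consolidation, which mirrors the point that 'the most useful transitions' change as the strategy develops. A fixed threshold is only one possible choice and would discard transitions whose relevance shows up later.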

Second, we want to employ a sequential short-term memory that stores recently reached sub-goals that are potentially relevant to the success of the task at hand. Decomposing an overall goal into easier-to-reach sub-goals becomes even more important when a heavily delayed reward or punishment becomes available only at the very end of an episode. Technically, automatically identifying states that are good candidates for meaningful sub-goals is still an unsolved problem. We see several possibilities for transferring recent findings of the other members of the consortium on (landmark-based) navigation in animals to this technical problem. One idea is to implement a saliency algorithm that selects from the continuous stream only the very few states that are inherently noteworthy. Immediate-reward RL and classical conditioning could then be used to establish an association between the correct salient stimuli (here: visual cues or, more generally, 'states') and a positive outcome of the episode (see section 6). By adapting these mechanisms, we expect a significant breakthrough in the automatic formulation of sub-goals and their integration into the memory-based RL process, as well as improvements in efficiency, flexibility and robustness over previous methods.
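As a rough illustration of this idea, the sketch below keeps salient states in a bounded short-term memory and associates them with the outcome of the episode. It rests on several assumptions that are not part of the project description: the saliency function is supplied by the caller, the association uses a Rescorla-Wagner-style update as one simple model of classical conditioning, and all names (SubgoalLearner, step, end_episode, subgoals) are hypothetical.

```python
from collections import defaultdict, deque


class SubgoalLearner:
    """Hypothetical sketch: salient states -> short-term memory -> outcome association."""

    def __init__(self, saliency, threshold, capacity=5, alpha=0.3):
        self.saliency = saliency                  # assumed: callable state -> float
        self.threshold = threshold                # states below this are ignored
        self.alpha = alpha                        # association learning rate
        self.short_term = deque(maxlen=capacity)  # oldest entries drop out when full
        self.value = defaultdict(float)           # learned association with success

    def step(self, state):
        """Store a state in short-term memory only if it is salient and not yet present."""
        if self.saliency(state) >= self.threshold and state not in self.short_term:
            self.short_term.append(state)

    def end_episode(self, success):
        """Move each remembered state's association toward the episode outcome."""
        target = 1.0 if success else 0.0
        for state in self.short_term:
            self.value[state] += self.alpha * (target - self.value[state])
        self.short_term.clear()

    def subgoals(self, min_value):
        """Return the states whose association with success reaches min_value."""
        return sorted(s for s, v in self.value.items() if v >= min_value)
```

In a navigation setting, the saliency function could for instance flag visual landmarks; states that repeatedly precede successful episodes then accumulate a high association and become sub-goal candidates. How saliency should actually be computed is exactly the open question raised above, so the callable here is only a placeholder.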

In line with the experiments proposed by other members of the project, our research on selective attention and working memory methods is based on a navigation scenario in which an agent has to navigate to one or more given goal positions without depending on a pre-constructed map. This similarity between the technical and biological experiments helps us transfer problem statements and results concerning the use of memory in such tasks from one group to the other.

The generality of the architecture makes it practically relevant not only for navigation tasks but also for a wider variety of tasks, ranging from low-level control engineering to high-level tasks such as learning to correctly assemble a manufactured product in several individual operations.

People

Researchers working in our team:

  • Prof. Dr. Martin Riedmiller
  • Manuel Blum

Contact

For more information on this research project, please contact Martin Riedmiller.