
Goal-directed Learning (learning to achieve a goal)

Computational Neuroscience

It is fair to say that currently all algorithms used to learn goal-directed behaviour rely on the paradigm of reinforcement learning, in which a (numerical) reward is accumulated through successful actions of the agent. While of great practical importance, this paradigm bears one fundamental problem: evolutionary mechanisms aside, every reinforcement signal ultimately originates from an external agent (the reinforcer, e.g. a teacher, observer or robot designer) who provides evaluative feedback to the learner. To make the reinforcement maximally efficient, the external reinforcer would need full knowledge of the to-be-reinforced agent and its world. Even in simple real-world situations this is not possible (if only because of noise). Hence, if little world knowledge is used, a mismatch can arise between the induced reinforcement and the behaviour actually obtainable by the agent, impairing or destroying the convergence of learning. On the other hand, if much world knowledge is used, learning may either become obsolete (why not just program the agent…) or, at best, the agent becomes a slave of the all-knowing designer. Although much simplified, this picture paints the bias-variance dilemma of reinforcement learning.

The only possible alternative is to let the agent decide purely by its own means and intentions how and what to learn. The only feedback such an agent receives would then have to be strictly non-evaluative feedback provided by the environment. For learning, such an agent can only use correlations between the signals that arrive from the world at its sensors. Such correlations, however, are normally not directed; hence they do not point to a goal.

Hence the conundrum that this project seeks to resolve is how to use non-evaluative, non-goal-directed signals from the environment, together with purely correlation-based learning mechanisms, to nonetheless learn goal-directed behaviour.

Currently we are using these algorithms in two sub-projects:

1) Correlation-based learning, in particular three-factor learning (ISO-3), provides the basis for the adaptive goal-directed structures. The neural systems are embedded into a three-joint arm, both in simulation and in a robotic device. The left movie shows a one-joint arm simulated with ODE. The arm learns to reach the red dot after a few (~5) trials. On the right, the preliminary hardware arm is shown; it will be controlled by spring-like muscles.

[Movies: goal-directed muscle simulation; robot-arm hardware]
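The core idea behind ISO-type, correlation-based (differential Hebbian) learning can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the low-pass filter, the signal shapes and all parameters are illustrative choices, and only one plastic input is modelled.

```python
import numpy as np

# Sketch of differential Hebbian (ISO-type) learning: a plastic weight
# w1 on an early, "predictive" input u1 changes in proportion to the
# correlation between the filtered input x1 and the temporal derivative
# of the output v, which is driven by a later "reflex" input u0.
# Signal shapes and parameters are assumptions for illustration.

def lowpass(signal, tau=10.0):
    """Leaky integrator standing in for the ISO filter bank."""
    out = np.zeros_like(signal)
    for t in range(1, len(signal)):
        out[t] = out[t - 1] + (signal[t] - out[t - 1]) / tau
    return out

T = 200
u0 = np.zeros(T); u0[60] = 1.0   # late reflex input
u1 = np.zeros(T); u1[40] = 1.0   # early predictive input

x0, x1 = lowpass(u0), lowpass(u1)

mu, w1 = 0.5, 0.0                # learning rate and plastic weight
weights = []
for trial in range(20):
    v = x0 + w1 * x1             # reflex weight fixed at 1
    w1 += mu * np.sum(x1 * np.gradient(v))   # dw1/dt = mu * x1 * v'
    weights.append(w1)

# Because u1 precedes u0, the correlation <x1 * v'> is positive and w1
# grows: the system comes to respond to the earlier, predictive signal.
```

The direction of weight change thus emerges from the temporal order of the two inputs alone, without any evaluative (reward) signal.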

2) The second part of this project focuses on the hippocampal place field system, which represents a natural sequencing system (along the rat's path) and could be used to learn to navigate towards a goal.

Experiments on rats show that visual cues play an important role in the formation of place cells. Nevertheless, rats also rely on other allothetic, non-visual stimuli such as auditory, olfactory and somatosensory cues. Most researchers have interpreted navigation in the dark as evidence for the importance of path integration as an additional input to place cells. However, Save et al. (2000) have shown that olfactory information, rather than self-motion information, is used to stabilize the place fields (PFs) of rats in the dark. It has also been observed that PF representation density in the rat hippocampus varies depending on the goal location (Hollup et al., 2001).

We address these findings by modeling olfactory-information-supported, varying-density place cells and using them for goal navigation in a closed-loop behavioural scenario. In the model we develop PFs in the entorhinal cortex (EC) from external (visual and olfactory) cues as well as self-generated (urine-marking) cues (panel A). In the dentate gyrus (DG), exploration-dependent PFs are obtained through Hebbian learning, whereby denser representations are created at more frequently visited locations. The obtained PFs are used for goal navigation by way of the Q-learning algorithm. Sensory inputs as well as place cells are affected whenever the rat navigates in the environment, thus closing the loop.

We use a fully connected feed-forward network (panel B) to create place cells, starting from random connection weights W. Features X derived from visual and olfactory cues are fed to the input layer. An example of PFs is shown in panel C. We observed that slightly fewer directional cells were obtained with slow learning than without learning, and that the use of olfactory information further reduces the number of directional cells.

[Figure: place fields (panels A–C)]
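How Q-learning can operate on top of a place-cell code can be sketched in simplified form. This is a toy illustration, not the model described above: we assume a 1D corridor, Gaussian place cells whose most active unit serves as the discrete state, and hypothetical parameter values throughout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch: tabular Q-learning over a place-cell state code.
# A 1D corridor has positions 0..9 with the goal at position 9; ten
# Gaussian place cells tile it, and the index of the most active cell
# is used as the discrete state. All sizes and parameters are
# assumptions for illustration.

n_cells, n_pos = 10, 10
centers = np.linspace(0, n_pos - 1, n_cells)

def place_activity(pos, sigma=1.0):
    """Gaussian place-cell population response at a given position."""
    return np.exp(-((pos - centers) ** 2) / (2 * sigma ** 2))

actions = (-1, +1)                       # step left / step right
Q = np.zeros((n_cells, len(actions)))
alpha, gamma, eps = 0.2, 0.9, 0.2        # learning rate, discount, exploration
goal = n_pos - 1

for episode in range(1000):
    pos = int(rng.integers(n_pos - 1))   # random, non-goal start
    for step in range(50):
        s = int(np.argmax(place_activity(pos)))
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        pos_new = int(np.clip(pos + actions[a], 0, n_pos - 1))
        r = 1.0 if pos_new == goal else 0.0
        s_new = int(np.argmax(place_activity(pos_new)))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_new]) - Q[s, a])
        pos = pos_new
        if r > 0:
            break

# Greedy policy per place cell: after learning it points rightward
# (action index 1) at every non-goal state.
policy = np.argmax(Q, axis=1)
```

In the full closed-loop model the place-cell activations themselves change with experience (Hebbian densification in the DG), so the state space that Q-learning operates on is itself shaped by the rat's behaviour.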

