Goal-directed Learning (learning to achieve a goal)

Contact person:  Kolodziejski, Christoph 
Computational Neuroscience

It is fair to say that currently all algorithms used to learn goal-directed behaviour rely on the paradigm of reinforcement learning, in which a (numerical) reward is accumulated through successful actions of the agent. While of great practical importance, this paradigm bears one fundamental problem. Setting aside evolutionary mechanisms, every reinforcement originates from an external agent (the reinforcer, e.g. a teacher, observer or robot designer) who provides evaluative feedback to the learning agent. To make reinforcement maximally efficient, the external reinforcer would need full knowledge of the to-be-reinforced agent and its world. Even in simple real-world situations this is not possible (e.g. simply due to noise). Hence, if little world knowledge is used, the induced reinforcement may not match the behaviour the agent can actually achieve, impairing or destroying the convergence of learning. On the other hand, if much world knowledge is used, learning either becomes obsolete (why not just program the agent?) or, at best, the agent becomes a slave of the all-knowing designer. Although much simplified, this picture illustrates the bias-variance dilemma of reinforcement learning.

The only possible alternative is to let the agent decide, purely by its own means and intentions, how and what to learn. The only feedback such an agent receives would then have to be strictly non-evaluative feedback provided by the environment. For learning, such an agent can only use correlations between signals that arrive from the world at its sensors. Such correlations, however, are normally not directed; hence they do not point to a goal.

Hence the conundrum that this project seeks to resolve is how to use non-evaluative, non-goal-directed signals from the environment, together with purely correlation-based learning mechanisms, to indeed learn goal-directed behaviour.
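To make "purely correlation-based learning" concrete, here is a minimal sketch of a differential Hebbian rule of the ISO type, in which a weight changes in proportion to its input times the temporal derivative of the neuron's output. It is an illustration under simplifying assumptions, not the project's implementation: time is discrete, each input is a unit pulse smoothed by a hypothetical exponential trace (`lowpass_trace`, `tau`), the reflex input `u0` has a fixed weight of 1, and only the predictive weight `w1` learns. No reward enters the rule; only the temporal order of the two sensor signals matters.

```python
import math

def lowpass_trace(onset, length, tau):
    """Hypothetical input filter: a unit pulse at `onset`, smoothed into an
    exponentially decaying trace u[t] = exp(-(t - onset) / tau)."""
    return [math.exp(-(t - onset) / tau) if t >= onset else 0.0
            for t in range(length)]

def iso_trial(t_predictive, t_reflex, mu=0.01, tau=10.0, length=200):
    """One trial of differential Hebbian (ISO-type) learning.

    u0: reflex input, fixed weight w0 = 1
    u1: predictive input, plastic weight w1 (starts at 0)
    Discrete-time rule: w1 += mu * u1[t] * (v[t] - v[t-1])
    """
    u0 = lowpass_trace(t_reflex, length, tau)
    u1 = lowpass_trace(t_predictive, length, tau)
    w0, w1, v_prev = 1.0, 0.0, 0.0
    for t in range(length):
        v = w0 * u0[t] + w1 * u1[t]       # linear output neuron
        w1 += mu * u1[t] * (v - v_prev)   # input times output derivative
        v_prev = v
    return w1

print(iso_trial(t_predictive=10, t_reflex=20))  # predictive input comes first
print(iso_trial(t_predictive=20, t_reflex=10))  # predictive input comes second
```

When the predictive input precedes the reflex input, `w1` ends up positive; when it follows, `w1` ends up negative. The sign of the learned weight is thus set by temporal order alone, which is what allows an undirected correlation to acquire a direction.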


Currently we are using these algorithms in two sub-projects:

1) Correlation-based learning, in particular three-factor learning (ISO3), provides the basis for the adaptive goal-directed structures. The neural systems are embedded into a three-joint arm, both in simulation and as a robotic device. The left movie shows a one-joint arm simulated with ODE; the arm learns to reach the red dot after a few (~5) trials. On the right, the preliminary hardware arm is shown; it will be controlled by spring-like muscles.

Movies: goal-directed muscle (left) and robot arm hardware (right)
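Three-factor learning adds a third signal that gates plasticity in time. The sketch below assumes the ISO3 rule takes the form Δw1 = μ · u1 · v′ · r, where r is a relevance signal that is 1 only briefly around a relevant event and 0 otherwise. The exponential input trace, the one-step relevance pulse, the function names and all parameter values are hypothetical choices for illustration, not those of the arm simulation.

```python
import math

def trace(onset, length, tau):
    """Hypothetical input filter: exponentially decaying trace after a pulse."""
    return [math.exp(-(t - onset) / tau) if t >= onset else 0.0
            for t in range(length)]

def iso3_trial(t_predictive, t_reflex, relevance, mu=0.01, tau=10.0, length=200):
    """One trial of three-factor differential Hebbian learning (ISO3-style).

    `relevance` holds the third factor r[t]; learning only happens where r[t] != 0:
        w1 += mu * u1[t] * (v[t] - v[t-1]) * r[t]
    """
    u0 = trace(t_reflex, length, tau)       # reflex input, fixed weight
    u1 = trace(t_predictive, length, tau)   # predictive input, plastic weight
    w0, w1, v_prev = 1.0, 0.0, 0.0
    for t in range(length):
        v = w0 * u0[t] + w1 * u1[t]
        w1 += mu * u1[t] * (v - v_prev) * relevance[t]
        v_prev = v
    return w1

length = 200
# Hypothetical third factor: a one-step relevance pulse at the reflex onset (t = 20).
r_pulse = [1.0 if t == 20 else 0.0 for t in range(length)]
r_off = [0.0] * length

print(iso3_trial(10, 20, r_pulse))  # mu * exp(-10 / tau), about 0.00368
print(iso3_trial(10, 20, r_off))    # no relevance signal, no learning
```

With the relevance pulse at the reflex onset, the weight changes exactly once, by `mu * exp(-10 / tau)`; without any relevance signal it stays at zero however the inputs are correlated. Setting `r[t] = 1` everywhere recovers the plain two-factor differential Hebbian rule.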

2) The second part of this project focuses on the hippocampal place field system, which represents a natural sequencing system (along the rat’s path) and could be used to learn to navigate towards a goal.

Experiments on rats show that visual cues play an important role in the formation of place cells. Nevertheless, rats also rely on other allothetic, non-visual stimuli such as auditory, olfactory and somatosensory stimuli. Most researchers have regarded navigation in the dark as evidence for the importance of path integration as an additional input to place cells. However, Save et al. (2000) have shown that olfactory information, rather than self-motion information, is used to stabilize the place fields (PFs) of rats in the dark. It has also been observed that the density of the PF representation in the rat hippocampus varies depending on the location of a goal (Hollup et al., 2001). We address these findings by modeling olfactory-information-supported place cells of varying density and using them for goal navigation in a closed-loop behavioral scenario.

In our model, PFs develop in the entorhinal cortex (EC) from external (visual and olfactory) cues as well as self-generated (urine-marking) cues (panel A). In the dentate gyrus (DG), exploration-dependent PFs are obtained through Hebbian learning, which creates denser representations at more frequently visited locations. The resulting PFs are used for goal navigation by way of the Q-learning algorithm. Sensory inputs as well as place cells are affected whenever the rat navigates in the environment, thus closing the loop. We use a fully connected feed-forward network (panel B) with initially random connection weights W to create place cells; features X derived from visual and olfactory cues are fed to its input layer. An example of PFs is shown in panel C. We observed that slow learning yielded slightly fewer directional cells than were obtained without learning. We also found that using olfactory information reduces the number of directional cells.
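The DG step can be illustrated with a toy feed-forward network. This sketch assumes a 1-D track instead of an open arena, Gaussian-tuned cue responses as the features X (`cue_features`, hypothetical), small random initial weights W, and a competitive (winner-take-all) Hebbian rule in which only the best-matching cell moves its weights towards the current input. These are illustrative substitutions, not the project's network or parameters. Competitive Hebbian learning of this kind behaves like vector quantization, which allocates more units to regions of high input density; that is one way frequent visits can yield a denser representation.

```python
import math
import random

def cue_features(pos, cue_centres, width=0.1):
    """Hypothetical features X: each visual/olfactory cue responds to the rat's
    position on a 1-D track in [0, 1] with a Gaussian tuning curve."""
    return [math.exp(-((pos - c) ** 2) / (2 * width ** 2)) for c in cue_centres]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def hebbian_step(W, x, eta):
    """Competitive Hebbian update: the cell whose weights best match x wins,
    and only its weights move a fraction eta towards x. Returns the winner."""
    k = min(range(len(W)), key=lambda i: sq_dist(W[i], x))
    W[k] = [wi + eta * (xi - wi) for wi, xi in zip(W[k], x)]
    return k

def train_place_cells(n_cells=20, n_steps=3000, eta=0.05, goal=(0.8, 1.0),
                      p_goal=0.7, seed=0):
    rng = random.Random(seed)
    cues = [i / 9 for i in range(10)]        # 10 hypothetical cue positions
    # Initially random (small) connection weights W: one row per place cell.
    W = [[0.05 * rng.random() for _ in cues] for _ in range(n_cells)]
    for _ in range(n_steps):
        # Hypothetical exploration: the rat spends most of its time near the goal.
        pos = rng.uniform(*goal) if rng.random() < p_goal else rng.random()
        hebbian_step(W, cue_features(pos, cues), eta)
    # Field centre of each cell: the track position whose features it matches best.
    grid = [i / 100 for i in range(101)]
    feats = [cue_features(p, cues) for p in grid]
    centres = [grid[min(range(len(grid)), key=lambda j: sq_dist(w, feats[j]))]
               for w in W]
    return W, centres

W, centres = train_place_cells()
print(sorted(centres))
```

Printing the sorted field centres lets you compare how many cells ended up in the frequently visited goal stretch [0.8, 1.0] with how many cover the rest of the track; the exact numbers depend on the seed and parameters.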

Place fields
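The goal-navigation step relies on Q-learning. The following is a minimal tabular sketch under strongly simplifying assumptions: the rat lives on a 1-D track of `n_states` discrete states, each standing in for its most active place cell; it can step left or right; and entering the last state (the goal) yields a reward of 1. The function name, reward scheme and parameter values are hypothetical. The update is the standard one-step Q-learning rule Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)].

```python
import random

def q_learning_track(n_states=10, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, max_steps=1000, seed=0):
    """Tabular Q-learning on a 1-D track.

    Hypothetical simplification: each state stands for the most active place
    cell; the goal is the last state and entering it yields reward 1.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                          # action 0: left, action 1: right
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def greedy(s):
        best = max(Q[s])
        # Break ties randomly so the untrained agent explores both directions.
        return rng.choice([a for a in range(2) if Q[s][a] == best])

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s_next = min(max(s + moves[a], 0), goal)   # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            # The goal is terminal, so it contributes no future value.
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if s == goal:
                break
    return Q

Q = q_learning_track()
policy = ["right" if q[1] > q[0] else "left" for q in Q[:-1]]
print(policy)
```

After training, the greedy policy points towards the goal from every non-goal state. Random tie-breaking matters here: with all Q-values initially zero, always picking the first action would push the untrained agent against the left wall and it would rarely discover the reward.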

Belongs to Group(s):
Computational Neuroscience

Members working within this Project:
Kolodziejski, Christoph 

Selected Publication(s):

Kolodziejski, C, Porr, B, and Wörgötter, F (2008).
On the asymptotic equivalence between differential Hebbian and temporal difference learning
Neural Computation, in press.

Kolodziejski, C, Porr, B, and Wörgötter, F (2008).
On the equivalence between differential Hebbian and temporal difference learning
In: Proceedings of the Computational and Systems Neuroscience meeting COSYNE*2008, Salt Lake City.

Kolodziejski, C, Porr, B, and Wörgötter, F (2008).
Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
Biological Cybernetics 98(3):259-272.

Kolodziejski, C, Porr, B, Tamosiunaite, M, and Wörgötter, F (2008).
On the equivalence between TD-learning and differential Hebbian learning using a local third factor
In: Advances in Neural Information Processing Systems 21. MIT Press, in press.

Tamosiunaite, M, Ainge, J, Kulvicius, T, Porr, B, Dudchenko, P, and Wörgötter, F (2008).
Path-finding in real and simulated rats: On the usefulness of forgetting and frustration for navigation learning
Journal of Computational Neuroscience, submitted.

Thompson, AM, Porr, B, Kolodziejski, C, and Wörgötter, F (2008).
Second Order Conditioning in the Sub-cortical Nuclei of the Limbic System
In: From Animals to Animats 10. Springer Berlin / Heidelberg, pages 189-198.

Kolodziejski, C, Porr, B, and Wörgötter, F (2007).
Anticipative adaptive muscle control: Forward modeling with self-induced disturbances and recruitment
In: Proceedings of the sixteenth annual computational neuroscience meeting CNS*2007, Toronto.

Kulvicius, T, Tamosiunaite, M, and Wörgötter, F (2007).
Development of place cells by a simple model in a closed loop context
16th Annual Computational Neuroscience Meeting (CNS), Toronto, in press.

Porr, B, and Wörgötter, F (2007).
Learning with Relevance: Using a third factor to stabilize Hebbian learning
Neural Computation, in press.

Wörgötter, F, and Porr, B (2007).
Reinforcement Learning
Scholarpedia http://www.scholarpedia.org/article/Reinforcement_Learning.

Kolodziejski, C, Porr, B, and Wörgötter, F (2006).
Fast, flexible and adaptive motor control achieved by pairing neuronal learning with recruitment
In: Proceedings of the fifteenth annual computational neuroscience meeting CNS*2006, Edinburgh.

Porr, B, and Wörgötter, F (2006).
Strongly improved stability and faster convergence of temporal sequence learning by utilising input correlations only
Neural Computation 18(6):1380-1412.

Thompson, MA, Porr, B, and Wörgötter, F (2006).
Stabilising Hebbian learning with a third factor in a food retrieval task
In: SAB, Rome, in press.

Wörgötter, F, Kolodziejski, C, and Porr, B (2006).
Comparing neuronal approaches for temporal sequence learning
Natural Computing, in press.