The courses

Neural Mechanisms of Visual Object Recognition and Categorization

We will review the functional anatomy of the primate visual system, emphasizing the ventral visual stream, which is involved in the coding of object properties. Then we will discuss the responses of single neurons in the various ventral visual areas using a computational framework that distinguishes between the two essential problems of object recognition: invariance to image transformations (position, size, illumination and viewpoint) and selectivity for object properties. We will discuss experimental findings related to the categorization of visual images and the effect of categorization learning on the representation in visual areas as well as in non-visual areas such as prefrontal cortex. Finally, we will discuss the coding of dynamic images of visual actions by single neurons in prefrontal, parietal and visual cortex.


Visual Object Recognition

Visual object recognition research has made considerable progress in recent years, to an extent that computer vision algorithms are gradually becoming applicable to challenging real-world recognition tasks. Many of those advances have come from a better understanding of local features that can be robustly extracted and matched under the difficult conditions encountered in such settings, including viewpoint and illumination changes, clutter, and partial occlusion.

This first part of the recognition tutorial will therefore focus on local features and how they can be used for recognition, both of specific objects and of object categories. We will introduce the concepts behind state-of-the-art interest point detectors and local region descriptors and will discuss several concrete implementations. We will then describe spatial models that can be used for recognizing familiar objects. Generalizing from specific objects to entire visual object categories, we will show how those models can be extended to cover the variability in both appearance and spatial layout. Finally, we will demonstrate how those concepts are applied in state-of-the-art object detection systems and discuss how those systems can be extended to additional dimensions of variability, such as scale changes and image-plane rotations.
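
As a rough illustration of this local-feature pipeline, the sketch below (Python with OpenCV; the image file names are placeholders) detects ORB interest points in two images, computes their descriptors, and keeps matches that pass a ratio test. It is a generic example of the technique, not the specific detectors covered in the tutorial.

```python
# Minimal local-feature matching sketch (assumes OpenCV is installed and two
# example images on disk; "query.png" / "scene.png" are placeholder names).
import cv2

query = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Interest point detection and local region description (ORB = oriented FAST
# keypoints + rotation-aware BRIEF descriptors).
orb = cv2.ORB_create(nfeatures=1000)
kp_q, des_q = orb.detectAndCompute(query, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Descriptor matching with a ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
candidates = matcher.knnMatch(des_q, des_s, k=2)
good = []
for pair in candidates:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

print(f"{len(good)} putative correspondences between the two images")
```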


Image Matching and Camera Tracking

Image matching and camera tracking are useful tools for self-localization, scene modeling and recognition.

The state-of-the-art paradigm for image matching and camera tracking combines feature detection and selection, robust statistics, optimization and algebraic geometry to find corresponding points in images and to recover the motion of the camera in space. We will explain the paradigm and its main components.

First, state-of-the-art image matching based on affine covariant feature detectors and descriptors will be presented. We will build on the previous course "Visual Object Recognition" by Bastian Leibe, Tinne Tuytelaars, Bernt Schiele and Ales Leonardis. Secondly, camera models and their estimation from a minimal number of points will be explained. We will show the principles of constructing appropriate models so as to minimize the number of points needed to estimate them, and we will provide an informal introduction to the algebraic geometry necessary to understand the basics of the problem. Finally, we will explain robust estimation techniques based on Random Sample Consensus (RANSAC), both in general and in the variations useful for camera tracking.
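
The Random Sample Consensus principle is easy to sketch in isolation: repeatedly fit a model to a minimal random sample and keep the hypothesis with the largest consensus set. The toy example below fits a 2D line; in camera tracking the same loop would wrap a minimal camera-geometry solver instead.

```python
# Generic RANSAC loop illustrated on 2D line fitting (a stand-in for the
# minimal camera-geometry solvers discussed above).
import numpy as np

def ransac_line(points, n_iters=500, inlier_thresh=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # 1. Minimal sample: two points define a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        direction = q - p
        norm = np.linalg.norm(direction)
        if norm < 1e-12:
            continue
        normal = np.array([-direction[1], direction[0]]) / norm
        # 2. Consensus: count points whose distance to the line is small.
        distances = np.abs((points - p) @ normal)
        inliers = distances < inlier_thresh
        # 3. Keep the hypothesis supported by the most inliers.
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (p, normal), inliers
    return best_model, best_inliers

# Synthetic data: points on a line plus gross outliers.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=80)
line_pts = np.stack([t, 0.5 * t + 0.2], axis=1) + 0.01 * rng.normal(size=(80, 2))
outliers = rng.uniform(-1, 1, size=(40, 2))
model, inliers = ransac_line(np.vstack([line_pts, outliers]), rng=rng)
print(f"inliers found: {inliers.sum()} of 120")
```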


Spatial Sound Processing

Spatial information about a sound field is captured by recording it with several receivers, such as the two ears of the human auditory system or the several microphones found in modern hearing aids. Subsequent processing makes it possible to extract several parameters of the ambient acoustics, e.g., an estimate of the number of sound sources present, their positions relative to the listener, and even the direction in which other speakers are facing. Signal enhancement techniques are also routinely based on this spatial information, enabling us to enhance desired signal components and to suppress interfering (noise) sources.
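
As a small illustration of the spatial information that even two receivers expose, the sketch below estimates the time difference of arrival between two microphone signals by cross-correlation; the sampling rate and the simulated delay are arbitrary example values.

```python
# Time-difference-of-arrival (TDOA) estimate from two microphone signals via
# cross-correlation; fs and the simulated delay are assumed example values.
import numpy as np

fs = 16000          # sampling rate in Hz (assumed)
delay_samples = 7   # simulated inter-microphone delay

rng = np.random.default_rng(1)
source = rng.normal(size=fs)            # 1 s of noise-like source signal
mic1 = source
mic2 = np.roll(source, delay_samples)   # same signal arriving later at mic 2

# Cross-correlate and locate the peak: its lag is the estimated delay.
corr = np.correlate(mic2, mic1, mode="full")
lags = np.arange(-len(mic1) + 1, len(mic1))
tdoa = lags[np.argmax(corr)] / fs
print(f"estimated delay: {tdoa * 1e3:.3f} ms "
      f"(true: {delay_samples / fs * 1e3:.3f} ms)")
```
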
This tutorial will give an overview of the fundamentals and applications of the perception and processing of spatial sound. We will outline the physics of sound field generation and the physiology and psychophysics of how our hearing system perceives spatial patterns. Technical approaches to analyzing and filtering spatial sound data will then be presented, including the principles of microphone array beamforming and recent approaches in the field of independent component analysis of sound signals.
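
A delay-and-sum beamformer is the simplest instance of the microphone-array principles mentioned above. The sketch below steers an assumed four-microphone linear array towards a chosen direction; the geometry, sampling rate and look direction are example values, not parameters from the tutorial.

```python
# Delay-and-sum beamforming for a uniform linear microphone array.
# Array geometry, sampling rate and steering angle are assumed example values.
import numpy as np

fs = 16000            # sampling rate in Hz
c = 343.0             # speed of sound in m/s
spacing = 0.04        # microphone spacing in m
n_mics = 4
steer_deg = 30.0      # desired look direction relative to broadside

def delay_and_sum(signals, steer_deg):
    """signals: array of shape (n_mics, n_samples), one row per microphone."""
    n_mics, n_samples = signals.shape
    # Physical arrival delay (in samples) at each microphone for a plane wave
    # coming from steer_deg.
    delays = np.arange(n_mics) * spacing * np.sin(np.deg2rad(steer_deg)) / c * fs
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        # Advance each channel by its arrival delay (fractional delay applied
        # in the frequency domain), then accumulate.
        spectrum = np.fft.rfft(signals[m])
        shifted = np.fft.irfft(spectrum * np.exp(2j * np.pi * freqs * delays[m] / fs),
                               n=n_samples)
        out += shifted
    return out / n_mics

# Example: four noise channels standing in for real microphone recordings.
rng = np.random.default_rng(2)
x = rng.normal(size=(n_mics, fs))
y = delay_and_sum(x, steer_deg)
print(y.shape)
```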


Speech Communication by Humans and by Machine

Spectral analysis of sounds is one of the undisputed elements of early auditory processing. The spectrograph, introduced to the general scientific public after the Second World War, was developed to emulate this elementary capability and had a significant and lasting effect on our view of the acoustic world, and especially on speech engineering. However, our understanding of how sounds are processed in biological systems has advanced considerably since the days of the spectrograph. The talk will discuss some speech processing techniques that are based on this evolving understanding of the role of spectrally localized dynamic temporal cues in human auditory perception.
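
Since the spectrograph is the reference point here, a minimal short-time spectral analysis may help make the discussion concrete; the test signal and the analysis parameters below are arbitrary examples.

```python
# Minimal spectrogram of a synthetic chirp (analysis parameters are arbitrary
# example values, chosen only to make the time-frequency trade-off visible).
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
# A tone sweeping from 300 Hz to 3 kHz, roughly the band where speech formants live.
x = np.sin(2 * np.pi * (300 + 2700 * t / 2) * t)

# Short-time spectral analysis: 25 ms windows with 10 ms hops.
f, times, Sxx = spectrogram(x, fs=fs, nperseg=int(0.025 * fs),
                            noverlap=int(0.015 * fs))
print(Sxx.shape)  # (frequency bins, time frames)
```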


Logical Representational and Computational Methods for Markov Decision Processes

Markov decision processes (MDPs) have become standard models for sequential decision problems involving uncertainty within the planning and probabilistic reasoning communities. This tutorial will provide a brief introduction to Markov decision processes and survey some of the recent advances that have been made in the concise and natural representation of MDPs using logical techniques, and in computational methods that exploit this logical structure. Representations such as dynamic Bayesian networks, BDDs, and the stochastic situation calculus will be discussed.
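
To fix notation before turning to structured representations, a flat tabular MDP and value iteration can be written down in a few lines; the transition probabilities and rewards below are an invented toy example.

```python
# Value iteration on a tiny tabular MDP (states, actions and rewards are an
# invented toy example, not from the tutorial).
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# P[a, s, s'] = probability of moving from s to s' under action a.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],   # action 0: "advance"
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # action 1: "reset"
])
R = np.array([0.0, 0.0, 1.0])   # reward received in each state

V = np.zeros(n_states)
for _ in range(200):
    # Bellman backup: V(s) = max_a sum_s' P(s'|s,a) [R(s') + gamma V(s')]
    Q = np.einsum("ast,t->as", P, R + gamma * V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)
print("V*:", np.round(V, 3), "policy:", policy)
```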


Autonomous Robot Learning of Foundational Representations

An intelligent agent experiences the world through low-level sensory and motor interfaces (the "pixel level"). However, in order to function intelligently, it must be able to describe its world in terms of higher-level concepts such as places, paths, objects, actions, other agents, their beliefs, goals, plans, and so on. How can these higher-level concepts that make up the foundation of commonsense knowledge be learned from unguided experience at the pixel level?

This question is important in practical terms: As robots are developed with increasingly complex sensory and motor systems, it becomes impractical for human engineers to implement their high-level concepts and to define how those concepts are grounded in sensorimotor interaction. The same question is also important in theory: Does AI necessarily depend on human programming, or can the concepts at the foundation of intelligence be learned from unguided experience? This tutorial will describe recent progress on these questions, including the learning methods that make this progress possible.


Developmental Algorithms

Have you ever thrown sticks and stones in the water as a child, just to find out whether they would float or not? Or have you ever noticed how much fun babies can have by simply touching objects, sticking them into their mouths, or rattling them and discovering new noises? It is these embodied interactions, experiences and discoveries and not only the organization of our brain that together result in intelligence. During the past five years, we have been working on algorithms that make robots eager to investigate their surroundings. These robots explore their environment in search of new things to learn: they get bored with situations that are already familiar to them, and also avoid situations which are too difficult. In our experiments, we place the robots in a world that is rich in learning opportunities and then just watch how the robots develop by themselves. The results show relevant analogies with the ways in which young children discover their own bodies as well as the people and objects that are close to them.
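
One way to phrase this "neither boring nor too hard" preference is a learning-progress heuristic: at each step, pick the activity whose prediction error has recently decreased the most. The sketch below is a deliberately simplified, hypothetical version of that idea, not the algorithm used in the robot experiments.

```python
# A toy learning-progress heuristic: prefer the activity where the agent's
# prediction error is dropping fastest (simplified illustration only).
import random

# Hypothetical activities with different learnability: "mastered" is already
# easy, "learnable" still improves, "impossible" is pure noise.
true_difficulty = {"mastered": 0.01, "learnable": 0.4, "impossible": 1.0}
errors = {name: [1.0] for name in true_difficulty}   # prediction-error history

def learning_progress(history, window=5):
    """Recent decrease in prediction error; higher means more is being learned."""
    if len(history) < 2 * window:
        return float("inf")          # explore everything a little at first
    old = sum(history[-2 * window:-window]) / window
    new = sum(history[-window:]) / window
    return old - new

random.seed(0)
for step in range(200):
    # Choose the activity with the highest current learning progress.
    chosen = max(errors, key=lambda a: learning_progress(errors[a]))
    # Simulate practice: learnable skills improve, impossible ones stay noisy.
    last = errors[chosen][-1]
    floor = true_difficulty[chosen]
    new_error = max(floor, last * 0.95) if floor < 0.9 else random.uniform(0.8, 1.0)
    errors[chosen].append(new_error)

print({name: len(hist) - 1 for name, hist in errors.items()})  # practice counts
```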


Cognitive Architectures

The goal of the tutorial is to give students a brief overview of past and ongoing research in cognitive architectures. The expected outcome is an appreciation of the utility of cognitive architectures for formulating theoretical principles underlying cognition and for building computational models and artificial cognitive systems (e.g., to test these principles of cognition by replicating and explaining human performance on cognitive tasks). The course will start by looking at the nature, role, and utility of building computational models of cognitive functions. It will then introduce the main cognitive architectures (including ACT-R, SOAR, and EPIC), while also briefly reviewing some non-symbolic architectures (e.g., LEABRA).
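
Most of the symbolic architectures mentioned here are organized around a production-system cycle: match rules against working memory, select one, fire it, repeat. The fragment below is a bare-bones illustration of that cycle, not a model of any particular architecture.

```python
# Bare-bones production-system cycle of the kind underlying ACT-R/SOAR-style
# architectures: match rules against working memory, select one, fire it.
# (Illustrative only; not a model of any specific architecture.)

working_memory = {("goal", "add", 2, 3)}

def match_add(wm):
    """If there is an addition goal, propose replacing it with its result."""
    for item in wm:
        if item[0] == "goal" and item[1] == "add":
            _, _, a, b = item
            return lambda wm: (wm - {item}) | {("result", a + b)}
    return None

def match_report(wm):
    """If there is a result, propose announcing it and halting."""
    for item in wm:
        if item[0] == "result":
            return lambda wm: (wm - {item}) | {("halt", item[1])}
    return None

productions = [match_add, match_report]

# Cognitive cycle: repeat match -> select (first applicable) -> fire.
while not any(item[0] == "halt" for item in working_memory):
    for rule in productions:
        action = rule(working_memory)
        if action is not None:
            working_memory = action(working_memory)
            break
    else:
        break   # no rule matched: impasse

print(working_memory)   # expected: {("halt", 5)}
```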