Methods to describe the global appearance of scenes: an application to map building and mobile robots localization
Dr. Luis Payá Castelló

In recent years, the number of mobile robot applications has increased considerably, and robots can now be found in many different areas. When a robot has to perform a task autonomously in an unknown environment, it must be able to build an internal representation, or map, of that environment. This representation should contain enough information for the robot to estimate its position and orientation and to compute the trajectory to its target points.


Robots can be equipped with different sensors that let them extract the information they need from the environment to carry out the mapping and localization tasks. In this PhD Thesis, we use omnidirectional vision sensors, owing to the large amount of information they provide and their relatively low cost. Map building and localization using omnidirectional vision are two fields that currently attract a great deal of research. However, there are still no closed solutions that solve these problems robustly in large environments, where the robot usually operates for very long periods of time and significant changes may occur in the appearance of the captured scenes.


There are two main approaches to extracting the information needed from the scenes to create a map or model of the environment and to estimate the position of the robot. The first is based on extracting landmarks or regions from the images and describing them with a method that is robust to changes in perspective. These extraction and description methods have reached relative maturity, and many popular current mapping and localization algorithms are based on them. However, they have some drawbacks: a high computational cost when dealing with large maps, relatively low robustness to changes in the environment, and a limited ability to extract distinctive features from unstructured environments. The second approach consists of working with the information from each scene as a whole, without extracting any local features. A single descriptor is built per scene, gathering global information from the entire image. This is a more recent approach, and it leads to conceptually simpler algorithms. However, some aspects of its application to mapping and localization still need to be studied in depth. The most relevant contributions of this PhD Thesis focus on this area of study.
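As an illustration of what a single global descriptor per scene can look like, the sketch below computes a descriptor in the spirit of the Fourier signature used with panoramic images: the magnitudes of the first few discrete Fourier transform (DFT) coefficients of each image row. This is an illustrative choice, not necessarily one of the descriptors studied in this Thesis; the function names `row_dft` and `fourier_signature`, the number of coefficients `k`, and the representation of an image as a list of rows of grayscale values are all assumptions. Rotating the robot around its vertical axis circularly shifts the columns of a panoramic image, and a circular shift only changes the phase of each DFT coefficient, so these magnitudes are ideally unaffected by that rotation.

```python
import cmath
import math
from typing import List

Image = List[List[float]]  # assumed: panoramic grayscale image as rows of pixel values


def row_dft(row: List[float], k: int) -> List[complex]:
    """First k coefficients of the discrete Fourier transform of one image row."""
    n = len(row)
    return [
        sum(row[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
        for u in range(k)
    ]


def fourier_signature(image: Image, k: int = 4) -> List[float]:
    """Hypothetical global descriptor: |DFT| of the first k coefficients of every row.

    A circular column shift (robot rotation) only changes the phase of each
    coefficient, so the magnitudes kept here do not change.
    """
    return [abs(c) for row in image for c in row_dft(row, k)]


# Usage: the same synthetic scene before and after a circular shift of 5 columns
img = [[float((3 * x + y) % 7) for x in range(16)] for y in range(4)]
shifted = [row[5:] + row[:5] for row in img]
a, b = fourier_signature(img), fourier_signature(shifted)
print(max(abs(p - q) for p, q in zip(a, b)) < 1e-9)  # True
```

Keeping only a few low-frequency coefficients per row also makes the descriptor much smaller than the image itself, which is what makes comparing whole scenes cheap.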


When designing a framework to solve the mapping and localization problems, we can choose to solve them either metrically or topologically. A metric map contains the positions of certain features of the environment with respect to a reference system, together with their associated uncertainty. Using such maps, the robot can estimate its position with geometric precision. Landmark extraction techniques, combined with probabilistic algorithms, allow us to create this kind of map. On the other hand, topological maps represent the world using only a set of locations and the connectivity relationships among them. Such maps are an efficient mechanism for localizing the robot, often with sufficient precision, and for planning its trajectories.
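A topological map of this kind can be represented as a graph whose nodes are captured locations, each storing a descriptor of its scene, and whose edges record which locations are connected. The minimal sketch below uses a hypothetical class `TopologicalMap` and plans a route with breadth-first search, which returns the path through the fewest intermediate locations. It is an assumption for illustration, not the map structure used in this Thesis, and it deliberately ignores any metric distance between nodes.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class TopologicalMap:
    """Hypothetical topological map: nodes are locations, edges are connectivity."""

    def __init__(self) -> None:
        self.descriptors: Dict[str, List[float]] = {}  # scene descriptor per location
        self.edges: Dict[str, Set[str]] = {}           # undirected adjacency sets

    def add_location(self, name: str, descriptor: List[float]) -> None:
        self.descriptors[name] = descriptor
        self.edges.setdefault(name, set())

    def connect(self, a: str, b: str) -> None:
        # Connectivity is assumed to be symmetric
        self.edges[a].add(b)
        self.edges[b].add(a)

    def plan(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search: route with the fewest hops, or None if unreachable."""
        parents: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                # Walk back through the parents to rebuild the route
                path: List[str] = []
                current: Optional[str] = node
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for nxt in sorted(self.edges[node]):  # sorted for deterministic output
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None


# Usage: a corridor A-B-C-D with a side branch A-E; F is isolated
m = TopologicalMap()
for name in "ABCDEF":
    m.add_location(name, [0.0])
for a, b in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]:
    m.connect(a, b)
print(m.plan("A", "D"))  # ['A', 'B', 'C', 'D']
print(m.plan("A", "F"))  # None
```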


The techniques based on the global appearance of the scenes offer an alternative way to implement this type of topological representation. Throughout the document, we will show how visual appearance descriptors allow such representations to be built in a natural way. Most of the contributions of this PhD Thesis fall within this area. First, we will study several methods to describe scenes globally and evaluate their performance in map building and localization tasks. Then, we will develop new, more computationally efficient algorithms to describe the scenes, and we will test their robustness against typical events such as noise, changes in lighting conditions and partial occlusions. After that, we will analyse the kind of information to be stored in the maps and the relationships between captured locations in order to build functional hierarchical maps. Finally, we will estimate the position and orientation of the robot within these maps. All the proposed algorithms have been tested on several sets of images, some captured by ourselves and others by third parties. These datasets contain heterogeneous information, captured by omnidirectional vision systems with different geometries and at different times of day and year, to reflect the variability that visual information can undergo in real applications.
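Estimating position and orientation within such a map can be sketched in two steps: find the stored location whose global descriptor is closest to that of the current image, then estimate the robot's orientation as the circular column shift that best aligns the current panoramic image with the stored one. The sketch below is a hypothetical brute-force illustration, not the algorithm developed in this Thesis. The names `global_descriptor`, `distance`, `estimate_orientation` and `localize` are assumptions; the descriptor (mean intensity of each row, which a circular shift leaves unchanged) is deliberately simplistic; and all images are assumed to be equally sized panoramas stored as lists of rows.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]  # assumed: panoramic grayscale image as rows of pixel values


def global_descriptor(image: Image) -> List[float]:
    # Hypothetical descriptor: mean intensity of each row. A robot rotation
    # circularly shifts the columns, which leaves every row mean unchanged.
    return [sum(row) / len(row) for row in image]


def distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance between two descriptors
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def estimate_orientation(query: Image, reference: Image) -> float:
    """Angle in degrees of the circular column shift that best aligns query to reference."""
    n = len(reference[0])

    def cost(shift: int) -> float:
        # Sum of squared differences after shifting the query by `shift` columns
        return sum(
            (q[(x + shift) % n] - r[x]) ** 2
            for q, r in zip(query, reference)
            for x in range(n)
        )

    best_shift = min(range(n), key=cost)
    return 360.0 * best_shift / n


def localize(query: Image, map_images: Dict[str, Image]) -> Tuple[str, float]:
    """Return (nearest stored location, estimated orientation in degrees)."""
    q = global_descriptor(query)
    nearest = min(map_images, key=lambda name: distance(q, global_descriptor(map_images[name])))
    return nearest, estimate_orientation(query, map_images[nearest])


# Usage: place A seen again after the robot turns by 3 of 12 columns (90 degrees)
place_a = [[float(x + 2 * y) for x in range(12)] for y in range(3)]
place_b = [[10.0 + x for x in range(12)] for _ in range(3)]
query = [row[-3:] + row[:-3] for row in place_a]
print(localize(query, {"A": place_a, "B": place_b}))  # ('A', 90.0)
```

Separating the two steps lets the cheap descriptor comparison narrow the search to one location before the more expensive pixel-level alignment is computed.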