Place recognition in outdoor and indoor environments using deep learning techniques and multisensory information
Dr. Juan José Cabrera Mora

This thesis addresses the problem of place recognition in mobile robotics, a fundamental task for localization, autonomous navigation and mapping in complex and dynamic environments. An integrated approach is proposed that develops robust and efficient methods based on different sensory configurations: omnidirectional cameras, LiDAR, pseudo-LiDAR, and cross-modal place recognition between cameras and LiDAR.
 
First, visual place recognition techniques using panoramic images captured by omnidirectional cameras are studied. Two approaches are presented and analyzed: a hierarchical method based on room classification followed by fine position estimation, and a global method based on Siamese neural networks and contrastive learning. The importance of data augmentation techniques specific to panoramic images is demonstrated: they improve robustness to illumination variations under real operating conditions.
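The core of contrastive learning with a Siamese network is a loss that pulls descriptors of the same place together and pushes descriptors of different places apart. The following is a minimal numpy sketch of the classic contrastive loss; the function name, margin value, and toy descriptors are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def contrastive_loss(desc_a, desc_b, same_place, margin=1.0):
    """Contrastive loss between two global image descriptors.

    same_place = 1 attracts the pair; same_place = 0 repels the pair
    until their distance exceeds `margin`.
    """
    d = np.linalg.norm(desc_a - desc_b)                # Euclidean distance
    pos = same_place * d ** 2                          # attract matching pairs
    neg = (1 - same_place) * max(0.0, margin - d) ** 2 # repel non-matching pairs
    return pos + neg

# A matching pair with identical descriptors incurs zero loss;
# the same pair labeled as different places is penalized by the margin.
a = np.array([0.2, 0.5, 0.1])
print(contrastive_loss(a, a, same_place=1))  # → 0.0
print(contrastive_loss(a, a, same_place=0))  # → 1.0
```

During training, both branches of the Siamese network share weights, so the loss shapes a single descriptor space in which nearest-neighbor search amounts to place recognition.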
 
Subsequently, MinkUNeXt is introduced, a new neural network architecture based on sparse 3D convolutions, optimized for place recognition from LiDAR point clouds. This architecture, together with the MinkNeXt 3D residual block, establishes a new state of the art, validated on benchmark datasets such as Oxford RobotCar and In-house.
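Sparse 3D convolutions avoid the cost of a dense voxel grid by operating only on occupied voxels. The preprocessing step that makes this possible is coordinate quantization, sketched below in numpy; the voxel size and function name are illustrative assumptions, not the exact MinkUNeXt pipeline.

```python
import numpy as np

def sparse_quantize(points, voxel_size=0.05):
    """Quantize a point cloud to unique integer voxel coordinates.

    Minkowski-style sparse convolutions then run only on these
    occupied voxels instead of a full dense 3D grid.
    """
    coords = np.floor(points / voxel_size).astype(np.int32)
    return np.unique(coords, axis=0)   # one entry per occupied voxel

# Three points, two of which fall into the same voxel
pts = np.array([[0.01, 0.02, 0.03],
                [0.02, 0.03, 0.04],   # same voxel as the first point
                [1.00, 1.00, 1.00]])
print(len(sparse_quantize(pts)))  # → 2
```

Because LiDAR clouds occupy a tiny fraction of the bounding volume, this sparsity is what makes deep 3D convolutional backbones tractable in both memory and compute.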
 
The thesis also explores the use of pseudo-LiDAR techniques in the context of visual place recognition. The proposed technique generates synthetic point clouds from panoramic images using advanced depth estimators. The Distilled Depth Variations data augmentation technique is proposed to simulate inaccuracies in depth estimation: training data for the place recognition model are generated by combining the outputs of different depth estimators, making the model more robust to depth inconsistencies caused by illumination changes. The results show that robust recognition can be achieved using only visual information, reducing costs and sensory complexity.
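The pseudo-LiDAR step amounts to back-projecting each pixel of an equirectangular panorama along its viewing ray, scaled by estimated depth. A minimal numpy sketch of that back-projection follows; the function name and the pixel-to-angle convention are assumptions for illustration, not the thesis's exact implementation.

```python
import numpy as np

def panorama_to_point_cloud(depth):
    """Back-project an equirectangular depth map into a 3D point cloud.

    Each pixel (u, v) of an H x W panorama defines a ray direction via
    its azimuth and elevation; scaling by depth yields a pseudo-LiDAR point.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    azimuth = (u / w) * 2 * np.pi - np.pi      # horizontal angle in [-pi, pi)
    elevation = np.pi / 2 - (v / h) * np.pi    # vertical angle in (-pi/2, pi/2]
    x = depth * np.cos(elevation) * np.cos(azimuth)
    y = depth * np.cos(elevation) * np.sin(azimuth)
    z = depth * np.sin(elevation)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A constant depth of 1 m yields one point per pixel, all on the unit sphere
cloud = panorama_to_point_cloud(np.ones((64, 128)))
print(cloud.shape)  # → (8192, 3)
```

The resulting synthetic cloud can then be fed to a LiDAR-style backbone such as the sparse-convolution architecture above, even though only a camera was used at capture time.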
 
Finally, place recognition between different sensor modalities is addressed by proposing CrossPlace, a method that transforms both 360° images captured by omnidirectional fisheye cameras and LiDAR scans into a common space of intensity, depth and semantic information. This allows a single network architecture to serve both sensor modalities, avoiding the need to recapture databases and facilitating interoperability between heterogeneous robotic platforms. Experiments on the KITTI-360 dataset demonstrate that the proposed approach outperforms existing methods in both urban and highway scenarios.
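On the LiDAR side, mapping a scan into an image-like shared space is typically done with a spherical projection. The sketch below projects points into a 2-channel (depth, intensity) panorama in numpy; it is a simplified assumption of such a common representation and omits the semantic channel that CrossPlace also uses, and the resolution and function name are hypothetical.

```python
import numpy as np

def lidar_to_range_image(points, intensity, h=32, w=512):
    """Project LiDAR points into a 2-channel (depth, intensity) panorama.

    Casting both LiDAR scans and fisheye images into a shared image-like
    space lets a single network process either modality.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                      # range per point
    azimuth = np.arctan2(y, x)                              # [-pi, pi]
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
    u = ((azimuth + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    v = ((np.pi / 2 - elevation) / np.pi * (h - 1)).astype(int)
    image = np.zeros((h, w, 2))
    image[v, u, 0] = r          # depth channel
    image[v, u, 1] = intensity  # intensity channel
    return image

# A single point 1 m ahead of the sensor lands near the image center
img = lidar_to_range_image(np.array([[1.0, 0.0, 0.0]]), np.array([0.8]))
```

Once both modalities live in this space, the same descriptor network can match a camera query against a LiDAR database, or vice versa, without recapturing either.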
 
Overall, this thesis introduces novel architectures, data augmentation techniques, and sensor fusion strategies, setting new benchmarks in place recognition and paving the way for more autonomous, flexible, and adaptable robotic systems in real-world environments.