Instituto de Investigación en Ingeniería de Elche (I3E)

CrossPlace: Cross-Modal Place Recognition between Fisheye Cameras and LiDAR via a Unified Descriptor Space
Juan José Cabrera, Marcos Alfaro, María Flores, Álvaro Martínez, Arturo Gil, Luis Payá
Expert Systems with Applications (2026)
Ed. Elsevier
ISSN: 0957-4174
DOI: https://doi.org/10.1016/j.eswa.2026.132838
BIBTEX:
@article{cabrera2026crossplace,
  title = {CrossPlace: Cross-Modal Place Recognition between Fisheye Cameras and LiDAR via a Unified Descriptor Space},
  journal = {Expert Systems with Applications},
  pages = {132838},
  year = {2026},
  issn = {0957-4174},
  doi = {https://doi.org/10.1016/j.eswa.2026.132838},
  author = {Juan José Cabrera and Marcos Alfaro and María Flores and Álvaro Martínez and Arturo Gil and Luis Payá},
}
Num. 132838

Abstract:

This paper presents CrossPlace, a method for cross-modal place recognition between heterogeneous sensor modalities, specifically fisheye cameras and LiDAR. Place recognition is the capability of a mobile robot to determine its most likely location within a database from a sensory query. In cross-modal place recognition, the goal is to localize using a sensor different from the one originally used to construct the database. The core contribution of this paper is a unified feature space that integrates intensity, depth and semantic information. Both the database entries and the queries are obtained by embedding sensor readings with the same CrossPlace model, which ensures a consistent representation across modalities. Consequently, a database constructed from LiDAR can be queried with fisheye images, and vice versa, using a single shared architecture. A comprehensive data transformation and preprocessing pipeline is also presented. CrossPlace comprises three independent branches that process intensity, depth and semantic information, respectively. Each branch is a CosPlace model for image embedding whose weights are shared across sensor modalities. Late fusion by concatenating the intensity, depth and semantic embeddings yields the best overall performance. We conduct an exhaustive evaluation on the KITTI-360 dataset, where CrossPlace surpasses state-of-the-art techniques across all metrics, establishing a new standard for cross-modal place recognition in urban and highway environments. The results demonstrate the effectiveness of the unified approach for place recognition across sensor modalities and its robust performance across varied operating environments.
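
The late-fusion and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the branch embeddings are random placeholders rather than CosPlace outputs, the names `l2_normalize`, `fuse` and `retrieve` are hypothetical, and the per-branch normalization, the renormalization after concatenation and the cosine-similarity ranking are assumptions made for illustration. The point is only that, once both modalities are mapped into the same descriptor space, a query from one sensor can be ranked against a database built from the other.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse(intensity, depth, semantic):
    """Late fusion: normalize each branch embedding, concatenate, renormalize.

    The normalization steps are an assumption of this sketch, not taken from the paper.
    """
    branches = (intensity, depth, semantic)
    parts = [l2_normalize(np.asarray(b, dtype=np.float64)) for b in branches]
    return l2_normalize(np.concatenate(parts, axis=-1))


def retrieve(query_desc, db_desc, k=1):
    """Indices of the k most similar database descriptors per query (cosine)."""
    sims = query_desc @ db_desc.T  # rows are unit-norm, so this is cosine similarity
    return np.argsort(-sims, axis=1)[:, :k]


# Placeholder embeddings: 5 places, 8 dimensions per branch (hypothetical sizes).
rng = np.random.default_rng(0)
db_branches = [rng.normal(size=(5, 8)) for _ in range(3)]
# Queries from the "other" sensor, simulated here as slightly perturbed copies.
query_branches = [b + 0.01 * rng.normal(size=b.shape) for b in db_branches]

db_desc = fuse(*db_branches)        # database descriptors, shape (5, 24)
query_desc = fuse(*query_branches)  # query descriptors, shape (5, 24)
top1 = retrieve(query_desc, db_desc, k=1)[:, 0]
print(top1)
```

In a real system the three inputs to `fuse` would come from the intensity, depth and semantic branches applied to either a fisheye image or a LiDAR scan; the retrieval step itself does not need to know which sensor produced the descriptor.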