MinkUNeXt: Point Cloud-based Large-scale Place Recognition using 3D Sparse Convolutions
J.J. Cabrera, A. Santo, A. Gil, C. Viegas and L. Payá
Array  (2025)
Ed. Elsevier  ISSN: 2590-0056  DOI: https://doi.org/10.1016/j.array.2025.100569  Vol. 28, 100569
BIBTEX:
@article{cabrera2025minkunext,
  title = {MinkUNeXt: Point cloud-based large-scale place recognition using 3D sparse convolutions},
  journal = {Array},
  volume = {28},
  pages = {100569},
  year = {2025},
  issn = {2590-0056},
  doi = {https://doi.org/10.1016/j.array.2025.100569},
  url = {https://www.sciencedirect.com/science/article/pii/S2590005625001961},
  author = {Juan José Cabrera and Antonio Santo and Arturo Gil and Carlos Viegas and Luis Payá},
  keywords = {Place recognition, LiDAR, Point cloud embedding, 3D sparse convolutions},
  abstract = {This paper presents MinkUNeXt, an effective and efficient architecture for place-recognition from point clouds entirely based on the new 3D MinkNeXt Block, a residual block composed of 3D sparse convolutions that follows the philosophy established by recent Transformers but purely using simple 3D convolutions. Feature extraction is performed at different scales by a U-Net encoder–decoder network and the feature aggregation of those features into a single descriptor is carried out by a Generalized Mean Pooling (GeM). The proposed architecture demonstrates that it is possible to surpass the current state-of-the-art by only relying on conventional 3D sparse convolutions without making use of more complex and sophisticated proposals such as Transformers, Attention-Layers or Deformable Convolutions. A thorough assessment of the proposal has been carried out using the Oxford RobotCar, the In-house, the KITTI and the USyd datasets. As a result, MinkUNeXt proves to outperform other methods in the state-of-the-art. The implementation is publicly available at https://juanjo-cabrera.github.io/projects-MinkUNeXt/.}
}

Abstract:

This paper presents MinkUNeXt, an effective and efficient architecture for place recognition from point clouds. It is built entirely on the new 3D MinkNeXt Block, a residual block of 3D sparse convolutions that follows the design philosophy of recent Transformers while using only simple 3D convolutions. Features are extracted at different scales by a U-Net encoder–decoder network and then aggregated into a single descriptor by Generalized Mean Pooling (GeM). The proposed architecture demonstrates that it is possible to surpass the current state of the art by relying only on conventional 3D sparse convolutions, without more complex proposals such as Transformers, attention layers or deformable convolutions. The proposal has been thoroughly assessed on the Oxford RobotCar, In-house, KITTI and USyd datasets, where MinkUNeXt outperforms other state-of-the-art methods. The implementation is publicly available at https://juanjo-cabrera.github.io/projects-MinkUNeXt/.
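
To illustrate the aggregation step mentioned in the abstract, the following is a minimal sketch of Generalized Mean Pooling in plain Python. It is not the authors' implementation: the function name `gem_pool`, the default exponent `p = 3.0`, the clamping constant `eps` and the dense list-of-lists input are all assumptions made for illustration. It only shows the general formula, in which each channel of the descriptor is the p-th root of the mean of the p-th powers of that channel over all points.

```python
from typing import List, Sequence


def gem_pool(
    features: Sequence[Sequence[float]],
    p: float = 3.0,      # assumed exponent; in practice it is often a learned parameter
    eps: float = 1e-6,   # assumed clamp value to keep the fractional power well defined
) -> List[float]:
    """Aggregate N per-point feature vectors (N x C) into one C-dimensional descriptor."""
    if not features:
        raise ValueError("need at least one feature vector")
    n = len(features)
    channels = len(features[0])
    descriptor = []
    for c in range(channels):
        # Clamp each activation to eps, raise to the power p, and average over points.
        mean_pow = sum(max(row[c], eps) ** p for row in features) / n
        # The p-th root brings the result back to the scale of the input features.
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor


if __name__ == "__main__":
    # Hypothetical toy input: 3 points with 2 feature channels each.
    toy_features = [[0.5, 1.0], [1.5, 2.0], [1.0, 4.0]]
    print(gem_pool(toy_features))
```

With `p = 1` the function reduces to average pooling, and as `p` grows the result approaches max pooling, which is why GeM is commonly described as interpolating between the two.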