SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models

CVPR 2025

Wufei Ma¹, Luoxin Ye¹, Celso M. de Melo², Alan Yuille¹, Jieneng Chen¹

¹Johns Hopkins University   ²DEVCOM Army Research Laboratory

We systematically study how 3D-informed data, architecture, and training setups affect spatial reasoning, and we present SpatialLLM, a large multimodal model (LMM) with advanced 3D spatial reasoning abilities.

Figure 1. Overview of our SpatialLLM framework.

BibTeX

@inproceedings{ma2025spatialllm,
  title={SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models},
  author={Ma, Wufei and Ye, Luoxin and de Melo, Celso and Yuille, Alan L and Chen, Jieneng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}

Notes

This website template is adapted from Image Sculpting.