SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning

NeurIPS 2025

Wufei Ma*, Yu-Cheng Chou*, Qihao Liu*, Xingrui Wang,
Celso de Melo, Jianwen Xie°, Alan Yuille

Johns Hopkins University
DEVCOM Research Laboratory, °Lambda Inc
*Equal first author

We introduce SpatialReasoner, a novel large vision-language model (LVLM) that addresses 3D spatial reasoning with explicit 3D representations shared across three stages: 3D perception, computation, and reasoning. Explicit 3D representations provide a coherent interface that supports advanced 3D spatial reasoning and enable us to study the factual errors made by LVLMs.
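To make the idea of an explicit 3D interface concrete, here is a minimal sketch of how a perception stage could hand structured 3D estimates to a computation stage, whose numeric results then feed a final answer. It is an illustration under assumptions, not the SpatialReasoner implementation: the `Object3D` record, its fields, the camera-frame convention, and the `0.5` cosine threshold are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Object3D:
    """Hypothetical output of the perception stage for one object."""
    name: str
    center: tuple  # assumed (x, y, z) position in a shared camera frame
    front: tuple   # assumed direction the object's front points to


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(a: Object3D, b: Object3D) -> float:
    """Computation stage: Euclidean distance between object centers."""
    return _norm(_sub(a.center, b.center))


def facing_cosine(a: Object3D, b: Object3D) -> float:
    """Computation stage: cosine of the angle between a's front
    direction and the vector pointing from a to b."""
    to_b = _sub(b.center, a.center)
    return _dot(a.front, to_b) / (_norm(a.front) * _norm(to_b))


def is_facing(a: Object3D, b: Object3D, threshold: float = 0.5) -> bool:
    """Reasoning stage: turn the computed quantity into an answer.
    The threshold is an arbitrary choice for this sketch."""
    return facing_cosine(a, b) > threshold


# Hypothetical perception outputs for two objects.
person = Object3D("person", center=(0.0, 0.0, 2.0), front=(1.0, 0.0, 0.0))
car = Object3D("car", center=(3.0, 0.0, 2.0), front=(0.0, 0.0, 1.0))

dist = distance(person, car)
person_faces_car = is_facing(person, car)
car_faces_person = is_facing(car, person)
```

Because every intermediate value (`center`, `front`, `dist`, the cosine) is explicit, a wrong final answer can be traced back to the stage that produced the faulty quantity, which is the kind of error analysis the paragraph above refers to.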

Motivation
Figure 1. Comparing 3D spatial reasoning of our SpatialReasoner with previous state-of-the-art models.

Key Design I: Explicit Spatial Reasoning

Figure 2. Comparing 3D spatial reasoning of our SpatialReasoner with previous state-of-the-art models.

Key Design II: Generalizable Spatial Reasoning

Figure 3. Comparing 3D spatial reasoning of our SpatialReasoner with previous state-of-the-art models.

Key Findings

SpatialReasoner models. We experimented with a variety of SpatialReasoner models trained with different data and training methods.

Evaluation on 3DSRBench.

Model                    3DSRBench
                         Overall   Height   Location   Orientation   Multi-Object
Open-sourced
  Qwen2.5-VL             48.4      44.1     62.7       40.6          40.5
  SpatialLLM             44.8      45.8     61.6       30.0          36.7
  SpatialRGPT            32.7      55.9     39.0       27.8          20.0
  SpatialRGPT+Depth      48.4      55.9     60.0       34.2          42.3
Proprietary
  GPT-4o                 44.2      53.2     59.6       21.6          39.0
  Gemini 2.0             51.1      53.0     67.1       35.8          43.6
  QwenVLMax              52.0      45.1     70.7       37.7          44.8
Ours
  SpatialReasoner-Zero   54.0      46.4     67.3       48.4          47.2
  SpatialReasoner-SFT    58.3      51.9     73.5       50.7          50.3
  SpatialReasoner        60.3      52.5     75.2       55.2          51.8
Table 1. Evaluation on 3DSRBench. Our SpatialReasoner achieves the highest overall score (60.3), outperforming all prior open-sourced and proprietary models, including GPT-4o, Gemini 2.0, and QwenVLMax.
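The margins behind Table 1 can be checked with a few lines of arithmetic. The sketch below copies the scores from the table and, for each column, finds the best prior model and the gap to SpatialReasoner. The variable and function names are illustrative only.

```python
COLUMNS = ["Overall", "Height", "Location", "Orientation", "Multi-Object"]

# Scores copied from Table 1, in the column order above.
PRIOR = {
    "Qwen2.5-VL":        [48.4, 44.1, 62.7, 40.6, 40.5],
    "SpatialLLM":        [44.8, 45.8, 61.6, 30.0, 36.7],
    "SpatialRGPT":       [32.7, 55.9, 39.0, 27.8, 20.0],
    "SpatialRGPT+Depth": [48.4, 55.9, 60.0, 34.2, 42.3],
    "GPT-4o":            [44.2, 53.2, 59.6, 21.6, 39.0],
    "Gemini 2.0":        [51.1, 53.0, 67.1, 35.8, 43.6],
    "QwenVLMax":         [52.0, 45.1, 70.7, 37.7, 44.8],
}
SPATIAL_REASONER = [60.3, 52.5, 75.2, 55.2, 51.8]


def margins_over_best_prior():
    """Map each column to (best prior model, SpatialReasoner minus its score)."""
    result = {}
    for i, column in enumerate(COLUMNS):
        # On ties, max() keeps the first model in insertion order.
        best = max(PRIOR, key=lambda name: PRIOR[name][i])
        result[column] = (best, round(SPATIAL_REASONER[i] - PRIOR[best][i], 1))
    return result


margins = margins_over_best_prior()
for column, (model, gap) in margins.items():
    print(f"{column:13s} best prior: {model:18s} margin: {gap:+.1f}")
```

Run on these numbers, the gap is positive for Overall, Location, Orientation, and Multi-Object, and negative for Height, where SpatialRGPT (55.9) remains ahead.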

Evaluation on 3DSRBench (in-distribution and out-of-distribution).

Model   in-distribution   out-of-distribution
                         Overall   Height   Location   Orientation   Multi-Object
Reference
  Qwen2.5-VL             48.4      44.1     62.7       40.6          40.5
Ours
  SpatialReasoner-Zero   53.7      40.6     68.4       50.2          46.6
  SpatialReasoner-SFT    52.2      44.9     69.5       48.9          40.0
  SpatialReasoner        56.4      52.5     72.6       54.1          43.4

Qualitative Comparisons

Open Source

Miscellaneous

License. Our SpatialReasoner and SpatialReasonerDataGen are released under the Creative Commons Attribution 4.0 license. By accessing and using SpatialReasoner and SpatialReasonerDataGen, you agree to follow the terms of access specified in that license.

BibTeX

@inproceedings{ma2025spatialreasoner,
  title={SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning},
  author={Ma, Wufei and Chou, Yu-Cheng and Liu, Qihao and Wang, Xingrui and Melo, Celso M de and Xie, Jianwen and Yuille, Alan},
  booktitle={Advances in Neural Information Processing Systems},
  volume={38},
  year={2025},
}
