Wufei Ma*
Yu-Cheng Chou*
Qihao Liu*
Xingrui Wang
Celso de Melo†
Jieneng Chen
Jianwen Xie°
Alan Yuille
Johns Hopkins University
†DEVCOM Army Research Laboratory
°Lambda Inc
*Equal contribution
We introduce SpatialReasoner, a novel large vision-language model (LVLM) that addresses 3D spatial reasoning with explicit 3D representations shared across three stages: 3D perception, computation, and reasoning. Explicit 3D representations provide a coherent interface that supports advanced 3D spatial reasoning and enable us to study the factual errors made by LVLMs.
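To make the idea of an explicit 3D representation shared across stages more concrete, here is a minimal illustrative sketch. It is not the SpatialReasoner implementation: the `Object3D` class, its fields, the helper functions, and the 45-degree threshold are all hypothetical names and values chosen for illustration. The sketch assumes the perception stage has already produced a 3D location and a facing direction for each object; the computation stage derives quantities from them, and the reasoning stage answers a question from those quantities.

```python
import math
from dataclasses import dataclass


@dataclass
class Object3D:
    """Hypothetical explicit 3D representation of one object."""
    name: str
    center: tuple[float, float, float]  # assumed 3D location in camera coordinates
    front: tuple[float, float, float]   # assumed facing direction of the object


def distance(a: Object3D, b: Object3D) -> float:
    """Computation stage: Euclidean distance between two object centers."""
    return math.dist(a.center, b.center)


def facing_angle_deg(a: Object3D, b: Object3D) -> float:
    """Computation stage: angle between a's front direction and the direction from a to b."""
    to_b = tuple(q - p for p, q in zip(a.center, b.center))
    dot = sum(f * t for f, t in zip(a.front, to_b))
    cos = dot / (math.hypot(*a.front) * math.hypot(*to_b))
    cos = max(-1.0, min(1.0, cos))  # guard against floating-point drift
    return math.degrees(math.acos(cos))


def is_facing(a: Object3D, b: Object3D, threshold_deg: float = 45.0) -> bool:
    """Reasoning stage: answer 'is a facing b?' from the computed angle.

    The threshold is an arbitrary illustrative choice.
    """
    return facing_angle_deg(a, b) < threshold_deg


# Perception stage stand-in: values a 3D perception module might estimate.
chair = Object3D("chair", center=(0.0, 0.0, 2.0), front=(1.0, 0.0, 0.0))
table = Object3D("table", center=(3.0, 0.0, 2.0), front=(0.0, 0.0, 1.0))

print(distance(chair, table))
print(is_facing(chair, table), is_facing(table, chair))
```

Because every intermediate value (locations, distance, angle) is explicit, a wrong final answer can be traced to the stage that produced the faulty value, which is the kind of error analysis the explicit interface is meant to support.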
Coming soon.
SpatialReasoner codebase.
Synthetic 3D data generation pipeline.
SpatialReasoner models.
SpatialReasoner data.
License. Our SpatialReasoner and SpatialReasonerDataGen are released under the Creative Commons Attribution 4.0 license. By accessing and using SpatialReasoner and SpatialReasonerDataGen, you agree to follow the terms of access specified here.
Ethics. We follow the ethics guidelines at Johns Hopkins University and obtained Institutional Review Board (IRB) approval prior to the start of our work. We described potential risks to the annotators and explained the purpose of the study and how the collected data would be used. All annotators agreed to join this project voluntarily and were paid a fair amount, as required at our institution.
pending
This website template is adapted from Image Sculpting.