This is supercool!!
LLaVA-3D: adds 3D awareness to LVMs without compromising their 2D understanding capabilities.
Method: a unified architecture that maps 2D CLIP patch features to their corresponding positions in 3D space, enabling joint 2D and 3D vision-language instruction tuning.
Project: https://zcmax.github.io/projects/LLaVA-3D/
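To make the idea concrete, here is a minimal sketch of how 2D patch features could be paired with 3D positions: back-project each ViT patch center into world coordinates using depth and camera parameters, then attach that 3D position to the patch feature (e.g. via a position embedding). The function `lift_patches_to_3d` and all of its parameters are illustrative assumptions, not the project's actual API; the real LLaVA-3D implementation may differ in its details.

```python
import torch

def lift_patches_to_3d(patch_feats, depth, intrinsics, cam_to_world, patch_size=14):
    """
    Hypothetical sketch: back-project 2D patch centers into 3D world coordinates
    so each 2D patch feature can be associated with a 3D position.

    patch_feats:  (N, C)  per-patch features from a 2D encoder (e.g. CLIP ViT)
    depth:        (H, W)  per-pixel depth for the same image
    intrinsics:   (3, 3)  pinhole camera intrinsics K
    cam_to_world: (4, 4)  camera-to-world extrinsics
    """
    H, W = depth.shape
    # Pixel coordinates of each patch center on the ViT grid.
    ys = torch.arange(patch_size // 2, H, patch_size, dtype=torch.float32)
    xs = torch.arange(patch_size // 2, W, patch_size, dtype=torch.float32)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    u = grid_x.reshape(-1)
    v = grid_y.reshape(-1)

    # Sample depth at each patch center.
    z = depth[v.long(), u.long()]

    # Unproject pixels into camera coordinates.
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    x_cam = (u - cx) * z / fx
    y_cam = (v - cy) * z / fy
    pts_cam = torch.stack([x_cam, y_cam, z, torch.ones_like(z)], dim=-1)  # (N, 4)

    # Transform camera coordinates into world coordinates.
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]  # (N, 3)

    # Each 2D patch feature now has a 3D position; an embedding of pts_world
    # could be added to patch_feats before feeding the tokens to the LLM.
    return patch_feats, pts_world
```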