Model Card for SpaceLLaVA

SpaceLLaVA uses LoRA to fine-tune LLaVA on a dataset built with VQASynth, with the goal of enhancing spatial reasoning as described in SpatialVLM.

Model Details

Model Description

This model uses data synthesis techniques and publicly available models to reproduce the work described in SpatialVLM, which enhances the spatial reasoning of multimodal models. With a pipeline of expert models, we can infer spatial relationships between objects in a scene and use them to create a VQA dataset for spatial reasoning.
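As a rough illustration of the last step, the sketch below turns two objects with known 3D positions into one distance question and answer. It is not the VQASynth implementation: the `SceneObject` class, the `make_distance_qa` helper, the centroid values, and the answer template are all hypothetical, and it assumes the expert models have already produced a caption and a metric 3D centroid for each object.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SceneObject:
    """Hypothetical container: a caption plus a 3D centroid in meters."""
    caption: str
    centroid: Tuple[float, float, float]


def distance_m(a: SceneObject, b: SceneObject) -> float:
    """Euclidean distance between the two centroids, in meters."""
    return math.dist(a.centroid, b.centroid)


def make_distance_qa(a: SceneObject, b: SceneObject) -> Dict[str, str]:
    """Fill a simple question/answer template from the measured distance."""
    d = distance_m(a, b)
    subject = a.caption[0].upper() + a.caption[1:]  # capitalize first letter only
    return {
        "question": f"What is the distance between {a.caption} and {b.caption}?",
        "answer": f"{subject} is about {d:.2f} meters from {b.caption}.",
    }


# Made-up centroids, for illustration only.
man = SceneObject("the man in the red hat", (0.0, 0.0, 2.0))
pallet = SceneObject("the pallet of boxes", (3.0, 0.0, 6.0))

qa = make_distance_qa(man, pallet)
print(qa["question"])
print(qa["answer"])
```

A real pipeline would also need to decide which object pairs to sample and how to phrase qualitative relations (left of, behind, taller than); those choices are outside this sketch.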

  • Developed by: remyx.ai
  • Model type: MultiModal Model, Vision Language Model, LLaVA
  • License: Apache-2.0
  • Finetuned from model: LLaVA

Model Sources

Uses

Use this model to query spatial relationships between objects in a scene.

Try it on Discord: http://discord.gg/b2yGuCNpuC

Deployment

docker build -f Dockerfile -t spacellava-server:latest .

docker run -it --rm --gpus all -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 12G spacellava-server:latest

python3 client.py --image_path "https://remyx.ai/assets/spatialvlm/warehouse_rgb.jpg" --prompt "What is the distance between the man in the red hat and the pallet of boxes?"
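To send several prompts about the same image, the client invocation above can be assembled programmatically. The following is a minimal sketch that only builds and prints the command lines; the `build_client_command` helper and the second prompt are hypothetical, and it assumes `client.py` accepts exactly the `--image_path` and `--prompt` flags shown above.

```python
import shlex
from typing import List


def build_client_command(image_path: str, prompt: str,
                         script: str = "client.py") -> List[str]:
    """Return the argument list for one client.py call (hypothetical helper)."""
    return ["python3", script, "--image_path", image_path, "--prompt", prompt]


image = "https://remyx.ai/assets/spatialvlm/warehouse_rgb.jpg"
prompts = [
    "What is the distance between the man in the red hat and the pallet of boxes?",
    # Illustrative extra prompt, not taken from the model card.
    "Is the man in the red hat to the left of the pallet of boxes?",
]

commands = [build_client_command(image, p) for p in prompts]
for cmd in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Once the server container is running, each argument list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.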

Citation

@article{chen2024spatialvlm,
  title = {SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities},
  author = {Chen, Boyuan and Xu, Zhuo and Kirmani, Sean and Ichter, Brian and Driess, Danny and Florence, Pete and Sadigh, Dorsa and Guibas, Leonidas and Xia, Fei},
  journal = {arXiv preprint arXiv:2401.12168},
  year = {2024},
  url = {https://arxiv.org/abs/2401.12168},
}

@misc{liu2023llava,
  title = {Visual Instruction Tuning},
  author = {Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
  publisher = {NeurIPS},
  year = {2023},
}

Model size: 13.4B params (Safetensors, tensor type FP16)