Adriel Martins

Martins6

https://github.com/Martins6

AI & ML interests

Graph Neural Networks (GNN) & Robot Learning & Multimodal AI

Recent Activity

upvoted an article 11 days ago

Vision Language Models (Better, Faster, Stronger)

liked a Space 21 days ago

LeRobot-worldwide-hackathon/worldwide-map

liked a Space 29 days ago

huggingface/InferenceSupport

Organizations

None yet

Martins6's activity

upvoted an article 11 days ago

Article

Vision Language Models (Better, Faster, Stronger)

and 4 others • 11 days ago • 360

liked a Space 21 days ago

Worldwide Map

🌎

Display a worldwide map with hackathon events

liked a Space 29 days ago

112

InferenceSupport

💥

Discussions about the Inference Providers feature on the Hub

liked a model 29 days ago

nari-labs/Dia-1.6B

Text-to-Speech • Updated 9 days ago • 198k • 2.34k

reacted to salma-remyx's post with 🔥 29 days ago

Post

1791

SpaceThinker-Qwen2.5VL-3B shows that a 3B VLM can compete with closed frontier APIs in quantitative spatial reasoning, a key capability for embodied AI applications like drones and robotics.

See the model card for how it stacks up against Gemini and OpenAI on Q-Spatial-Bench. The release includes .gguf files, a Colab quickstart, and Docker images.

SpaceThinker adopts the Qwen2.5VL-3B architecture and is fine-tuned on the SpaceThinker dataset of synthetic spatial reasoning traces, created with VQASynth.

This model builds on the SpaceLLaVA series of VLMs, which were fine-tuned on synthetic data for enhanced spatial reasoning, by adding test-time compute for multimodal thinking.

Model: remyxai/SpaceThinker-Qwen2.5VL-3B
Dataset: remyxai/SpaceThinker
Space: remyxai/SpaceThinker-Qwen2.5VL-3B
Code: https://github.com/remyxai/VQASynth
Discussion: open-r1/README#10