Latent-VC-9B
Latent-VC-9B is a ~9B-parameter video multimodal large language model built on the Qwen3.5-VL architecture and trained with Latent Visual Compression (Latent-VC) for efficient video reasoning. It performs latent video chain-of-thought decoding (lvc_reasoning) and is optimized in two stages: supervised fine-tuning (SFT) followed by GRPO reinforcement learning.
Related Resources
| Resource | Link |
|---|---|
| Code repository | https://github.com/BRZ911/Latent-VC |
| Training & evaluation data | https://huggingface.co/datasets/BRZ911/Latent-VC-Data |
| Model weights (this repo) | https://huggingface.co/BRZ911/Latent-VC-9B |
Model Details
- Architecture: Qwen3_5ForConditionalGeneration (Qwen3.5-VL backbone)
- Hidden size: 4096 | Layers: 32 (hybrid linear + full attention) | Vision depth: 27
- Dtype: bfloat16
- Parameters: ~9B
- Special capability: Latent Visual Compression (LVC) tokens for latent video reasoning (lvc_temperature=0.07, loss_lvc_fct=cosine)
- Training: SFT (Stage 1) → GRPO (Stage 2); this checkpoint corresponds to the GRPO checkpoint-800.
Files
| File | Description |
|---|---|
| model.safetensors | Model weights (~18.8 GB) |
| config.json | Model configuration |
| generation_config.json | Generation configuration |
| tokenizer.json, tokenizer_config.json | Tokenizer |
| processor_config.json, chat_template.jinja | Processor & chat template |
| eval_all_benchmarks.py | Multi-benchmark evaluation script |
| eval_all_benchmarks.sh | Evaluation launcher |
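As a rough sanity check, the listed weight size can be related to the parameter count and dtype given under Model Details. The sketch below assumes that "~18.8 GB" means decimal gigabytes (1 GB = 1e9 bytes) and that the file is almost entirely bfloat16 tensors; the helper names are illustrative and not part of the repository.

```python
# Back-of-the-envelope check: parameter count <-> weight file size.
# Assumption: sizes are decimal gigabytes and metadata overhead is negligible.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weights_size_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate on-disk size in GB for a given parameter count."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def params_from_size(size_gb: float, dtype: str = "bfloat16") -> float:
    """Approximate parameter count implied by a file size in GB."""
    return size_gb * 1e9 / BYTES_PER_PARAM[dtype]


# 9B parameters in bfloat16 take roughly 18 GB.
print(f"{weights_size_gb(9e9):.1f} GB")
# An 18.8 GB bfloat16 file implies roughly 9.4B parameters.
print(f"{params_from_size(18.8) / 1e9:.1f}B parameters")
```

Under these assumptions the two figures agree with the "~9B" description: 18.8 GB of bfloat16 weights corresponds to about 9.4B parameters.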
This model uses custom LVC components. For training and LVC reasoning inference, please use the code from the Latent-VC repository.
Usage
Download
pip install -U "huggingface_hub[cli]"
hf download BRZ911/Latent-VC-9B --local-dir ./Latent-VC-9B
Evaluation
The included scripts evaluate the model on six video benchmarks (VideoMME, MVBench, TempCompass, VideoMMMU, VSIBench, MMVU). The evaluation data is hosted in the companion dataset BRZ911/Latent-VC-Data under Eval/. Download and extract it so that an Evaluation/ directory (containing eval_<dataset>.json and the per-benchmark videos) is available, then run:
# MODEL_PATH : path to the downloaded weights (this repo)
# EVAL_DIR : path to the extracted Evaluation/ directory (default: ./Evaluation)
# DATASETS : comma-separated subset of
# videomme,mvbench,tempcompass,videommmu,vsibench,mmvu
MODEL_PATH=./Latent-VC-9B \
EVAL_DIR=./Evaluation \
DATASETS=mmvu \
bash eval_all_benchmarks.sh
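Before launching the script, it can help to fetch the Eval/ folder programmatically and confirm that the annotation files are in place. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the eval_<dataset>.json files use the same lowercase names accepted by DATASETS. The archive format under Eval/ is not stated here, so extraction is left to the reader.

```python
from pathlib import Path

# Benchmark names accepted by DATASETS, as listed above.
KNOWN_DATASETS = ("videomme", "mvbench", "tempcompass",
                  "videommmu", "vsibench", "mmvu")


def parse_datasets(spec: str) -> list[str]:
    """Split a comma-separated DATASETS value and reject unknown names."""
    names = [part.strip().lower() for part in spec.split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN_DATASETS]
    if unknown:
        raise ValueError(f"unknown benchmark(s): {', '.join(unknown)}")
    return names


def missing_annotation_files(eval_dir: str, names: list[str]) -> list[Path]:
    """Return the expected eval_<dataset>.json paths that do not exist.

    Assumption: file names follow eval_<lowercase dataset name>.json.
    """
    root = Path(eval_dir)
    expected = [root / f"eval_{name}.json" for name in names]
    return [path for path in expected if not path.is_file()]


def download_eval_archives(local_dir: str = "./Latent-VC-Data") -> str:
    """Download only the Eval/ folder of the companion dataset repo."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="BRZ911/Latent-VC-Data",
        repo_type="dataset",
        allow_patterns=["Eval/*"],
        local_dir=local_dir,
    )
```

A typical use would be to call download_eval_archives(), extract the result into ./Evaluation, and then check that missing_annotation_files("./Evaluation", parse_datasets("mmvu")) returns an empty list before running the launcher.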
See the Latent-VC repository for the full inference/training environment and dependencies.
Benchmarks
The model is evaluated on the following benchmarks (data in BRZ911/Latent-VC-Data):
| Benchmark | Samples |
|---|---|
| VideoMME | 2700 |
| MVBench | 4000 |
| TempCompass | 7540 |
| VideoMMMU | 900 |
| VSIBench | 5130 |
| MMVU | 625 |
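Because the benchmarks differ widely in size, an overall score depends on how per-benchmark results are combined. The sketch below uses the sample counts from the table to contrast an unweighted (macro) average with a sample-weighted (micro) average. How eval_all_benchmarks.py aggregates its results is not described here, so this is only an illustration and the function names are hypothetical.

```python
# Sample counts taken from the table above.
SAMPLES = {
    "VideoMME": 2700,
    "MVBench": 4000,
    "TempCompass": 7540,
    "VideoMMMU": 900,
    "VSIBench": 5130,
    "MMVU": 625,
}


def macro_average(accuracy: dict[str, float]) -> float:
    """Unweighted mean: every benchmark counts equally."""
    return sum(accuracy[name] for name in SAMPLES) / len(SAMPLES)


def micro_average(accuracy: dict[str, float]) -> float:
    """Sample-weighted mean: larger benchmarks contribute more."""
    total = sum(SAMPLES.values())
    return sum(accuracy[name] * SAMPLES[name] for name in SAMPLES) / total


# Total number of evaluation samples across the six benchmarks.
print(sum(SAMPLES.values()))  # 20895
```

With these counts, TempCompass alone accounts for over a third of all samples, so the two averages can diverge noticeably when per-benchmark accuracies differ.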
Citation
If you find this model or dataset useful, please cite the project:
@misc{latentvc,
title = {Latent-VC: Latent Visual Compression for Efficient Video Reasoning},
author = {BRZ911},
year = {2025},
url = {https://github.com/BRZ911/Latent-VC}
}