Latent-VC-9B

Latent-VC-9B is a ~9B-parameter video multimodal large language model built on the Qwen3.5-VL architecture and trained with Latent Visual Compression (Latent-VC) for efficient video reasoning. The model performs latent video chain-of-thought decoding (lvc_reasoning) and is optimized through a two-stage pipeline: supervised fine-tuning (SFT) followed by GRPO reinforcement learning.
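
GRPO (Group Relative Policy Optimization) does not use a learned value critic. Instead, it samples several responses per prompt and standardizes each response's reward against the mean and standard deviation of its own group. The sketch below shows only this generic advantage computation. The reward function, group size, and loss terms used to train Latent-VC are not described in this card, so the toy rewards are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward against its own sampling group.

    rewards: scalar rewards for G responses sampled for the same prompt.
    Uses the population standard deviation; some implementations use the sample std.
    eps avoids division by zero when every response gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group: four sampled answers to one video question,
    # rewarded 1.0 if correct and 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Correct answers receive positive advantages and incorrect ones negative. A group in which every answer scores the same produces all-zero advantages and therefore no learning signal.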

Related Resources

| Resource | Link |
|---|---|
| Code repository | https://github.com/BRZ911/Latent-VC |
| Training & evaluation data | https://huggingface.co/datasets/BRZ911/Latent-VC-Data |
| Model weights (this repo) | https://huggingface.co/BRZ911/Latent-VC-9B |

Model Details

Architecture: Qwen3_5ForConditionalGeneration (Qwen3.5-VL backbone)
Hidden size: 4096 | Layers: 32 (hybrid linear + full attention) | Vision depth: 27
Dtype: bfloat16
Parameters: ~9B
Special capability: Latent Visual Compression (LVC) tokens for latent video reasoning (lvc_temperature=0.07, loss_lvc_fct=cosine)
Training: SFT (Stage 1) → GRPO (Stage 2); this checkpoint corresponds to the GRPO checkpoint-800.
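
You can check the Dtype and Parameters entries locally without loading the weights. A `.safetensors` file begins with an 8-byte little-endian header length, followed by a JSON header that records each tensor's dtype and shape. Summing the shape products therefore gives the parameter count. The stdlib-only sketch below assumes the repository was downloaded to `./Latent-VC-9B` using the command in the Usage section. As a rough cross-check, BF16 stores 2 bytes per parameter, so the ~18.8 GB weight file corresponds to about 18.8 / 2 ≈ 9.4B parameters.

```python
import json
import math
import struct
from collections import Counter
from pathlib import Path


def summarize_safetensors(path):
    """Read only the JSON header of a .safetensors file and tally parameters per dtype."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    counts = Counter()
    for name, info in header.items():
        if name == "__metadata__":  # optional string metadata, not a tensor
            continue
        counts[info["dtype"]] += math.prod(info["shape"])  # scalar shape [] counts as 1
    return counts


if __name__ == "__main__":
    # Assumed local path from the download step; adjust if you used another --local-dir.
    weights = Path("./Latent-VC-9B") / "model.safetensors"
    if weights.exists():
        counts = summarize_safetensors(weights)
        for dtype, n in counts.items():
            print(f"{dtype}: {n:,} parameters")
        print(f"total: {sum(counts.values()) / 1e9:.2f}B parameters")
    else:
        print(f"{weights} not found; download the repository first.")
```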

Files

| File | Description |
|---|---|
| `model.safetensors` | Model weights (~18.8 GB) |
| `config.json` | Model configuration |
| `generation_config.json` | Generation configuration |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `processor_config.json`, `chat_template.jinja` | Processor configuration and chat template |
| `eval_all_benchmarks.py` | Multi-benchmark evaluation script |
| `eval_all_benchmarks.sh` | Evaluation launcher |

This model uses custom LVC components. For training and LVC reasoning inference, please use the code from the Latent-VC repository.

Usage

Download

```bash
pip install -U "huggingface_hub[cli]"
hf download BRZ911/Latent-VC-9B --local-dir ./Latent-VC-9B
```
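
After the download finishes, a quick check that every file listed in the Files table is present can catch an interrupted transfer before you start a long evaluation run. This is a minimal sketch. It assumes the `./Latent-VC-9B` target directory from the command above and only looks for the listed names, so extra repository files such as a README are ignored.

```python
from pathlib import Path

# File names taken from the Files table of this model card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "processor_config.json",
    "chat_template.jinja",
    "eval_all_benchmarks.py",
    "eval_all_benchmarks.sh",
]


def missing_files(local_dir):
    """Return the expected files that are absent from local_dir (order preserved)."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./Latent-VC-9B")
    print("all files present" if not missing else f"missing: {missing}")
```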

Evaluation

The included scripts evaluate the model on six video benchmarks (VideoMME, MVBench, TempCompass, VideoMMMU, VSIBench, MMVU). The evaluation data is hosted in the companion dataset BRZ911/Latent-VC-Data under Eval/. Download and extract it so that an Evaluation/ directory containing the eval_<dataset>.json files and the per-benchmark videos is available, then run:

```bash
# MODEL_PATH : path to the downloaded weights (this repo)
# EVAL_DIR   : path to the extracted Evaluation/ directory (default: ./Evaluation)
# DATASETS   : comma-separated subset of
#              videomme,mvbench,tempcompass,videommmu,vsibench,mmvu
MODEL_PATH=./Latent-VC-9B \
EVAL_DIR=./Evaluation \
DATASETS=mmvu \
bash eval_all_benchmarks.sh
```
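
If you drive evaluations from Python, a thin wrapper can validate DATASETS against the six names listed in the command comments before calling the bundled script with the same environment variables. This is a hypothetical helper, not part of the repository. It assumes that each benchmark's annotation file is named `eval_<name>.json` using the same key as in DATASETS (for example `eval_mmvu.json`), and that it is run from the directory containing `eval_all_benchmarks.sh`.

```python
import os
import subprocess
from pathlib import Path

# Benchmark keys listed for DATASETS in the comments of the evaluation command.
SUPPORTED = ("videomme", "mvbench", "tempcompass", "videommmu", "vsibench", "mmvu")


def parse_datasets(value):
    """Split a comma-separated DATASETS string and reject names the card does not list."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = sorted(set(names) - set(SUPPORTED))
    if unknown:
        raise ValueError(f"unsupported dataset(s): {', '.join(unknown)}")
    return names


def missing_eval_files(eval_dir, names):
    """Assumption: each benchmark's annotations live in <eval_dir>/eval_<name>.json."""
    return [n for n in names if not (Path(eval_dir) / f"eval_{n}.json").is_file()]


def run_eval(model_path, eval_dir, datasets, script="eval_all_benchmarks.sh"):
    """Launch the bundled shell script with MODEL_PATH, EVAL_DIR and DATASETS set."""
    names = parse_datasets(datasets)
    missing = missing_eval_files(eval_dir, names)
    if missing:
        raise FileNotFoundError(f"no eval_<dataset>.json in {eval_dir} for: {missing}")
    env = dict(os.environ, MODEL_PATH=model_path, EVAL_DIR=eval_dir, DATASETS=",".join(names))
    subprocess.run(["bash", script], env=env, check=True)


if __name__ == "__main__":
    # Only launch when the script is present in the current working directory.
    if Path("eval_all_benchmarks.sh").is_file():
        run_eval("./Latent-VC-9B", "./Evaluation", "mmvu")
```

Validating names up front fails fast on a typo such as `videomm` instead of surfacing an error partway through a run.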

See the Latent-VC repository for the full inference/training environment and dependencies.

Benchmarks

The model is evaluated on the following benchmarks (data in BRZ911/Latent-VC-Data):

| Benchmark | Samples |
|---|---|
| VideoMME | 2700 |
| MVBench | 4000 |
| TempCompass | 7540 |
| VideoMMMU | 900 |
| VSIBench | 5130 |
| MMVU | 625 |
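
Across the six benchmarks the table lists 20,895 samples in total. The sketch below compares the local Evaluation/ files with the per-benchmark counts above. It assumes, without confirmation from this card, that each `eval_<name>.json` holds a top-level JSON list with one entry per sample. If the files use a different layout, `count_samples` raises a `TypeError` rather than reporting a misleading number.

```python
import json
from pathlib import Path

# Sample counts from the Benchmarks table of this model card.
EXPECTED_SAMPLES = {
    "videomme": 2700,
    "mvbench": 4000,
    "tempcompass": 7540,
    "videommmu": 900,
    "vsibench": 5130,
    "mmvu": 625,
}


def count_samples(eval_dir, name):
    """Count entries in eval_<name>.json, assuming the file holds a top-level JSON list."""
    with open(Path(eval_dir) / f"eval_{name}.json", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise TypeError(f"eval_{name}.json is not a JSON list; adapt count_samples to its layout")
    return len(data)


def check_counts(eval_dir):
    """Map each benchmark file found in eval_dir to (found, expected)."""
    report = {}
    for name, expected in EXPECTED_SAMPLES.items():
        if (Path(eval_dir) / f"eval_{name}.json").is_file():
            report[name] = (count_samples(eval_dir, name), expected)
    return report


if __name__ == "__main__":
    print(f"expected total: {sum(EXPECTED_SAMPLES.values())} samples")  # 20895
    for name, (found, expected) in check_counts("./Evaluation").items():
        status = "ok" if found == expected else "MISMATCH"
        print(f"{name}: {found}/{expected} {status}")
```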

Citation

If you find this model or dataset useful, please cite the project:

```bibtex
@misc{latentvc,
  title  = {Latent-VC: Latent Visual Compression for Efficient Video Reasoning},
  author = {BRZ911},
  year   = {2025},
  url    = {https://github.com/BRZ911/Latent-VC}
}
```
