Latent-VC-9B

Latent-VC-9B is a ~9B-parameter video multimodal large language model built on the Qwen3.5-VL architecture and trained with Latent Visual Compression (Latent-VC, abbreviated LVC below) for efficient video reasoning. The model performs latent video chain-of-thought decoding (lvc_reasoning) and is trained in two stages: supervised fine-tuning (SFT), followed by reinforcement learning with GRPO.

Related Resources

| Resource | Link |
| --- | --- |
| Code repository | https://github.com/BRZ911/Latent-VC |
| Training & evaluation data | https://huggingface.co/datasets/BRZ911/Latent-VC-Data |
| Model weights (this repo) | https://huggingface.co/BRZ911/Latent-VC-9B |

Model Details

  • Architecture: Qwen3_5ForConditionalGeneration (Qwen3.5-VL backbone)
  • Hidden size: 4096  |  Layers: 32 (hybrid linear + full attention)  |  Vision depth: 27
  • Dtype: bfloat16
  • Parameters: ~9B
  • Special capability: Latent Visual Compression (LVC) tokens for latent video reasoning (lvc_temperature=0.07, loss_lvc_fct=cosine)
  • Training: SFT (Stage 1) → GRPO (Stage 2); this checkpoint corresponds to the GRPO checkpoint-800.
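
The values listed above can be checked against the downloaded config.json. The following is a minimal sketch, not part of the Latent-VC code: the key names in `WANTED` are assumptions based on common Hugging Face config conventions and on the settings quoted in this card, and the helper `find_keys` is hypothetical. Because multimodal configs often nest text and vision settings, the lookup is recursive and reports the full path of each match.

```python
import json
from pathlib import Path


def find_keys(node, wanted, prefix=""):
    """Recursively collect {dotted.path: value} for every key named in `wanted`."""
    found = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key in wanted:
                found[path] = value
            found.update(find_keys(value, wanted, path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.update(find_keys(item, wanted, f"{prefix}[{i}]"))
    return found


# Assumed key names; adjust them to whatever the real config.json contains.
WANTED = {
    "architectures", "hidden_size", "num_hidden_layers", "depth",
    "torch_dtype", "lvc_temperature", "loss_lvc_fct",
}

# Assumed download location (the --local-dir used in the Download step).
config_path = Path("./Latent-VC-9B/config.json")
if config_path.exists():
    config = json.loads(config_path.read_text(encoding="utf-8"))
    for path, value in sorted(find_keys(config, WANTED).items()):
        print(f"{path}: {value}")
```

Keys that do not exist in the file are simply not printed, so an empty result for an LVC setting means the key name guessed here is wrong, not that the setting is absent.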

Files

| File | Description |
| --- | --- |
| model.safetensors | Model weights (~18.8 GB) |
| config.json | Model configuration |
| generation_config.json | Generation configuration |
| tokenizer.json, tokenizer_config.json | Tokenizer |
| processor_config.json, chat_template.jinja | Processor & chat template |
| eval_all_benchmarks.py | Multi-benchmark evaluation script |
| eval_all_benchmarks.sh | Evaluation launcher |
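
As a rough consistency check, the weights file size can be related to the parameter count through the storage dtype. The sketch below assumes that every tensor is stored in bfloat16 (2 bytes per parameter), that "GB" means decimal gigabytes (10^9 bytes), and that the safetensors header is negligible; the function names are illustrative.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each parameter in 16 bits


def params_from_file_size(size_gb, bytes_per_param=BYTES_PER_BF16):
    """Approximate parameter count implied by a file size in decimal GB."""
    return size_gb * 1e9 / bytes_per_param


def file_size_gb(num_params, bytes_per_param=BYTES_PER_BF16):
    """Approximate file size in decimal GB for a given parameter count."""
    return num_params * bytes_per_param / 1e9


print(f"{params_from_file_size(18.8) / 1e9:.1f}B parameters")  # 9.4B parameters
print(f"{file_size_gb(9e9):.1f} GB")                           # 18.0 GB
```

Under these assumptions, ~18.8 GB corresponds to roughly 9.4B parameters, which is consistent with the rounded "~9B" figure.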

This model uses custom LVC components. For training and LVC reasoning inference, please use the code from the Latent-VC repository.

Usage

Download

pip install -U "huggingface_hub[cli]"
hf download BRZ911/Latent-VC-9B --local-dir ./Latent-VC-9B
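
After the download finishes, it can be worth confirming that every file listed under Files actually arrived. This is a small sketch using only the standard library; the directory name matches the `--local-dir` above, and `missing_files` is an illustrative helper, not part of the repository.

```python
from pathlib import Path

# File names taken from the Files section of this card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "processor_config.json",
    "chat_template.jinja",
    "eval_all_benchmarks.py",
    "eval_all_benchmarks.sh",
]


def missing_files(local_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in local_dir."""
    root = Path(local_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_files("./Latent-VC-9B")
print("all files present" if not missing else f"missing: {missing}")
```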

Evaluation

The included scripts evaluate the model on six video benchmarks (VideoMME, MVBench, TempCompass, VideoMMMU, VSIBench, MMVU). The evaluation data is hosted in the companion dataset BRZ911/Latent-VC-Data under Eval/. Download and extract it so that an Evaluation/ directory (containing eval_<dataset>.json and the per-benchmark videos) is available, then run:

# MODEL_PATH : path to the downloaded weights (this repo)
# EVAL_DIR   : path to the extracted Evaluation/ directory (default: ./Evaluation)
# DATASETS   : comma-separated subset of
#              videomme,mvbench,tempcompass,videommmu,vsibench,mmvu
MODEL_PATH=./Latent-VC-9B \
EVAL_DIR=./Evaluation \
DATASETS=mmvu \
bash eval_all_benchmarks.sh
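
Before launching a long run, the DATASETS value and the Evaluation/ layout can be checked up front. The sketch below is an illustration, not part of the evaluation scripts. It assumes that `<dataset>` in eval_<dataset>.json is the lowercase name used in DATASETS, and it lowercases the input for convenience even though the launcher itself may be case-sensitive.

```python
from pathlib import Path

# The six names accepted by DATASETS, as listed in the comments above.
KNOWN_DATASETS = (
    "videomme", "mvbench", "tempcompass", "videommmu", "vsibench", "mmvu",
)


def parse_datasets(value):
    """Split a comma-separated DATASETS string and reject unknown names."""
    names = [part.strip().lower() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN_DATASETS]
    if unknown:
        raise ValueError(f"unknown dataset(s): {', '.join(unknown)}")
    return names


def missing_annotations(eval_dir, datasets):
    """Return the eval_<dataset>.json files that are absent from eval_dir."""
    root = Path(eval_dir)
    return [
        f"eval_{name}.json"
        for name in datasets
        if not (root / f"eval_{name}.json").is_file()
    ]


datasets = parse_datasets("mmvu")
print(missing_annotations("./Evaluation", datasets))
```

An empty list means the annotation file for each requested benchmark was found; the per-benchmark videos are not checked here because their directory layout is not described in this card.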

See the Latent-VC repository for the full inference/training environment and dependencies.

Benchmarks

The model is evaluated on the following benchmarks (data in BRZ911/Latent-VC-Data):

| Benchmark | Samples |
| --- | ---: |
| VideoMME | 2700 |
| MVBench | 4000 |
| TempCompass | 7540 |
| VideoMMMU | 900 |
| VSIBench | 5130 |
| MMVU | 625 |
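
The sample counts differ by more than an order of magnitude across benchmarks, which matters if scores are ever pooled across them. The short sketch below only sums the counts from the table above and prints each benchmark's share of the total; it makes no claim about how the evaluation script aggregates results.

```python
# Sample counts copied from the benchmark table above.
SAMPLES = {
    "VideoMME": 2700,
    "MVBench": 4000,
    "TempCompass": 7540,
    "VideoMMMU": 900,
    "VSIBench": 5130,
    "MMVU": 625,
}

total = sum(SAMPLES.values())
print(f"total samples: {total}")  # total samples: 20895

for name, count in SAMPLES.items():
    # Share of all evaluation samples contributed by this benchmark.
    print(f"{name:<12}{count:>6}  {count / total:6.1%}")
```

A sample-weighted average would be dominated by TempCompass and VSIBench, whereas an unweighted mean over benchmarks treats MMVU's 625 samples the same as TempCompass's 7540.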

Citation

If you find this model or dataset useful, please cite the project:

@misc{latentvc,
  title  = {Latent-VC: Latent Visual Compression for Efficient Video Reasoning},
  author = {BRZ911},
  year   = {2025},
  url    = {https://github.com/BRZ911/Latent-VC}
}