Part of the LongCat-Video - MLX collection.

LongCat-Video-q8 (MLX)

8-bit quantized variant of mlx-community/LongCat-Video-bf16. Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive), same cfg_step_lora + refinement_lora files, just with the DiT Linears quantized to 8-bit via mlx.nn.quantize.

The 8-bit variant gives up some of the 4-bit variant's disk savings in exchange for near-bf16 quality. If you have the memory headroom for ~31 GB of weights but not 42 GB, q8 is the right pick.

TL;DR

DiT: 8-bit quantized (group_size=64; skips final_layer.linear, embedders, and AdaLN)
DiT size: ~15 GB (4 shards; 1.7× smaller than bf16's 26 GB)
VAE / umT5 / LoRAs: bf16 (unchanged from the bf16 variant)
Total disk: ~31 GB (vs 42 GB for bf16)
Min unified memory: ~48 GB recommended for 480p
Inference: 50-step baseline, or 8-step with cfg_step_lora (fast)
License: MIT
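
As a rough sanity check on the sizes above, the sketch below estimates the compression ratio of 8-bit group quantization. It assumes each group of 64 weights stores one scale and one bias at 16 bits each, which is the usual layout for affine group quantization; the helper name effective_bits is illustrative and not part of any library.

```python
def effective_bits(bits: int, group_size: int, meta_bits: int = 16) -> float:
    """Bits stored per weight, counting one scale and one bias per group (assumed)."""
    return bits + 2 * meta_bits / group_size


# Upper bound if every bf16 (16-bit) weight in the DiT were quantized.
ideal_ratio = 16 / effective_bits(bits=8, group_size=64)

# Ratio implied by the sizes quoted in this card (26 GB -> ~15 GB).
reported_ratio = 26 / 15

# Total disk: bf16 total, minus the bf16 DiT, plus the quantized DiT.
total_q8_gb = 42 - 26 + 15

print(f"ideal {ideal_ratio:.2f}x, reported {reported_ratio:.2f}x, total ~{total_q8_gb} GB")
```

The reported ratio sits a little below the ideal one, which is consistent with some layers (final_layer.linear, embedders, AdaLN) staying in bf16.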

Quantization details

Same skip pattern as q4; see the q4 card for full notes on why each pattern is excluded (L11 + L42 in the skill-lessons).

The only difference vs q4 is bits=8 in the quantization config block.
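
A minimal sketch of how such a skip pattern could be expressed with the class_predicate argument of mlx.nn.quantize. The substrings in SKIP_SUBSTRINGS are guesses at module-path names derived from the skip list in this card, and should_quantize and quantize_dit are hypothetical helpers, not the conversion script actually used for these weights.

```python
# Assumed path fragments; check them against the real DiT module tree.
SKIP_SUBSTRINGS = ("final_layer.linear", "embedder", "adaln")


def should_quantize(path: str, module: object) -> bool:
    """Return True if the module at `path` should be quantized."""
    lowered = path.lower()
    if any(fragment in lowered for fragment in SKIP_SUBSTRINGS):
        return False
    # Only Linear layers are quantized; they expose to_quantized() in MLX.
    return type(module).__name__ == "Linear" and hasattr(module, "to_quantized")


def quantize_dit(dit, bits: int = 8, group_size: int = 64):
    """Quantize `dit` (an mlx.nn.Module) in place, honoring the skip list."""
    import mlx.nn as nn  # imported lazily so the predicate works without MLX

    nn.quantize(dit, group_size=group_size, bits=bits, class_predicate=should_quantize)
    return dit
```

Under these assumptions, switching between the q4 and q8 recipes only means changing the bits argument.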

Quick start

# 1. Pull weights (~31 GB)
hf download mlx-community/LongCat-Video-q8 --local-dir ./weights

# 2. Set up inference
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"

# 3. Run text-to-video (pass --variant q8)
.venv/bin/python scripts/run_t2v.py \
    --weights ./weights/.. \
    --variant q8 \
    --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
    --num-frames 93 \
    --out output_t2v.mp4
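
Before running inference, it can help to confirm that the download completed. The sketch below sums file sizes under the weights directory so the result can be compared with the ~31 GB quoted above. The ./weights path matches step 1, dir_size_gb is an illustrative helper, and the total will not match exactly because of rounding and metadata files.

```python
import os


def dir_size_gb(root: str) -> float:
    """Total size of regular files under `root`, in decimal gigabytes."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Skip symlinks so cached or linked files are not double-counted.
            if not os.path.islink(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / 1e9


if __name__ == "__main__":
    print(f"weights on disk: {dir_size_gb('./weights'):.1f} GB")
```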

Choosing between bf16, q4, q8

Variant | Disk | Min RAM | Quality | Pick when
bf16 | 42 GB | 64 GB | reference | Best output, if you have the RAM headroom
q4 | 25 GB | 32 GB | minor degradation | RAM is tight (32 GB Mac)
q8 | ~31 GB | 48 GB | very close to bf16 | Best balance: smaller than bf16, near-bf16 quality
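
The table can be read as a simple rule: take the highest-quality variant whose minimum RAM fits your machine. A small sketch of that rule follows; pick_variant is a hypothetical helper, and the thresholds are the Min RAM figures from the table rather than measured limits.

```python
from typing import Optional

# (variant, minimum unified memory in GB), ordered from best quality down.
VARIANTS = [("bf16", 64), ("q8", 48), ("q4", 32)]


def pick_variant(ram_gb: float) -> Optional[str]:
    """Return the highest-quality variant that fits in `ram_gb`, or None."""
    for name, min_ram_gb in VARIANTS:
        if ram_gb >= min_ram_gb:
            return name
    return None


for ram in (32, 48, 64):
    print(ram, "GB ->", pick_variant(ram))
```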

License

MIT, matching the upstream LongCat-Video license.
