Part of the LongCat-Video — MLX collection.

LongCat-Video-q8 (MLX)

8-bit quantized variant of mlx-community/LongCat-Video-bf16. Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive), and the same cfg_step_lora and refinement_lora files. The only change is that the DiT's Linear layers are quantized to 8-bit via mlx.nn.quantize.

Compared with q4, the 8-bit variant gives up some disk savings in exchange for near-bf16 quality. If your machine has room for the ~31 GB q8 weights but not the 42 GB bf16 weights (roughly a 48 GB Mac rather than a 64 GB one), q8 is the right pick.

TL;DR


| Item | Value |
|---|---|
| DiT | 8-bit quantized (`group_size=64`, skip `final_layer.linear` + embedders + AdaLN) |
| DiT size | ~15 GB (4 shards; 1.7× smaller than bf16's 26 GB) |
| VAE / umT5 / LoRAs | bf16 (unchanged from the bf16 variant) |
| Total disk | ~31 GB (vs 42 GB bf16) |
| Min unified memory | ~48 GB recommended for 480p |
| Inference | 50-step baseline OR 8-step with `cfg_step_lora` (fast) |
| License | MIT |
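The DiT size figure can be sanity-checked with a little arithmetic. The sketch below assumes group-wise affine quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights; the helper names are illustrative and not part of any library. Under that assumption the ideal ratio against bf16 is a bit under 1.9×, and the observed 26 GB to 15 GB ratio is lower, which is what you would expect when some layers are skipped and stay in bf16.

```python
# Back-of-envelope size estimate for group-wise affine quantization.
# Assumption: each group of `group_size` weights carries one 16-bit scale
# and one 16-bit bias alongside the packed integer weights.

def bits_per_weight(bits: int, group_size: int, meta_bits: int = 16) -> float:
    """Effective storage cost per weight, including per-group scale and bias."""
    return bits + 2 * meta_bits / group_size


def compression_vs_bf16(bits: int, group_size: int) -> float:
    """Ideal size ratio against 16-bit weights if every tensor were quantized."""
    return 16 / bits_per_weight(bits, group_size)


q8_cost = bits_per_weight(8, 64)          # 8 + 32/64 bits per weight
ideal_ratio = compression_vs_bf16(8, 64)  # upper bound on the shrink factor
observed_ratio = 26 / 15                  # DiT sizes quoted in this card (GB)

print(f"q8 cost: {q8_cost} bits/weight")
print(f"ideal ratio: {ideal_ratio:.2f}x, observed: {observed_ratio:.2f}x")
```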

Quantization details

Same skip pattern as q4 — see the q4 card for full notes on why each pattern is excluded (L11 + L42 in the skill-lessons).

The only difference vs q4 is bits=8 in the quantization config block.
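As a rough illustration of how a skip pattern like this can be expressed, here is a minimal sketch using the `class_predicate` argument of `mlx.nn.quantize`. The substrings in `SKIP_PATTERNS` and the helper names are assumptions made for this example; the actual module paths and conversion script used for this checkpoint may differ.

```python
# Sketch only: SKIP_PATTERNS holds hypothetical path substrings, not the
# confirmed module names of the LongCat-Video DiT.
SKIP_PATTERNS = ("final_layer.linear", "embedder", "adaLN")


def should_quantize(path: str, module) -> bool:
    """Predicate for mlx.nn.quantize: return True to quantize this module."""
    if any(pattern in path for pattern in SKIP_PATTERNS):
        return False  # keep the skipped layers in bf16
    # Only modules that can convert themselves (for example nn.Linear) qualify.
    return hasattr(module, "to_quantized")


def quantize_dit(model, bits: int = 8, group_size: int = 64):
    """Quantize `model` in place. mlx is imported lazily so the predicate
    above can be exercised without MLX installed."""
    import mlx.nn as nn

    nn.quantize(
        model,
        group_size=group_size,
        bits=bits,
        class_predicate=should_quantize,
    )
    return model
```

Calling `quantize_dit(model, bits=4)` with the same predicate would correspond to the q4 case described above, since only `bits` changes.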

Quick start

```bash
# 1. Pull weights (~31 GB)
hf download mlx-community/LongCat-Video-q8 --local-dir ./weights

# 2. Set up inference
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"

# 3. Run text-to-video — pass --variant q8
.venv/bin/python scripts/run_t2v.py \
    --weights ./weights/.. \
    --variant q8 \
    --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
    --num-frames 93 \
    --out output_t2v.mp4
```

Choosing between bf16, q4, q8

| Variant | Disk | Min RAM | Quality | Pick when |
|---|---|---|---|---|
| bf16 | 42 GB | 64 GB | reference | Best output, you have the RAM headroom |
| q4 | 25 GB | 32 GB | minor degradation | RAM is tight (32 GB Mac) |
| q8 | ~31 GB | 48 GB | very close to bf16 | Best balance: modest savings, near-bf16 quality |
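The table can also be read as a simple rule: take the highest-quality variant whose minimum RAM fits your machine. The sketch below encodes that rule using only the Min RAM column; `pick_variant` is a hypothetical helper written for this card, not something shipped with the inference repo.

```python
# (variant, minimum unified memory in GB), ordered from highest to lowest quality.
# Values are taken from the Min RAM column of the table above.
VARIANTS = [("bf16", 64), ("q8", 48), ("q4", 32)]


def pick_variant(unified_memory_gb: float):
    """Return the highest-quality variant that fits, or None if none does."""
    for name, min_ram_gb in VARIANTS:
        if unified_memory_gb >= min_ram_gb:
            return name
    return None


for ram in (128, 48, 36, 16):
    print(ram, "GB ->", pick_variant(ram))
```

The returned name is the value you would pass to `--variant` in the quick start. Treat it as a starting point, since resolution and frame count also affect memory use.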