Lance-3B-Alis-MLX-Traced
ByteDance Lance 3B (image + video) converted to Apple MLX, byte-clean against the original PyTorch.
Layout
Two standalone weight files, one per variant, matching ByteDance's Lance_3B/ +
Lance_3B_Video/ layout:
| Path | Variant | Size | dtype | Keys |
|---|---|---|---|---|
| Lance_3B/model.safetensors | image (LLM + adapters) | 24.7 GB | F32 | 1021 |
| Lance_3B_Video/model.safetensors | video (standalone: backbone + 31-frame pos-embed + video ViT) | 28.4 GB | F32 | 1411 |
The image ViT and the Wan 2.2 VAE are separate files (see the repo for setup).
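The key counts and dtypes in the table can be checked without loading 25+ GB of weights, because a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header describing every tensor. A minimal stdlib sketch, assuming only that public file format; the function names are illustrative and not part of the lance_alis_mlx repo:

```python
import json
import struct
from collections import Counter


def parse_safetensors_header(stream):
    """Read the JSON header from an open binary stream positioned at byte 0."""
    # safetensors layout: u64 little-endian header length, then that many bytes
    # of UTF-8 JSON, then raw tensor data (never read here).
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len))
    # "__metadata__" is an optional string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def summarize_safetensors(path):
    """Return (number of tensor keys, {dtype: count}) without loading any weights."""
    with open(path, "rb") as f:
        header = parse_safetensors_header(f)
    dtypes = Counter(entry["dtype"] for entry in header.values())
    return len(header), dict(dtypes)
```

If the download matches the table, summarize_safetensors("checkpoints/Lance-Alis/Lance_3B/model.safetensors") should report 1021 keys, all F32, and the video file 1411 keys.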
The weight is not the point; the verification is
The image weight (Lance_3B/) is bit-identical (SHA256
5ede2f0a…547817) to RockTalk/Lance-3B-MLX:
both are the same deterministic F32 conversion from
bytedance-research/Lance.
(This is the F32 build; a separate bf16 build is published as
mlx-community/Lance-3B-bf16.)
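Bit-identity is a SHA256 comparison, which for files of this size should be done in chunks. A minimal sketch with Python's hashlib; the helper names are hypothetical, and because the card only quotes the digest in elided form (first and last characters), matches_elided can only check those visible ends:

```python
import hashlib


def sha256_file(path, chunk_size=1 << 24):
    """Hash a multi-GB file in 16 MiB chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_elided(digest, prefix="5ede2f0a", suffix="547817"):
    """Compare a full hex digest against the elided prefix…suffix form only."""
    return digest.startswith(prefix) and digest.endswith(suffix)
```

For a full check, hash both local copies (this repo's Lance_3B/model.safetensors and RockTalk's) and compare the two complete digests directly.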
The differentiator of this release is not the weight. It is the verification trace: every stage of the port was cross-validated against the original PyTorch via byte-diff before the next stage started, and the full harness + lesson log is public:
github.com/avlp12/lance_alis_mlx
Verification
Every gate imports the original PyTorch code directly under a shim (not a clean
re-implementation), feeds both sides from the same PRNG (NumPy), and byte-diffs
every layer. The 23 lessons learned across stages 1–9 are recorded in the repo's LEARNING_LOG/.
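What a per-layer byte-diff checks can be sketched as follows, assuming each side's activations for a layer have been dumped as a flat sequence of float32 values. This is a hypothetical helper, not the repo's harness: it reports the first element whose raw bytes differ plus the largest absolute difference, which separates true numeric drift from bit-level differences such as -0.0 vs 0.0.

```python
from array import array


def byte_diff(a, b):
    """Compare two float32 activation buffers bit-for-bit.

    Returns None if the raw bytes are identical, otherwise
    (first_mismatching_index, max_abs_diff) to localize the failure.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    fa, fb = array("f", a), array("f", b)  # round both sides to float32
    ba, bb = fa.tobytes(), fb.tobytes()
    if ba == bb:
        return None
    n = fa.itemsize  # bytes per element (4 for C float)
    first = next(i for i in range(len(fa)) if ba[i * n:(i + 1) * n] != bb[i * n:(i + 1) * n])
    max_abs = max(abs(x - y) for x, y in zip(fa, fb))
    return first, max_abs
```

A zero max_abs_diff with a byte mismatch (as with signed zeros) points at representation, not arithmetic.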
| Stage | Component | Gate |
|---|---|---|
| 1 | PT → MLX weight conversion | bit-exact (SHA256 match) |
| 5 | Wan 2.2 VAE image path (T=1) | ~40 dB PSNR round-trip vs PT |
| 6 | Flow matching + CFG (T2I) | end-to-end cos ≥ 0.999 vs PT 30-step |
| 7 | ViT + X→T + TI2I | cos ≥ 0.999 + real-photo perceptual |
| 8 | 3D causal video VAE | 4 gates cos = 1.000000 (encode + decode) |
| 9 | T2V (video DiT + flow matching) | 30-step latent cos ≥ 0.999, video pixel cos = 0.999338 vs PT |
The video weight reproduces the Stage 9 T2V results exactly (single-step cos = 0.999916 / 0.999848 / 0.999452 vs PT). Its converted video supplement is 391/391 byte-identical to RockTalk's, which in turn is byte-clean against the original PT supplement.
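The gates above are stated as cosine similarity and PSNR against PyTorch. The card does not spell out exactly which tensors are flattened or what pixel range PSNR uses, so this sketch shows only the textbook definitions, assuming flattened outputs and pixels scaled to [0, max_val] with max_val = 1.0; passes_gate is a hypothetical helper:

```python
import math


def cosine_similarity(a, b):
    """cos = <a, b> / (||a|| * ||b||) over flattened tensors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (na * nb)


def psnr(a, b, max_val=1.0):
    """PSNR in dB = 10 * log10(max_val**2 / MSE); infinite when outputs match."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)


def passes_gate(mlx_out, pt_out, threshold=0.999):
    """A 'cos ≥ 0.999' style gate: MLX output vs the PyTorch reference."""
    return cosine_similarity(mlx_out, pt_out) >= threshold
```

Cosine similarity ignores a global scale factor, which is why the stages pair it with byte-diffs and, for the VAE, PSNR.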
Honesty note: T2V is verified end-to-end and uses only the 1021-key subset. The video weight also bundles the video ViT (vit_model, byte-clean vs PT), but the x2t_video / video_edit pipelines that would consume it are not yet implemented in MLX.
Usage
Inference is pure MLX, with no PyTorch at runtime (PyTorch is imported only by
the verification harnesses in tools/).
```bash
git clone https://github.com/avlp12/lance_alis_mlx
cd lance_alis_mlx
python3.12 -m venv .venv && source .venv/bin/activate
pip install mlx mlx-vlm transformers safetensors einops pillow huggingface_hub numpy
# weights: image (Lance_3B/) + video (Lance_3B_Video/)
hf download avlp12/Lance-3B-Alis-MLX-Traced --local-dir checkpoints/Lance-Alis
hf download RockTalk/Wan2.2-VAE-MLX --local-dir checkpoints/Wan2.2-VAE-MLX
# generate (see the repo README for the exact checkpoints/ layout)
PYTHONPATH=. .venv/bin/python tools/stage6_t2i_smoke.py  # text-to-image
PYTHONPATH=. .venv/bin/python tools/stage7_ti2i_smoke.py  # image edit
```
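The commands above fetch both variants (about 53 GB combined). If only one is needed, huggingface_hub's snapshot_download accepts allow_patterns to restrict the download to one folder. A sketch under two assumptions: that a given pipeline needs only its own variant folder (the image smoke tests using Lance_3B/ alone is not confirmed here), and that the VARIANT_DIRS mapping and helper names, which are hypothetical, reflect the layout table:

```python
REPO_ID = "avlp12/Lance-3B-Alis-MLX-Traced"
# Hypothetical mapping from variant name to the folder listed in the layout table.
VARIANT_DIRS = {"image": "Lance_3B", "video": "Lance_3B_Video"}


def download_kwargs(variant, local_dir="checkpoints/Lance-Alis"):
    """Build snapshot_download arguments for a single variant."""
    return {
        "repo_id": REPO_ID,
        "local_dir": local_dir,
        # fnmatch-style glob: "Lance_3B/*" does not match "Lance_3B_Video/...".
        "allow_patterns": [f"{VARIANT_DIRS[variant]}/*"],
    }


def fetch(variant, local_dir="checkpoints/Lance-Alis"):
    # Imported lazily; huggingface_hub is installed by the pip line above.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(variant, local_dir))
```

If a generation script turns out to need a file outside the variant folder, widen allow_patterns rather than falling back to the full download.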
Apple Silicon required (developed on M3 Ultra). Python 3.12.
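A quick preflight against these requirements can save a failed install. A minimal sketch using only platform and sys; check_environment is a hypothetical helper, and it treats any Python other than 3.12 as a warning because 3.12 is the version the card documents, not necessarily the only one that works. An x86_64 interpreter running under Rosetta reports machine() as "x86_64" and is flagged, since MLX needs a native arm64 Python.

```python
import platform
import sys


def check_environment(system=None, machine=None, version=None):
    """Return a list of problems; an empty list means the host matches the card."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = version or sys.version_info[:2]
    problems = []
    if not (system == "Darwin" and machine == "arm64"):
        problems.append(f"Apple Silicon required, found {system}/{machine}")
    if tuple(version) != (3, 12):
        problems.append(f"Python 3.12 expected, found {version[0]}.{version[1]}")
    return problems
```

Calling check_environment() with no arguments inspects the current interpreter; the parameters exist only so the checks can be exercised on other hosts.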
License & citation
Apache 2.0, the same as upstream ByteDance Lance and the Alibaba Wan 2.2 VAE.
```bibtex
@misc{fu2026lanceunifiedmultimodalmodeling,
  title = {Lance: Unified Multimodal Modeling by Multi-Task Synergy},
  author = {Fengyi Fu and Mengqi Huang and Shaojin Wu and Yunsheng Jiang and Yufei Huo and Hao Li and Yinghang Song and Fei Ding and Jianzhu Guo and Qian He and Zheren Fu and Zhendong Mao and Yongdong Zhang},
  year = {2026},
  eprint = {2605.18678},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2605.18678},
}
```
Acknowledgments
- ByteDance Lance team: original PyTorch model and research
- RockTalk: MLX checkpoint conversion used as the F32 parity reference (image + video supplement)
- Alibaba Wan 2.2 team: 3D Causal VAE architecture