WiLoR hand pose estimation rebuilt end-to-end in MLX for Apple Silicon
We rebuilt WiLoR-mini end-to-end in MLX for Apple Silicon: the full inference pipeline, including the ViT-H/16 backbone, MANO hand model, and RefineNet refinement stage, with sub-millimeter geometric parity against PyTorch.
We couldn't find another public WiLoR MLX or CoreML port, so we're publishing this as a technical priority flag. If we missed related work, we'd love pointers.
One-line setup
from wilor_mlx import WiLoR
model = WiLoR.from_pretrained() # auto-downloads weights, derives MANO locally
The first run needs torch once, to convert MANO from the upstream WiLoR-mini checkpoint. After that, inference is pure MLX with no torch dependency.
Performance (M4 Max, float32)
The important measurement is the live sidecar route we actually use for interaction: camera frame → hand crop → WiLoR-mini pose/reconstruction → hand-pose event.
In a same-harness smoke test on a freshly rebooted M4 Max, run over recent 160x120 saved frames from a gesture UI prototype, MLX runs the pose/reconstruction model stage at about 37 ms median versus 49 ms for PyTorch MPS, and the full saved-frame route at about 49 ms versus 60 ms. That is roughly a 1.3x model-stage advantage and a 1.2x full-route advantage on the fair comparison denominator we trust most right now.
That latency is low enough to make 3D hand pose plausible as a real-time control primitive, not just a batch inference model. Our traces point to dispatch and synchronization as the main difference, not memory copies: both routes sit on Apple Silicon unified memory, but MLX's lazy graph gives the hot path fewer places for a hitch to land.
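The ratios quoted above follow directly from the reported medians. The sketch below reproduces that arithmetic and shows one way a median latency could be measured; `median_latency_ms`, `fn`, and `frames` are hypothetical names for illustration, not part of wilor-mlx or of the harness used for the numbers above.

```python
import statistics
import time


def median_latency_ms(fn, frames, warmup=3):
    """Median wall-clock latency of fn(frame), in milliseconds.

    `fn` and `frames` are hypothetical stand-ins for a model stage and a
    list of saved frames. With a lazy framework such as MLX, `fn` has to
    force evaluation (for example with mx.eval) before it returns,
    otherwise the timer only covers graph construction.
    """
    for frame in frames[:warmup]:
        fn(frame)  # warm-up calls are not timed
    samples = []
    for frame in frames:
        start = time.perf_counter()
        fn(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Medians reported above (M4 Max, 160x120 saved frames).
model_stage = speedup(49.0, 37.0)  # PyTorch MPS vs MLX, model stage
full_route = speedup(60.0, 49.0)   # PyTorch MPS vs MLX, full route
route_fps_ceiling = 1000.0 / 49.0  # frames per second if frames run back to back

print(f"model stage: {model_stage:.2f}x, full route: {full_route:.2f}x")
print(f"full-route ceiling: {route_fps_ceiling:.1f} fps")
```

Medians are used here because a single dispatch hitch can skew a mean; tail percentiles would have to be reported separately.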
Older app-level PyTorch MPS telemetry is what motivated the port; clean reruns moved the comparison denominator enough that we're not using the old tail history as a fresh universal PyTorch-vs-MLX headline.
Larger derived-frame stress tests widen both backends; MLX remained faster in those runs, but we treat those numbers as route/runtime stress evidence rather than the headline model benchmark.
Lower-bandwidth M2 Pro/Tahoe validation also shows MLX ahead on archived hand-positive frames, but recent macOS/Metal changes moved both backends enough that we are treating exact M2 Pro numbers as rebaseline work rather than headline copy.
Numerical accuracy
| Output | Max abs diff |
|---|---|
| Mesh vertices (778×3) | 0.006 |
| Hand keypoints (21×3) | 0.006 |
Sub-millimeter. Verified layer-by-layer through all 32 transformer blocks: the residual is float32 accumulation noise, not a port error.
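For readers who want to run a similar parity check on their own outputs, here is a minimal sketch of the "max abs diff" metric from the table. It uses plain nested lists so it runs without any array library; `max_abs_diff` and the toy values are illustrative assumptions, not the project's verification code, and the tolerance is expressed in the same units as the table.

```python
def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two N x 3 point sets.

    Both arguments are nested lists of floats with identical shapes,
    e.g. 778 x 3 mesh vertices or 21 x 3 keypoints.
    """
    if len(reference) != len(candidate):
        raise ValueError("point counts differ")
    worst = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("row lengths differ")
        for r, c in zip(ref_row, cand_row):
            worst = max(worst, abs(r - c))
    return worst


# Toy data standing in for PyTorch (reference) and MLX (candidate) outputs.
reference = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
candidate = [[0.0, 0.004, 0.0], [1.0, 2.0, 2.994]]

diff = max_abs_diff(reference, candidate)
within_tolerance = diff <= 0.01  # hypothetical threshold, same units as the table
print(diff, within_tolerance)
```

With real outputs, the same reduction would normally be done on arrays, but the definition of the metric is the same.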
Weights
Float32 (2.4 GB) and int4 (490 MB) safetensors on this model page. Int4 is a download/storage convenience: inference speed is the same because the model is compute-bound at 210 tokens, not memory-bandwidth-bound.
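As a rough sanity check on the file sizes, the arithmetic below estimates the parameter count from the float32 file and a lower bound for a fully 4-bit file. The group size of 64 and the fp16 scale and bias per group are assumptions about the quantization layout, not details confirmed by this model card.

```python
def approx_param_count(file_bytes, bytes_per_param):
    """Parameter count implied by a file size, ignoring header overhead."""
    return file_bytes / bytes_per_param


def group_quantized_bytes(n_params, bits=4, group_size=64, overhead_bytes_per_group=4):
    """Approximate size of group-quantized weights.

    Assumes each group of `group_size` weights stores one scale and one
    bias, 2 bytes each (4 bytes per group). These layout values are
    assumptions for illustration.
    """
    packed = n_params * bits / 8
    overhead = (n_params / group_size) * overhead_bytes_per_group
    return packed + overhead


n_params = approx_param_count(2.4e9, 4)       # float32: 4 bytes per parameter
int4_floor = group_quantized_bytes(n_params)  # if every tensor were quantized

print(f"about {n_params / 1e6:.0f}M parameters")
print(f"fully quantized lower bound: about {int4_floor / 1e6:.0f} MB")
```

Under these assumptions the lower bound comes out below the published 490 MB. One possible explanation is that some tensors are stored at higher precision, but the source does not say which.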
MANO licensing
MANO is separately licensed by the Max Planck Institute. wilor-mlx does not bundle or rehost MANO data; it fetches upstream WiLoR-mini assets and converts them locally. You can also supply your own copy via mano_path=... .
Links
- Code: https://github.com/lyonsno/wilor-mlx
- Weights: https://huggingface.co/BasinShapers/wilor-mlx
- Original: WiLoR-mini (Zhan et al., CVPR)