NVIDIA Canary-SpeechLM (MLX Port)
This repository contains the pure MLX port of the NVIDIA Canary SpeechLM model (canary-qwen-2.5b).
By porting the model architecture to MLX (including Conformer block, relative attention layer, and projection layers), this version runs 100% locally on Apple Silicon with zero PyTorch dependencies at inference time.
Features
- No PyTorch at Inference: Pure MLX implementation for optimal performance and memory on macOS.
- Fast Transcription: RTF of 0.067x (runs 14.8x faster than real-time on Apple Silicon).
- High-Fidelity Alignment: Intermediate outputs are validated to match PyTorch/NeMo reference feature maps within float16/float32 precision limits.
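The alignment check in the last bullet can be pictured as an element-wise comparison between an intermediate output from the port and the matching reference tensor. The sketch below is an assumption about what such a check might look like, not the repository's validation script: the function name `compare_features`, the tolerances, and the synthetic arrays are hypothetical, and NumPy stands in for both frameworks.

```python
import numpy as np

def compare_features(ported, reference, atol=1e-3, rtol=1e-3):
    """Return (max_abs_diff, within_tolerance) for two feature maps.

    The tolerances are illustrative; float16 carries roughly 3 decimal
    digits, so looser bounds are needed there than for float32.
    """
    ported = np.asarray(ported, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)
    if ported.shape != reference.shape:
        raise ValueError(f"shape mismatch: {ported.shape} vs {reference.shape}")
    max_abs_diff = float(np.max(np.abs(ported - reference)))
    within = bool(np.allclose(ported, reference, atol=atol, rtol=rtol))
    return max_abs_diff, within

# Synthetic stand-in for a (batch, frames, features) encoder output.
rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 50, 64)).astype(np.float32)
# Simulate a float16 round trip, as a half-precision port would introduce.
ported = reference.astype(np.float16).astype(np.float32)

max_abs_diff, within = compare_features(ported, reference)
print(f"max abs diff: {max_abs_diff:.6f}, within tolerance: {within}")
```

In a real check the two arrays would come from the MLX model and the PyTorch/NeMo reference run on the same audio, compared layer by layer so that the first diverging module is easy to spot.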
Performance Statistics
Measurements taken on Apple Silicon (M5 Pro):
- Audio Duration: 3.88s
- Feature Extraction + Conformer Encoding: 0.0506s
- Prefill/Time-to-First-Token (TTFT): 0.0247s (2551.55 tok/s)
- Decode Loop Generation Speed: 58.99 tok/s (up to 80.71 tok/s raw)
- Real-Time Factor (RTF): 0.0674x (14.8x faster than real-time)
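The figures above are related by simple arithmetic, sketched below. It assumes RTF is defined as total processing time divided by audio duration, which is the usual convention but is not spelled out here. The helper names are hypothetical; the input numbers are copied from the list.

```python
def real_time_factor(processing_s, audio_s):
    """RTF = processing time / audio duration (lower is faster)."""
    return processing_s / audio_s

def speedup_from_rtf(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf

audio_duration_s = 3.88   # reported audio duration
reported_rtf = 0.0674     # reported real-time factor
prefill_s = 0.0247        # reported prefill / TTFT
prefill_tok_per_s = 2551.55

# End-to-end processing time implied by the reported RTF.
total_processing_s = reported_rtf * audio_duration_s
# Speedup relative to real time.
speedup_x = speedup_from_rtf(reported_rtf)
# Prompt length implied by the prefill time and throughput.
prefill_tokens = prefill_s * prefill_tok_per_s

print(f"implied processing time: {total_processing_s:.3f} s")
print(f"speedup: {speedup_x:.1f}x")
print(f"implied prefill tokens: {prefill_tokens:.0f}")
```

Under that assumption the run took about 0.26 s end to end, and the prefill figures imply a prompt of roughly 63 tokens (audio embeddings plus text prompt).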
Installation & Setup
Clone this repository:
git clone https://huggingface.co/speechllms/canary-speechlm-mlx
cd canary-speechlm-mlx
Install dependencies:
pip install mlx mlx-lm librosa transformers soundfile
Ensure you have the base Qwen3-1.7B model downloaded (which contains the base tokenizer and weights):
python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-1.7B')"
Quick Usage
Run transcription directly from a WAV file:
python generate.py /path/to/audio.wav
Record & Transcribe from Microphone
If you have ffmpeg installed on your Mac (brew install ffmpeg), you can run the interactive recording script:
chmod +x record_and_transcribe.sh
./record_and_transcribe.sh
Technical Details
The port translates:
- ConvSubsampling 8x downsampling module.
- Conformer Block featuring depthwise 1D convolutions and relative multi-head self-attention.
- Transformer-XL dynamic Relative Positional Encoding (RelPositionalEncoding).
- LoRA adapter weight overlay on top of Qwen Causal LM.
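As an illustration of the last item, the standard LoRA formulation adds a scaled low-rank product to a frozen base weight: W' = W + (alpha / r) * B A. The sketch below shows that merge in NumPy and checks that it matches applying the adapter as a separate path. It is a minimal sketch under assumptions: the function name `merge_lora`, the shapes, and the alpha/rank values are hypothetical, and whether this port merges the adapter at load time or keeps it as a separate path is not stated here.

```python
import numpy as np

def merge_lora(base_weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into a base weight: W' = W + (alpha / rank) * B @ A.

    base_weight: (d_out, d_in), lora_a: (rank, d_in), lora_b: (d_out, rank).
    """
    scale = alpha / rank
    return base_weight + scale * (lora_b @ lora_a)

# Small synthetic example; real projection layers are far larger.
rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 8, 6, 2, 4.0
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
A = rng.standard_normal((rank, d_in)).astype(np.float32)
B = rng.standard_normal((d_out, rank)).astype(np.float32)

W_merged = merge_lora(W, A, B, alpha, rank)

# The merged weight should give the same output as base + adapter path.
x = rng.standard_normal((3, d_in)).astype(np.float32)
y_adapter = x @ W.T + (alpha / rank) * ((x @ A.T) @ B.T)
y_merged = x @ W_merged.T
max_err = float(np.max(np.abs(y_adapter - y_merged)))
print(f"max difference between merged and adapter path: {max_err:.2e}")
```

Merging ahead of time avoids two extra matrix multiplications per adapted layer at inference, at the cost of no longer being able to detach the adapter from the base weights.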