NVIDIA Canary-SpeechLM (MLX Port)

This repository contains the pure MLX port of the NVIDIA Canary SpeechLM model (canary-qwen-2.5b).

The model architecture (Conformer blocks, relative-position attention layers, and projection layers) is reimplemented in MLX, so this version runs entirely locally on Apple Silicon with no PyTorch dependency at inference time.

Features

No PyTorch at Inference: Pure MLX implementation for optimal performance and memory on macOS.
Fast Transcription: Real-time factor (RTF) of 0.067, about 14.8x faster than real time on Apple Silicon.
High-Fidelity Alignment: Intermediate outputs are validated to match PyTorch/NeMo reference feature maps within float16/float32 precision limits.

Performance Statistics

Measurements taken on Apple Silicon (M5 Pro):

Audio Duration: 3.88s
Feature Extraction + Conformer Encoding: 0.0506s
Prefill/Time-to-First-Token (TTFT): 0.0247s (2551.55 tok/s)
Decode Loop Generation Speed: 58.99 tok/s (up to 80.71 tok/s raw)
Real-Time Factor (RTF): 0.0674x (14.8x faster than real-time)
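
The derived ratios follow from the reported measurements by simple arithmetic. The sketch below recomputes them. The variable names are illustrative, and the implied totals (end-to-end processing time, prompt length) are derived from the figures above rather than measured separately.

```python
# Recompute the derived figures from the reported measurements.
audio_duration_s = 3.88      # reported audio length
rtf = 0.0674                 # reported real-time factor
ttft_s = 0.0247              # reported prefill time
prefill_tok_per_s = 2551.55  # reported prefill throughput

# RTF = processing time / audio duration, so the speedup is its reciprocal.
speedup = 1.0 / rtf
# Implied end-to-end processing time for this clip.
total_time_s = rtf * audio_duration_s
# Implied number of prompt tokens processed during prefill.
prefill_tokens = ttft_s * prefill_tok_per_s

print(f"speedup: {speedup:.1f}x")
print(f"implied processing time: {total_time_s:.3f}s")
print(f"implied prefill tokens: {prefill_tokens:.0f}")
```

An RTF below 1 means the clip is processed faster than it plays back; here the whole pipeline takes roughly a quarter of a second for a clip just under four seconds long.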

Installation & Setup

Clone this repository:

git clone https://huggingface.co/speechllms/canary-speechlm-mlx
cd canary-speechlm-mlx

Install dependencies:

pip install mlx mlx-lm librosa transformers soundfile

Download the base Qwen3-1.7B model, which provides the tokenizer and base weights:
```
python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-1.7B')"
```

Quick Usage

Run transcription directly from a WAV file:

python generate.py /path/to/audio.wav
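
Before running the script, it can help to confirm what kind of WAV file you have. The sketch below uses only Python's standard-library `wave` module to read a PCM WAV file's sample rate, channel count, and duration. The 16 kHz mono target is an assumption based on how NeMo Canary models are typically fed; this README does not state what `generate.py` expects or whether it resamples internally. `describe_wav` and `matches_assumed_format` are hypothetical helpers, not part of this repository.

```python
import wave

ASSUMED_SAMPLE_RATE = 16_000  # assumption: not stated in this README


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def matches_assumed_format(path):
    """True if the file is mono at the assumed sample rate."""
    rate, channels, _ = describe_wav(path)
    return rate == ASSUMED_SAMPLE_RATE and channels == 1
```

If a file does not match, it may still transcribe correctly; treat the check as a diagnostic rather than a requirement.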

Record & Transcribe from Microphone

If you have ffmpeg installed on your Mac (brew install ffmpeg), you can run the interactive recording script:

chmod +x record_and_transcribe.sh
./record_and_transcribe.sh

Technical Details

The port reimplements the following NeMo components in MLX:

ConvSubsampling 8x downsampling module.
Conformer Block featuring depthwise 1D convolutions and relative multi-head self-attention.
Transformer-XL dynamic Relative Positional Encoding (RelPositionalEncoding).
LoRA adapter weight overlay on top of Qwen Causal LM.
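
The LoRA overlay in the last item can be pictured as adding a scaled low-rank product to a frozen base weight. The sketch below shows that arithmetic on tiny matrices in plain Python so it runs anywhere. The shapes, the `alpha / rank` scaling convention, and the function names are illustrative assumptions, not this repository's code, which operates on MLX arrays. Whether the port merges the weights ahead of time or applies the adapter at runtime is not stated here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_a, lora_b, alpha, rank):
    """Return base + (alpha / rank) * (lora_b @ lora_a).

    base:   (out, in) frozen weight
    lora_a: (rank, in) down-projection
    lora_b: (out, rank) up-projection
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


base = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
lora_a = [[1.0, 2.0]]             # rank 1, in = 2
lora_b = [[0.5], [-0.5]]          # out = 2, rank 1
merged = merge_lora(base, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

Because the update has rank at most `rank`, the adapter stores far fewer parameters than the base weight while still shifting every entry of the merged matrix.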
