openai/whisper-base — 4-Graph ONNX Export

A self-exported ONNX build of Whisper, split into four graphs, for use with asrjs/speech-recognition.

Model

openai/whisper-base — 74M params, 6 encoder / 6 decoder layers.

Format

whisper-browser-self-export-v1 — 4-graph KV-cache split:

| Graph | Input | Output | Runs |
|---|---|---|---|
| `encoder_model.onnx` | mel [1, 80, 3000] | hidden [1, 1500, 512] | Once per chunk |
| `decoder_init.onnx` | prompt_ids + encoder hidden | logits + full KV cache | Once per chunk |
| `decoder_step.onnx` | token_id + past KV | logits + updated self-attn KV | Per token |
| `decoder_align.onnx` | all token ids + encoder hidden | alignment [1, T, S] | Once after generation |

No external data files — all weights are inline (model fits well under the 2 GB protobuf limit).
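
The call order in the table above can be sketched as a greedy decode loop. This is a minimal illustration, not the library's implementation: the `WhisperGraphs` interface, its method names, and `transcribeChunk` are hypothetical, and the sketch assumes that each decoder call returns the logits for the last position as a single vocab-sized vector. The actual tensor names and runtime API are not specified here.

```typescript
// Hypothetical wrapper around the four graphs. Hidden and Kv are left generic
// because the real tensor types depend on the runtime in use.
interface WhisperGraphs<Hidden, Kv> {
  encoder(mel: Float32Array): Promise<Hidden>;
  decoderInit(promptIds: number[], hidden: Hidden): Promise<{ logits: Float32Array; kv: Kv }>;
  decoderStep(tokenId: number, pastKv: Kv): Promise<{ logits: Float32Array; kv: Kv }>;
  decoderAlign(tokenIds: number[], hidden: Hidden): Promise<Float32Array>;
}

// Index of the largest logit (greedy token choice).
function argmax(v: Float32Array): number {
  let best = 0;
  for (let i = 1; i < v.length; i++) {
    if (v[i] > v[best]) best = i;
  }
  return best;
}

async function transcribeChunk<Hidden, Kv>(
  g: WhisperGraphs<Hidden, Kv>,
  mel: Float32Array,
  promptIds: number[],
  eosId: number,
  maxTokens = 448, // max_target_positions
): Promise<{ tokens: number[]; alignment: Float32Array }> {
  // Once per chunk: encode the audio, then prime the KV cache with the prompt.
  const hidden = await g.encoder(mel);
  let { logits, kv } = await g.decoderInit(promptIds, hidden);

  const tokens = [...promptIds];
  // Per token: pick the next token, stop at end-of-text, otherwise feed it back.
  while (tokens.length < maxTokens) {
    const next = argmax(logits);
    if (next === eosId) break;
    tokens.push(next);
    ({ logits, kv } = await g.decoderStep(next, kv));
  }

  // Once after generation: cross-attention alignment over all tokens.
  const alignment = await g.decoderAlign(tokens, hidden);
  return { tokens, alignment };
}
```

Splitting the decoder this way means the cross-attention keys and values are computed once in the init graph, and the per-token graph only has to extend the self-attention cache.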

Variants

| Dir | Precision | Total Size | Encoder | Init | Step | Align |
|---|---|---|---|---|---|---|
| `fp32/` | float32 | 753 MB | 79 MB | 300 MB | 187 MB | 189 MB |
| `fp16/` | float16 (export-time) | 377 MB | 39 MB | 150 MB | 93 MB | 94 MB |
| `q8/` | int8 dynamic | 256 MB | 22 MB | 75 MB | 110 MB | 48 MB |

Each variant directory is self-contained and holds manifest.json, the four ONNX graphs, tokenizer.json, and the config files.
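
As an illustration of choosing between the variants, the sketch below picks the highest-precision variant whose total size fits a download budget. The sizes are copied from the table above; the `pickVariant` helper and the idea of a size budget are assumptions made for this example and are not part of asrjs/speech-recognition.

```typescript
type Variant = "fp32" | "fp16" | "q8";

// Total sizes in MB, taken from the Variants table.
const totalSizeMB: Record<Variant, number> = { fp32: 753, fp16: 377, q8: 256 };

// Ordered from highest to lowest precision.
const byPrecision: Variant[] = ["fp32", "fp16", "q8"];

// Hypothetical helper: returns the first variant that fits, or undefined if none does.
function pickVariant(budgetMB: number): Variant | undefined {
  return byPrecision.find((v) => totalSizeMB[v] <= budgetMB);
}
```

The result could then be passed as the `variant` option when loading the model.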

Dimensions

d_model: 512
decoder_layers: 6
decoder_attention_heads: 8
head_dim: 64
num_mel_bins: 80
max_source_positions: 1500 (encoder output frames, produced from 3000 input mel frames)
max_target_positions: 448
vocab_size: 51865
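
These dimensions determine how large the KV cache can get. The following worked calculation assumes each key or value tensor has shape [1, heads, seq_len, head_dim] and that there is one key/value pair per decoder layer for self-attention and one for cross-attention. That layout is a common convention and an assumption here; the exported graphs may name or arrange the tensors differently.

```typescript
const dims = { decoderLayers: 6, heads: 8, headDim: 64, maxSource: 1500, maxTarget: 448 };

// Elements in one K or V tensor of assumed shape [1, heads, seqLen, headDim].
function kvElements(seqLen: number): number {
  return dims.heads * seqLen * dims.headDim;
}

// Total bytes for K and V across all decoder layers.
function kvBytes(seqLen: number, bytesPerElement: number): number {
  return dims.decoderLayers * 2 * kvElements(seqLen) * bytesPerElement;
}

// Cross-attention cache: fixed size per chunk, built once from the encoder output.
const crossAttnBytes = kvBytes(dims.maxSource, 4);

// Self-attention cache: grows by one position per token, up to max_target_positions.
const selfAttnMaxBytes = kvBytes(dims.maxTarget, 4);
```

With float32 (4 bytes per element) this gives 36,864,000 bytes for the cross-attention cache and at most 11,010,048 bytes for the self-attention cache; float16 halves both figures.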

Alignment heads

From generation_config.alignment_heads:

[3,1], [4,2], [4,3], [4,7], [5,1], [5,2], [5,4], [5,6]
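
One way the alignment [1, T, S] output might be turned into rough token times is sketched below. It assumes the output is a row-major matrix of cross-attention weights already combined over the heads listed above, with the batch dimension dropped, and it uses a plain per-token argmax. The OpenAI reference implementation instead runs dynamic time warping over filtered attention weights, so treat this as a simplified stand-in; `peakTimes` is a hypothetical helper.

```typescript
// A 30 s chunk maps to 1500 encoder frames, so each frame covers 0.02 s.
const SECONDS_PER_FRAME = 30 / 1500;

// alignment: flat row-major [T, S] matrix. Returns one time in seconds per token,
// taken at the encoder frame with the highest weight for that token.
function peakTimes(alignment: Float32Array, T: number, S: number): number[] {
  const times: number[] = [];
  for (let t = 0; t < T; t++) {
    let best = 0;
    for (let s = 1; s < S; s++) {
      if (alignment[t * S + s] > alignment[t * S + best]) best = s;
    }
    times.push(best * SECONDS_PER_FRAME);
  }
  return times;
}
```

A per-token argmax can jump backwards in time, which is the main reason a monotonic method such as dynamic time warping is usually preferred for word timestamps.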

Usage

TypeScript (asrjs/speech-recognition)

import { loadSplitGraphLocalModel } from '@asrjs/speech-recognition/models/whisper-seq2seq';

const model = loadSplitGraphLocalModel('./whisper-base-onnx-4graph', { variant: 'fp32' });
// or: { variant: 'fp16' }, { variant: 'q8' }

Python export (reproduce)

cd tools/whisper-onnx-export

# fp32
.venv/bin/python export_whisper.py openai/whisper-base ./output --device cpu --variant fp32

# fp16 (export-time, ORT-safe)
.venv/bin/python export_whisper.py openai/whisper-base ./output --device cpu --variant fp16

# q8 (post-export dynamic quantization)
.venv/bin/python export_whisper.py openai/whisper-base ./output --device cpu --variant q8

Validation

ONNX checker: pass (all variants, all graphs)
ORT CPU load: pass (all variants, all graphs)
Audit: 52 passes, 0 failures

License

Apache-2.0 (same as openai/whisper-base)
