openai/whisper-base: 4-Graph ONNX Export

A self-exported 4-graph ONNX build of Whisper for use with asrjs/speech-recognition.

Model

openai/whisper-base: 74M params, 6 encoder / 6 decoder layers.

Format

whisper-browser-self-export-v1: a 4-graph KV-cache split.

| Graph | Input | Output | Runs |
|---|---|---|---|
| encoder_model.onnx | mel [1, 80, 3000] | hidden [1, 1500, 512] | Once per chunk |
| decoder_init.onnx | prompt_ids + encoder hidden | logits + full KV cache | Once per chunk |
| decoder_step.onnx | token_id + past KV | logits + updated self-attn KV | Per token |
| decoder_align.onnx | all token ids + encoder hidden | alignment [1, T, S] | Once after generation |

No external data files; all weights are inline (the model fits well under the 2 GB protobuf limit).
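
The run schedule of the four graphs amounts to a short control loop. The sketch below shows it with greedy decoding. The four `run_*` callables are hypothetical stand-ins for whatever wraps each ONNX session, and their argument and return shapes are assumptions made for illustration, not the package's actual API.

```python
from typing import Any, Callable, List, Sequence, Tuple

MAX_TARGET_POSITIONS = 448  # decoder length limit from the Dimensions section


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (greedy token choice)."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_decode(
    mel: Any,
    prompt_ids: List[int],
    eot_id: int,
    run_encoder: Callable[[Any], Any],                                  # hypothetical wrapper
    run_init: Callable[[List[int], Any], Tuple[Sequence[float], Any]],  # hypothetical wrapper
    run_step: Callable[[int, Any], Tuple[Sequence[float], Any]],        # hypothetical wrapper
    run_align: Callable[[List[int], Any], Any],                         # hypothetical wrapper
) -> Tuple[List[int], Any]:
    # encoder_model.onnx: once per chunk
    hidden = run_encoder(mel)
    # decoder_init.onnx: once per chunk, yields first logits and the full KV cache
    logits, kv = run_init(prompt_ids, hidden)

    tokens = list(prompt_ids)
    while len(tokens) < MAX_TARGET_POSITIONS:
        next_id = argmax(logits)
        tokens.append(next_id)
        if next_id == eot_id:
            break
        # decoder_step.onnx: once per generated token, reusing the past KV
        logits, kv = run_step(next_id, kv)

    # decoder_align.onnx: once after generation, over all token ids
    alignment = run_align(tokens, hidden)
    return tokens, alignment
```

Passing the sessions in as callables keeps the loop independent of the runtime, so the same schedule can be exercised with stubs before any model is loaded.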

Variants

| Dir | Precision | Total Size | Encoder | Init | Step | Align |
|---|---|---|---|---|---|---|
| fp32/ | float32 | 753 MB | 79 MB | 300 MB | 187 MB | 189 MB |
| fp16/ | float16 (export-time) | 377 MB | 39 MB | 150 MB | 93 MB | 94 MB |
| q8/ | int8 dynamic | 256 MB | 22 MB | 75 MB | 110 MB | 48 MB |

Each variant directory is self-contained: manifest.json, ONNX graphs, tokenizer.json, config files.
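
A quick way to confirm that a downloaded variant directory is complete is to check for the files named in this card. The sketch below is only an illustration: the helper name `check_variant_dir` is hypothetical, the unnamed "config files" are not checked, and the 2 GiB threshold is a conservative reading of the protobuf limit mentioned above.

```python
from pathlib import Path
from typing import List

# File names taken from the Format table and the sentence above.
GRAPH_FILES = [
    "encoder_model.onnx",
    "decoder_init.onnx",
    "decoder_step.onnx",
    "decoder_align.onnx",
]
OTHER_FILES = ["manifest.json", "tokenizer.json"]
PROTOBUF_LIMIT_BYTES = 2 * 1024**3  # inline weights must stay below this


def check_variant_dir(variant_dir: str) -> List[str]:
    """Return a list of problems; an empty list means the directory looks complete."""
    problems: List[str] = []
    root = Path(variant_dir)
    for name in GRAPH_FILES + OTHER_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif name.endswith(".onnx") and path.stat().st_size >= PROTOBUF_LIMIT_BYTES:
            problems.append(f"too large for inline weights: {name}")
    return problems
```

For example, `check_variant_dir("./whisper-base-onnx-4graph/fp32")` would return an empty list for a complete download, assuming that path layout.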

Dimensions

  • d_model: 512
  • decoder_layers: 6
  • decoder_attention_heads: 8
  • head_dim: 64
  • num_mel_bins: 80
  • max_source_positions: 1500 (encoder output frames, produced from 3000 mel input frames)
  • max_target_positions: 448
  • vocab_size: 51865
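
These dimensions determine how large the KV cache passed between decoder_init and decoder_step can get. The estimate below is a rough sketch that assumes one key and one value tensor per decoder layer, each shaped [1, heads, seq_len, head_dim]; the actual tensor layout of the exported graphs is not stated here and may differ.

```python
# Values copied from the Dimensions list above.
D_MODEL = 512
DECODER_LAYERS = 6
DECODER_ATTENTION_HEADS = 8
HEAD_DIM = D_MODEL // DECODER_ATTENTION_HEADS
MAX_SOURCE_POSITIONS = 1500
MAX_TARGET_POSITIONS = 448


def kv_cache_bytes(seq_len: int, bytes_per_value: int = 4) -> int:
    """Bytes for keys + values across all decoder layers (assumed layout)."""
    per_tensor = DECODER_ATTENTION_HEADS * seq_len * HEAD_DIM
    return 2 * DECODER_LAYERS * per_tensor * bytes_per_value


# Cross-attention KV covers the encoder frames and is fixed per chunk.
cross_attn_bytes = kv_cache_bytes(MAX_SOURCE_POSITIONS)
# Self-attention KV grows with each token, up to the target limit.
self_attn_bytes_max = kv_cache_bytes(MAX_TARGET_POSITIONS)

print(f"cross-attn KV (fp32): {cross_attn_bytes / 1e6:.1f} MB")
print(f"self-attn KV at 448 tokens (fp32): {self_attn_bytes_max / 1e6:.1f} MB")
```

Under these assumptions the cross-attention cache is the larger part, which is consistent with computing it once per chunk and updating only the self-attention KV on each step.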

Alignment heads

From generation_config.alignment_heads:

[3,1], [4,2], [4,3], [4,7], [5,1], [5,2], [5,4], [5,6]
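
Each pair is a (layer, head) index into the decoder's cross-attention. One common way to turn such heads into a single alignment matrix of shape [T, S] is to average their attention weights. The sketch below does that over nested lists; whether decoder_align.onnx combines the heads in exactly this way is an assumption, and the `cross_attn[layer][head][t][s]` input layout is chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# (layer, head) pairs from generation_config.alignment_heads, as listed above.
ALIGNMENT_HEADS: List[Tuple[int, int]] = [
    (3, 1), (4, 2), (4, 3), (4, 7), (5, 1), (5, 2), (5, 4), (5, 6),
]

Matrix = List[List[float]]


def average_alignment_heads(
    cross_attn: Sequence[Sequence[Matrix]],
    heads: Sequence[Tuple[int, int]] = ALIGNMENT_HEADS,
) -> Matrix:
    """Average the selected heads of cross_attn[layer][head][t][s] into a [t][s] matrix."""
    first_layer, first_head = heads[0]
    num_tokens = len(cross_attn[first_layer][first_head])
    num_frames = len(cross_attn[first_layer][first_head][0])

    averaged = [[0.0] * num_frames for _ in range(num_tokens)]
    for layer, head in heads:
        weights = cross_attn[layer][head]
        for t in range(num_tokens):
            for s in range(num_frames):
                averaged[t][s] += weights[t][s] / len(heads)
    return averaged
```

Heads that are not in the list are ignored entirely, so only the eight heads named above contribute to the result.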

Usage

TypeScript (asrjs/speech-recognition)

import { loadSplitGraphLocalModel } from '@asrjs/speech-recognition/models/whisper-seq2seq';

const model = loadSplitGraphLocalModel('./whisper-base-onnx-4graph', { variant: 'fp32' });
// or: { variant: 'fp16' }, { variant: 'q8' }

Python export (reproduce)

cd tools/whisper-onnx-export

# fp32
.venv/bin/python export_whisper.py openai/whisper-base ./output --device cpu --variant fp32

# fp16 (export-time, ORT-safe)
.venv/bin/python export_whisper.py openai/whisper-base ./output --device cpu --variant fp16

# q8 (post-export dynamic quantization)
.venv/bin/python export_whisper.py openai/whisper-base ./output --device cpu --variant q8
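
To export all three variants in one go, the commands above can be driven from a small script. This is a convenience sketch, not part of the export tool: the helper names are hypothetical, it uses only the arguments shown above, and it is meant to be run from tools/whisper-onnx-export so the relative paths resolve.

```python
import subprocess
from typing import Callable, List

VARIANTS = ["fp32", "fp16", "q8"]


def export_command(
    variant: str,
    model_id: str = "openai/whisper-base",
    out_dir: str = "./output",
    device: str = "cpu",
) -> List[str]:
    """Build the argument list for one export run, mirroring the commands above."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    return [
        ".venv/bin/python", "export_whisper.py",
        model_id, out_dir,
        "--device", device,
        "--variant", variant,
    ]


def export_all(run: Callable[..., object] = subprocess.run) -> None:
    """Run the export once per variant, stopping at the first failure."""
    for variant in VARIANTS:
        run(export_command(variant), check=True)
```

Passing `run` as a parameter makes it possible to dry-run the script (for example with a function that only prints the command) before starting the real exports.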

Validation

  • ONNX checker: pass (all variants, all graphs)
  • ORT CPU load: pass (all variants, all graphs)
  • Audit: 52 passes, 0 failures

License

Apache-2.0 (same as openai/whisper-base)
