ARK-ASR-0.6B INT8 ONNX

Overview

ARK-ASR-0.6B INT8 ONNX is the ONNX Runtime package for the 0.6B ARK-ASR automatic speech recognition model. It is intended for local and edge-device ASR inference when a compact ONNX pipeline is preferred over loading the full Transformers checkpoint.

ARK-ASR is trained with the recipe from AutoArk/open-audio-opd, which combines teacher-data adaptation with on-policy distillation (OPD). The full 0.6B Transformers checkpoint is available as AutoArk-AI/ARK-ASR-0.6B.

Supported Languages

Chinese, English, German, Japanese, French, Korean, Spanish, Polish, Italian, Romanian, Hungarian, Czech, Dutch, Finnish, Croatian, Slovak, Slovene, Estonian, and Lithuanian.

Package Contents

This repository contains a self-contained INT8 ONNX inference package:

.
├── infer_ark_audio_onnx.py
├── README_INT8_ASR_USAGE.md
├── build/
│   └── llm_kv_fp32_qwen_native.json
└── model/
    ├── llm_kv_cpu_fp32_int8.onnx
    ├── audio_encoder_whisper_int8.onnx
    ├── audio_encoder_adapter_int8.onnx
    ├── embedding_fp32.onnx
    ├── embedding_fp32.data
    ├── runtime_manifest.json
    └── tokenizer, processor, and model configuration files

The package includes:

- INT8 ONNX files for the decoder, audio encoder, and audio adapter
- FP32 token embedding ONNX assets
- tokenizer, processor, and runtime configuration files
- a standalone ASR inference script, infer_ark_audio_onnx.py
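
Before running inference, it can help to confirm that a download is complete. The following is a minimal sketch that checks for the files named in the tree above. The helper name `missing_package_files` is hypothetical and not part of the package, and the tokenizer, processor, and configuration files are not checked because the tree does not name them individually.

```python
from pathlib import Path

# Relative paths copied from the package tree above.
EXPECTED_FILES = [
    "infer_ark_audio_onnx.py",
    "README_INT8_ASR_USAGE.md",
    "build/llm_kv_fp32_qwen_native.json",
    "model/llm_kv_cpu_fp32_int8.onnx",
    "model/audio_encoder_whisper_int8.onnx",
    "model/audio_encoder_adapter_int8.onnx",
    "model/embedding_fp32.onnx",
    "model/embedding_fp32.data",
    "model/runtime_manifest.json",
]


def missing_package_files(runtime_root):
    """Return the expected files that are absent under runtime_root."""
    root = Path(runtime_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_package_files("/path/to/ark-asr-0.6b-int8-onnx")
    print("package looks complete" if not missing else f"missing: {missing}")
```

An empty list means every named file was found. The embedding_fp32.data file is included in the check because it sits next to embedding_fp32.onnx in the tree.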

Installation

Python 3.10 or 3.11 is recommended.

pip install onnxruntime torch transformers librosa soundfile numpy

For GPU inference, install the onnxruntime-gpu build that matches your CUDA environment in place of the CPU-only onnxruntime package.
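
To see which execution providers the installed ONNX Runtime build exposes, a small check like the one below may be useful. It is a sketch only: `pick_providers` is a hypothetical helper, and this card does not state whether infer_ark_audio_onnx.py accepts a provider option, so the result is informational.

```python
def pick_providers(available):
    """Order execution providers: CUDA first when present, CPU as fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [name for name in preferred if name in available]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        # get_available_providers() lists what the installed build supports.
        print(pick_providers(ort.get_available_providers()))
```

If CUDAExecutionProvider is missing from the output after installing onnxruntime-gpu, the installed build and the local CUDA setup probably do not match.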

Quick Start

Download or clone this repository, then run inference from the repository root:

python infer_ark_audio_onnx.py \
  --audio /path/to/audio.wav \
  --max-new-tokens 128

You can also run the script from another directory by passing --runtime-root:

python /path/to/ark-asr-0.6b-int8-onnx/infer_ark_audio_onnx.py \
  --runtime-root /path/to/ark-asr-0.6b-int8-onnx \
  --audio /path/to/audio.wav \
  --max-new-tokens 128

The script prints the transcription as a single line.
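
The same command can be driven from Python when the script has to run in a separate process. The sketch below uses only the flags shown above. The helper names are hypothetical. Taking the last non-empty stdout line as the transcription is an assumption, since this card does not say whether other output may appear on stdout.

```python
import subprocess
import sys
from pathlib import Path


def build_command(runtime_root, audio_path, max_new_tokens=128):
    """Build the argument list for the CLI invocation shown above."""
    root = Path(runtime_root)
    return [
        sys.executable,
        str(root / "infer_ark_audio_onnx.py"),
        "--runtime-root", str(root),
        "--audio", str(audio_path),
        "--max-new-tokens", str(max_new_tokens),
    ]


def transcribe_via_cli(runtime_root, audio_path, max_new_tokens=128):
    """Run the script and return its last non-empty stdout line (assumption)."""
    result = subprocess.run(
        build_command(runtime_root, audio_path, max_new_tokens),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return lines[-1] if lines else ""
```

Passing the arguments as a list avoids shell quoting problems with audio paths that contain spaces.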

Python Usage

from pathlib import Path

from infer_ark_audio_onnx import ArkAsrOnnxRuntime

runtime = ArkAsrOnnxRuntime(Path("/path/to/ark-asr-0.6b-int8-onnx"))
text = runtime.transcribe(
    audio_path="/path/to/audio.wav",
    max_new_tokens=128,
    max_audio_seconds=30,
    precision="int8",
    asr_block_token_id_from=151670,
)
print(text)
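
To transcribe several files with one loaded runtime, the call above can be wrapped in a loop. The following is a minimal sketch. `transcribe_folder` is a hypothetical helper that accepts any callable, so it makes no assumption about the runtime beyond the transcribe arguments already shown.

```python
from pathlib import Path


def transcribe_folder(transcribe_fn, folder, pattern="*.wav"):
    """Map each matching audio file name to its transcription.

    transcribe_fn is any callable that takes an audio path string
    and returns the recognized text.
    """
    results = {}
    for audio_file in sorted(Path(folder).glob(pattern)):
        results[audio_file.name] = transcribe_fn(str(audio_file))
    return results


# Example wiring, assuming `runtime` was created as in the snippet above:
# texts = transcribe_folder(
#     lambda path: runtime.transcribe(
#         audio_path=path,
#         max_new_tokens=128,
#         max_audio_seconds=30,
#         precision="int8",
#         asr_block_token_id_from=151670,
#     ),
#     "/path/to/audio_dir",
# )
```

Creating the runtime once and reusing it for every file avoids reloading the ONNX models for each transcription.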

Decoding Behavior

The inference script filters non-text control tokens by default while preserving the EOS token for normal generation stopping. The filter covers:

- special tokens from tokenizer.all_special_ids, except eos_token_id
- added vocabulary entries that look like control tokens
- token IDs greater than or equal to 151670, which lie outside the ASR text range
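
The three rules above can be illustrated with a small, dependency-free sketch. This is not the script's implementation. The function names are hypothetical, and the `<|...|>` pattern used to decide what "looks like" a control token is an assumption. With a Transformers tokenizer, the inputs could come from tokenizer.all_special_ids, tokenizer.eos_token_id, and tokenizer.get_added_vocab().

```python
import math
import re

# Heuristic for "looks like a control token"; the exact rule is an assumption.
CONTROL_PATTERN = re.compile(r"^<\|.*\|>$")


def build_blocked_ids(special_ids, eos_id, added_vocab, vocab_size, block_from=151670):
    """Collect token IDs to suppress, always leaving EOS available."""
    blocked = set(special_ids)
    # added_vocab maps token string -> token ID.
    blocked.update(i for tok, i in added_vocab.items() if CONTROL_PATTERN.match(tok))
    # Block every ID from block_from up to the end of the vocabulary.
    blocked.update(range(block_from, vocab_size))
    # EOS must stay selectable so that generation can stop normally.
    blocked.discard(eos_id)
    return blocked


def greedy_pick(logits, blocked):
    """Return the highest-scoring token ID that is not blocked."""
    masked = [-math.inf if i in blocked else score for i, score in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)
```

Setting blocked logits to negative infinity before the argmax means a blocked token can never be chosen, while EOS competes with ordinary text tokens as usual.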

See README_INT8_ASR_USAGE.md for the full local usage guide and decoding details.

Model Details

Task: automatic speech recognition
Format: INT8 ONNX Runtime package
Base model: ARK-ASR-0.6B
Sampling rate: 16 kHz
License: Apache-2.0
Training and evaluation code: AutoArk/open-audio-opd
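
Since the model works at 16 kHz, it can be useful to inspect an input file before transcription. The sketch below uses the standard-library wave module, which reads PCM WAV files only. The helper names are hypothetical, and whether the inference script resamples or truncates audio by itself is not stated here, so treat this as an optional pre-check. The 30-second limit mirrors the max_audio_seconds value from the Python usage example.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # from the model details above
MAX_AUDIO_SECONDS = 30       # value used in the Python usage example


def inspect_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


def describe_mismatch(path):
    """List ways the file differs from 16 kHz audio within the 30 s window."""
    rate, _channels, seconds = inspect_wav(path)
    notes = []
    if rate != TARGET_SAMPLE_RATE:
        notes.append(f"sample rate is {rate} Hz, not {TARGET_SAMPLE_RATE} Hz")
    if seconds > MAX_AUDIO_SECONDS:
        notes.append(f"duration is {seconds:.1f} s, above {MAX_AUDIO_SECONDS} s")
    return notes
```

An empty list from describe_mismatch means the file already matches the expected sample rate and fits within the 30-second window.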

Citation

@misc{lin2026dataefficientopd,
  title={Data-Efficient On-Policy Distillation for Automatic Speech Recognition},
  author={Lin, Yu and Wang, Yiming and Cai, Runyuan and Zeng, Xiaodong},
  year={2026},
  eprint={2605.28139},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.28139}
}
