ARK-ASR-0.6B INT8 ONNX

Overview

ARK-ASR-0.6B INT8 ONNX is the ONNX Runtime package for the 0.6B ARK-ASR automatic speech recognition model. It is intended for local and edge-device ASR inference when a compact ONNX pipeline is preferred over loading the full Transformers checkpoint.

ARK-ASR is trained with the teacher-data adaptation plus on-policy distillation recipe from AutoArk/open-audio-opd. The full 0.6B Transformers checkpoint is available as AutoArk-AI/ARK-ASR-0.6B.

Supported Languages

Chinese, English, German, Japanese, French, Korean, Spanish, Polish, Italian, Romanian, Hungarian, Czech, Dutch, Finnish, Croatian, Slovak, Slovene, Estonian, and Lithuanian.

Package Contents

This repository contains a self-contained INT8 ONNX inference package:

.
├── infer_ark_audio_onnx.py
├── README_INT8_ASR_USAGE.md
├── build/
│   └── llm_kv_fp32_qwen_native.json
└── model/
    ├── llm_kv_cpu_fp32_int8.onnx
    ├── audio_encoder_whisper_int8.onnx
    ├── audio_encoder_adapter_int8.onnx
    ├── embedding_fp32.onnx
    ├── embedding_fp32.data
    ├── runtime_manifest.json
    └── tokenizer, processor, and model configuration files

The package includes:

  • INT8 ONNX files for the decoder, audio encoder, and audio adapter
  • FP32 token embedding ONNX assets
  • tokenizer, processor, and runtime configuration files
  • a standalone ASR inference script, infer_ark_audio_onnx.py

Installation

Python 3.10 or 3.11 is recommended.

pip install onnxruntime torch transformers librosa soundfile numpy

For GPU inference, install the onnxruntime-gpu build that matches your CUDA environment in place of onnxruntime.
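
To confirm which build is active after installation, a minimal check along these lines may help. It lists the execution providers that ONNX Runtime reports and picks a preference order. The helper pick_providers is hypothetical and not part of this package, and whether infer_ark_audio_onnx.py selects the GPU provider on its own is not stated here.

```python
def pick_providers(available):
    """Return execution providers in preference order, GPU first when present."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    # Fall back to the CPU provider if neither preferred name is reported.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed")
    else:
        available = ort.get_available_providers()
        print("available:", available)
        print("would use:", pick_providers(available))
```

If CUDAExecutionProvider is missing from the reported list after installing onnxruntime-gpu, the installed build most likely does not match the local CUDA setup.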

Quick Start

Download or clone this repository, then run inference from the repository root:

python infer_ark_audio_onnx.py \
  --audio /path/to/audio.wav \
  --max-new-tokens 128

You can also run the script from another directory by passing --runtime-root:

python /path/to/ark-asr-0.6b-int8-onnx/infer_ark_audio_onnx.py \
  --runtime-root /path/to/ark-asr-0.6b-int8-onnx \
  --audio /path/to/audio.wav \
  --max-new-tokens 128

The script prints one transcription line.
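
For a folder of recordings, the command above can be wrapped in a small driver. The sketch below uses only the flags shown in this section. The helpers build_command and transcribe_folder are hypothetical, and treating the last non-empty stdout line as the transcription is an assumption based on the script printing one transcription line.

```python
import subprocess
import sys
from pathlib import Path


def build_command(runtime_root, audio_path, max_new_tokens=128):
    """Build the CLI invocation for one audio file."""
    root = Path(runtime_root)
    return [
        sys.executable,
        str(root / "infer_ark_audio_onnx.py"),
        "--runtime-root", str(root),
        "--audio", str(audio_path),
        "--max-new-tokens", str(max_new_tokens),
    ]


def transcribe_folder(runtime_root, audio_dir):
    """Run the script once per .wav file and collect the printed text."""
    results = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        proc = subprocess.run(
            build_command(runtime_root, wav),
            capture_output=True, text=True, check=True,
        )
        # Assumption: the transcription is the last non-empty stdout line.
        lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
        results[wav.name] = lines[-1] if lines else ""
    return results
```

Each call starts a new process and reloads the model, so this suits small batches; for many files, keeping one runtime object alive through the Python interface is likely to be faster.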

Python Usage

from pathlib import Path

from infer_ark_audio_onnx import ArkAsrOnnxRuntime

runtime = ArkAsrOnnxRuntime(Path("/path/to/ark-asr-0.6b-int8-onnx"))
text = runtime.transcribe(
    audio_path="/path/to/audio.wav",
    max_new_tokens=128,
    max_audio_seconds=30,
    precision="int8",
    asr_block_token_id_from=151670,
)
print(text)
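
The example passes max_audio_seconds=30. Assuming that value caps how much audio a single call processes (the exact behavior is not documented here), a longer recording could be split into consecutive spans before transcription. The helper chunk_spans below is hypothetical and only computes the time boundaries; cutting the audio and joining the texts is left to the caller.

```python
def chunk_spans(total_seconds, max_seconds=30.0):
    """Split [0, total_seconds) into consecutive spans no longer than max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


print(chunk_spans(75.0))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Hard cuts at fixed offsets can split a word in two, so cutting at silences near each boundary may give cleaner transcripts.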

Decoding Behavior

The inference script filters non-text control tokens by default while preserving the EOS token for normal generation stopping. The filter covers:

  • special tokens from tokenizer.all_special_ids, except eos_token_id
  • added vocabulary entries that look like control tokens
  • token IDs greater than or equal to 151670, which lie outside the ASR text range

See README_INT8_ASR_USAGE.md for the full local usage guide and decoding details.
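
The three filter rules can be pictured as building a set of blocked token IDs and masking their logits before the next token is chosen. This is an illustrative sketch, not the code in infer_ark_audio_onnx.py. The function names are hypothetical, and the "<|name|>" test for control-like tokens is an assumption, since the script's actual heuristic is not described here. The added_vocab argument is a token-to-ID mapping, similar to what a Transformers tokenizer returns from get_added_vocab().

```python
import math


def looks_like_control(token):
    # Assumption: control tokens are written in the "<|name|>" form.
    return token.startswith("<|") and token.endswith("|>")


def build_blocked_ids(all_special_ids, eos_token_id, added_vocab,
                      vocab_size, block_from=151670):
    """Collect token IDs that should never be emitted as text."""
    blocked = set(all_special_ids)
    blocked.update(i for tok, i in added_vocab.items() if looks_like_control(tok))
    blocked.update(range(block_from, vocab_size))
    # EOS stays allowed so that generation can stop normally.
    blocked.discard(eos_token_id)
    return blocked


def mask_logits(logits, blocked):
    """Set blocked positions to -inf so greedy or sampled decoding skips them."""
    return [-math.inf if i in blocked else v for i, v in enumerate(logits)]


# Toy vocabulary of 10 tokens with a small threshold, for illustration only.
blocked = build_blocked_ids(
    all_special_ids=[0, 1], eos_token_id=1,
    added_vocab={"<|pad|>": 2, "hello": 3},
    vocab_size=10, block_from=8,
)
masked = mask_logits([float(i) for i in range(10)], blocked)
```

In the toy run, IDs 0, 2, 8, and 9 are blocked while ID 1 (EOS) remains available.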

Model Details

  • Task: automatic speech recognition
  • Format: INT8 ONNX Runtime package
  • Base model: ARK-ASR-0.6B
  • Sampling rate: 16 kHz
  • License: Apache-2.0
  • Training and evaluation code: AutoArk/open-audio-opd
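
Because the model works with 16 kHz audio, it can be useful to inspect a file before transcribing it. The sketch below reads the header of a PCM WAV file with Python's standard wave module. The helpers wav_info and needs_resampling are hypothetical, and whether the inference script resamples input automatically is not stated here.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        seconds = wf.getnframes() / float(rate)
    return rate, channels, seconds


def needs_resampling(path, target_rate=16000):
    """True when the file's sample rate differs from the target rate."""
    rate, _, _ = wav_info(path)
    return rate != target_rate
```

The wave module handles uncompressed PCM WAV only; other formats would need a library such as soundfile or librosa, both of which appear in the installation command.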

Citation

@misc{lin2026dataefficientopd,
  title={Data-Efficient On-Policy Distillation for Automatic Speech Recognition},
  author={Lin, Yu and Wang, Yiming and Cai, Runyuan and Zeng, Xiaodong},
  year={2026},
  eprint={2605.28139},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.28139}
}