You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

NbAiLab / nb-asr-beta-qwen06b-lunde05

Norwegian Qwen3-ASR dynamic checkpoint for long-transcript and prompted ASR evaluation

This repository contains an NB-ASR beta checkpoint based on Qwen3-ASR-0.6B, adapted by NbAiLab for Norwegian speech recognition evaluation, with emphasis on longer transcripts and prompted transcription workflows.

New this time is that the model is dynamic. By default it is "reading optimised", however, the user can prompt the model with <verbatim>, and the output will be more verbatim. Try it out.

Internal reference:

NB-ASR-QWEN3-lunde05-faclean-boost-long-from250k-16gpu-50k-lr1e-4-cosine-bf16-lunde05-faclean-boost-long-50k-nowarmup-restart2-20260604-1550/checkpoint-50000

Uploaded: 2026-06-08

The immediate purpose of this release is to support:

  • reproducible NB-ASR-BETA evaluation,
  • loading and inference validation in realistic environments,
  • long-transcript ASR testing,
  • prompted transcription experiments,
  • and packaging of a reviewed checkpoint for Hugging Face distribution.

Confidential beta release: this model card and the associated weights are intended for approved evaluators and collaborators. Treat the checkpoint as beta material rather than a public production release.

Provenance

This HF repo was prepared from the Olivia training artifact:

/cluster/work/projects/nn30001k/nb-asr/output/nb-asr-qwen3/checkpoints/NB-ASR-QWEN3-lunde05-faclean-boost-long-from250k-16gpu-50k-lr1e-4-cosine-bf16-lunde05-faclean-boost-long-50k-nowarmup-restart2-20260604-1550/checkpoint-50000

The packaging step selected checkpoint-50000 from the run and copied the files required for inference and model loading into this staged Hugging Face repository.

Training-state files such as optimizer state, scheduler state, RNG snapshots, trainer metadata, training arguments, and checkpoint container files were intentionally left out of this HF package.

Overview

This model is part of the NB-ASR beta group and is intended for technical evaluation, integration testing, and model-card maintenance in the Hugging Face workflow. It is suitable for:

  • local transcription experiments,
  • long-audio or long-transcript evaluation workflows,
  • prompt-conditioned ASR experiments,
  • batch inference,
  • serving tests,
  • and end-to-end evaluation through the project's standard scripts.

Because this is a beta checkpoint, recognition behavior, formatting, prompt sensitivity, and runtime characteristics may still change. Current results should be treated as provisional.

Recommended Usage

The preferred interface is the official qwen-asr package, which exposes both a standard transformers backend and a vLLM-backed serving path.

Install the base package

pip install -U qwen-asr

Install the vLLM extras

pip install -U "qwen-asr[vllm]"

Optional FlashAttention 2

pip install -U flash-attn --no-build-isolation

For lower-memory build environments:

MAX_JOBS=4 pip install -U flash-attn --no-build-isolation

Quick Start: Transformers Backend

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "NbAiLab/nb-asr-beta-qwen06b-lunde05",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
    max_inference_batch_size=16,
    max_new_tokens=2048,
)

results = model.transcribe(
    audio="audio.wav",
    language=None,
)

print(results[0].language)
print(results[0].text)

Notes:

  • audio can usually be provided as a local path, URL, base64 payload, or waveform tuple depending on backend support.
  • No example audio file is bundled in this repository.
  • language=None enables automatic language detection.
  • If you want forced decoding for a known language, set language="Norwegian" if that matches your environment and prompt conventions.
  • For long-transcript experiments, tune max_new_tokens, batch size, and backend settings for the expected audio duration and GPU memory.

Quick Start: vLLM Backend

from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    model = Qwen3ASRModel.LLM(
        model="NbAiLab/nb-asr-beta-qwen06b-lunde05",
        gpu_memory_utilization=0.7,
        max_inference_batch_size=64,
        max_new_tokens=4096,
    )

    results = model.transcribe(
        audio="audio.wav",
        language=None,
    )

    print(results[0].language)
    print(results[0].text)

Prompting

This checkpoint was trained for prompted ASR workflows in addition to long-transcript transcription. Prompt handling depends on the installed qwen-asr version and serving backend. Use the project's current inference scripts or wrapper APIs when passing system prompts or other prompt-conditioning metadata.

When comparing results, record:

  • the prompt text or prompt template,
  • backend used,
  • qwen-asr, transformers, and vllm versions,
  • decoding parameters,
  • and approximate audio duration.

Serving

You can expose an OpenAI-compatible endpoint with:

qwen-asr-serve NbAiLab/nb-asr-beta-qwen06b-lunde05 \
  --gpu-memory-utilization 0.8 \
  --host 0.0.0.0 \
  --port 8000

Depending on the installed stack version, a standard vllm serve flow may also be appropriate.

Web Demo

To test the model in a local browser-based demo:

qwen-asr-demo \
  --asr-checkpoint NbAiLab/nb-asr-beta-qwen06b-lunde05 \
  --backend transformers \
  --cuda-visible-devices 0 \
  --ip 0.0.0.0 \
  --port 8000

Then open:

http://<your-ip>:8000

Feedback Requested

During the beta period, the most useful feedback is:

  • whether the model loads successfully,
  • environment and installation problems,
  • CUDA or OOM issues,
  • inference crashes,
  • long-audio or long-transcript regressions,
  • prompt-sensitivity issues,
  • batching or serving regressions,
  • and compatibility with downstream evaluation or synchronization workflows.

If possible, include:

  • GPU type,
  • Python version,
  • relevant package versions,
  • backend used,
  • prompt or prompt template used,
  • approximate audio duration,
  • and any error trace or logs.

Included Files

This staged HF repository includes the inference-facing model assets copied from the source checkpoint:

  • model.safetensors
  • config.json
  • generation_config.json
  • preprocessor_config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • vocab.json
  • merges.txt
  • added_tokens.json
  • chat_template.jinja

Access Lifecycle

This repository belongs to the NB-ASR-BETA release group.

  • Development state: private.
  • Beta release state: public + gated.
  • Beta release date: 2026-06-08.
  • Collection on release: https://huggingface.co/collections/NbAiLab/nb-asr-beta.

Do not change visibility, gating, or collection membership without following the NB-ASR release procedure.

Intended Scope

This checkpoint is meant for technical evaluation and repo maintenance during the beta phase. It should not be treated as a stable public benchmark or final production model without further validation.

Acknowledgements

This model is based on the open Qwen3-ASR framework and adapted by the NB-ASR project at the National Library of Norway.

The following persons have contributed to the dataset creation and training:

  • Freddy Wetjen
  • Thea Tollersrud
  • Phoebe Parsons
  • Per Egil Kummervold
Downloads last month
619
Safetensors
Model size
0.8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including NbAiLab/nb-asr-beta-qwen06b-lunde05