Qwen2.5-Omni SICL SFT Control Adapter

This repository contains the adapter-only matched SFT control checkpoint for the GRPO necessity experiment in SICL-GRPO. It is a PEFT LoRA adapter for Qwen/Qwen2.5-Omni-7B, not a merged full model.

HF repo: MagicLuke/qwen25omni-sicl-sft-control

Checkpoint

  • Base model: Qwen/Qwen2.5-Omni-7B
  • Adapter type: PEFT LoRA
  • LoRA rank: 8
  • LoRA alpha: 32
  • Target modules: Qwen2.5-Omni thinker language-model linear layers
  • Source checkpoint step: checkpoint-245
  • Training regime: 1 epoch, bf16, LoRA SFT, final assistant response only (loss_scale=last_round)

Training Data

The adapter was trained on 2000 CV-ASR 3-shot SFT-format examples. Each row contains three in-context audio/transcript examples followed by a target audio/transcript pair.

System prompt:

You are a speech recognition model that transcribe audios into its original language.

User prompt:

<audio>
Transcribe the English speech into English text without any punctuation marks.

SFT message format:

system
user + ICE audio 1 instruction
assistant ICE transcript, loss=false
user + ICE audio 2 instruction
assistant ICE transcript, loss=false
user + ICE audio 3 instruction
assistant ICE transcript, loss=false
user + target audio instruction
assistant target transcript, loss=true

Evaluation Summary

RSR cells show WER with bounded WER in parentheses. RSR uses normalized, token-weighted corpus WER; lower is better. MMAR/MMAU are total accuracy; higher is better.

Checkpoint RSR 0-shot WER (bWER) RSR 3-shot WER (bWER) MMAR 0-shot MMAR 3-shot MMAU 0-shot MMAU 3-shot
raw 36.53% (35.92%) 28.11% (27.53%) 50.40% 54.70% 61.30% 70.80%
SFT matched 35.90% (34.77%) 21.97% (21.62%) 48.50% 55.50% 62.50% 70.30%
GRPO 3-shot 27.22% (26.89%) 17.62% (17.41%) 50.10% 55.10% 66.90% 73.00%

Usage

Load this repository as a PEFT adapter on top of Qwen/Qwen2.5-Omni-7B. The exact inference stack used for the experiment was Swift/vLLM with Qwen2.5-Omni audio support.

Example with PEFT-style loading:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Omni-7B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "MagicLuke/qwen25omni-sicl-sft-control")

For Qwen2.5-Omni audio inference, use the same processor/template path as your local Swift or Transformers Omni setup.

Notes

  • This is the SFT control adapter, not the GRPO adapter.
  • The uploaded files intentionally exclude optimizer, scheduler, RNG, and trainer state.
  • args.json is included as a run metadata snapshot.
Downloads last month
7
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for MagicLuke/qwen25omni-sicl-sft-control

Adapter
(28)
this model