Qwen2.5-Omni SICL SFT Control Adapter

This repository contains the adapter-only matched SFT control checkpoint for the GRPO necessity experiment in SICL-GRPO. It is a PEFT LoRA adapter for Qwen/Qwen2.5-Omni-7B, not a merged full model.

HF repo: MagicLuke/qwen25omni-sicl-sft-control

Checkpoint

Base model: Qwen/Qwen2.5-Omni-7B
Adapter type: PEFT LoRA
LoRA rank: 8
LoRA alpha: 32
Target modules: Qwen2.5-Omni thinker language-model linear layers
Source checkpoint step: checkpoint-245
Training regime: 1 epoch, bf16, LoRA SFT, final assistant response only (loss_scale=last_round)

Training Data

The adapter was trained on 2000 CV-ASR 3-shot SFT-format examples. Each row contains three in-context audio/transcript examples followed by a target audio/transcript pair.

System prompt:

You are a speech recognition model that transcribe audios into its original language.

User prompt:

<audio>
Transcribe the English speech into English text without any punctuation marks.

SFT message format:

system
user + ICE audio 1 instruction
assistant ICE transcript, loss=false
user + ICE audio 2 instruction
assistant ICE transcript, loss=false
user + ICE audio 3 instruction
assistant ICE transcript, loss=false
user + target audio instruction
assistant target transcript, loss=true

Evaluation Summary

RSR cells show WER with bounded WER in parentheses. RSR uses normalized, token-weighted corpus WER; lower is better. MMAR/MMAU are total accuracy; higher is better.

Checkpoint	RSR 0-shot WER (bWER)	RSR 3-shot WER (bWER)	MMAR 0-shot	MMAR 3-shot	MMAU 0-shot	MMAU 3-shot
raw	36.53% (35.92%)	28.11% (27.53%)	50.40%	54.70%	61.30%	70.80%
SFT matched	35.90% (34.77%)	21.97% (21.62%)	48.50%	55.50%	62.50%	70.30%
GRPO 3-shot	27.22% (26.89%)	17.62% (17.41%)	50.10%	55.10%	66.90%	73.00%

Usage

Load this repository as a PEFT adapter on top of Qwen/Qwen2.5-Omni-7B. The exact inference stack used for the experiment was Swift/vLLM with Qwen2.5-Omni audio support.

Example with PEFT-style loading:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Omni-7B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "MagicLuke/qwen25omni-sicl-sft-control")

For Qwen2.5-Omni audio inference, use the same processor/template path as your local Swift or Transformers Omni setup.

Notes

This is the SFT control adapter, not the GRPO adapter.
The uploaded files intentionally exclude optimizer, scheduler, RNG, and trainer state.
args.json is included as a run metadata snapshot.

Downloads last month: 7

Model tree for MagicLuke/qwen25omni-sicl-sft-control

Base model

Qwen/Qwen2.5-Omni-7B

Adapter

(28)

this model