Instructions to use MagicLuke/qwen25omni-sicl-sft-control with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use MagicLuke/qwen25omni-sicl-sft-control with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Omni-7B") model = PeftModel.from_pretrained(base_model, "MagicLuke/qwen25omni-sicl-sft-control") - Notebooks
- Google Colab
- Kaggle
Qwen2.5-Omni SICL SFT Control Adapter
This repository contains the adapter-only matched SFT control checkpoint for the GRPO necessity experiment in SICL-GRPO. It is a PEFT LoRA adapter for Qwen/Qwen2.5-Omni-7B, not a merged full model.
HF repo: MagicLuke/qwen25omni-sicl-sft-control
Checkpoint
- Base model:
Qwen/Qwen2.5-Omni-7B - Adapter type: PEFT LoRA
- LoRA rank: 8
- LoRA alpha: 32
- Target modules: Qwen2.5-Omni thinker language-model linear layers
- Source checkpoint step:
checkpoint-245 - Training regime: 1 epoch, bf16, LoRA SFT, final assistant response only (
loss_scale=last_round)
Training Data
The adapter was trained on 2000 CV-ASR 3-shot SFT-format examples. Each row contains three in-context audio/transcript examples followed by a target audio/transcript pair.
System prompt:
You are a speech recognition model that transcribe audios into its original language.
User prompt:
<audio>
Transcribe the English speech into English text without any punctuation marks.
SFT message format:
system
user + ICE audio 1 instruction
assistant ICE transcript, loss=false
user + ICE audio 2 instruction
assistant ICE transcript, loss=false
user + ICE audio 3 instruction
assistant ICE transcript, loss=false
user + target audio instruction
assistant target transcript, loss=true
Evaluation Summary
RSR cells show WER with bounded WER in parentheses. RSR uses normalized, token-weighted corpus WER; lower is better. MMAR/MMAU are total accuracy; higher is better.
| Checkpoint | RSR 0-shot WER (bWER) | RSR 3-shot WER (bWER) | MMAR 0-shot | MMAR 3-shot | MMAU 0-shot | MMAU 3-shot |
|---|---|---|---|---|---|---|
| raw | 36.53% (35.92%) | 28.11% (27.53%) | 50.40% | 54.70% | 61.30% | 70.80% |
| SFT matched | 35.90% (34.77%) | 21.97% (21.62%) | 48.50% | 55.50% | 62.50% | 70.30% |
| GRPO 3-shot | 27.22% (26.89%) | 17.62% (17.41%) | 50.10% | 55.10% | 66.90% | 73.00% |
Usage
Load this repository as a PEFT adapter on top of Qwen/Qwen2.5-Omni-7B. The exact inference stack used for the experiment was Swift/vLLM with Qwen2.5-Omni audio support.
Example with PEFT-style loading:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Omni-7B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "MagicLuke/qwen25omni-sicl-sft-control")
For Qwen2.5-Omni audio inference, use the same processor/template path as your local Swift or Transformers Omni setup.
Notes
- This is the SFT control adapter, not the GRPO adapter.
- The uploaded files intentionally exclude optimizer, scheduler, RNG, and trainer state.
args.jsonis included as a run metadata snapshot.
- Downloads last month
- 7
Model tree for MagicLuke/qwen25omni-sicl-sft-control
Base model
Qwen/Qwen2.5-Omni-7B