Expert QA Pipeline (EMNLP 2026 Industry)
Collection of SFT checkpoints from a 2×2 factorial ablation of a Korean speech-to-SFT pipeline (medical and finance domains), covering 9 LLMs from 2.4B to 70B parameters. 182 items.
SFT checkpoint from the EMNLP 2026 Industry Track submission "A Factorial Ablation of a Speech-to-SFT Pipeline: Differential Effects on Data Quality and Downstream Transfer".
| Field | Value |
|---|---|
| Pipeline condition | Exp 2 (full pipeline: Phase 0 + Phase 2) |
| Domain | finance |
| Seed | 2 |
| Base model | Qwen/Qwen2.5-4B-Instruct |
| Training | LoRA (rank 16, α=32, QLoRA 4-bit, lr 2e-4, 3 epochs) |
| Upstream STT | In-house STT (paper main pipeline) |
| License | CC BY-NC 4.0 (research and non-commercial use only) |
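
The training row above can be read as a small configuration object. The following is a minimal sketch, not the authors' training script: the `TrainingSetup` class and `build_lora_config` helper are hypothetical names introduced here, and only the values listed in the table (rank, α, 4-bit loading, learning rate, epochs, base model) come from the card. Settings the card does not state, such as target modules, dropout, and the quantization type, are deliberately left unset. The `peft` import is lazy, so the arithmetic part runs without it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters copied from the model card table (hypothetical container)."""

    base_model: str = "Qwen/Qwen2.5-4B-Instruct"
    lora_rank: int = 16
    lora_alpha: int = 32
    load_in_4bit: bool = True  # QLoRA: base weights are loaded in 4-bit precision
    learning_rate: float = 2e-4
    num_epochs: int = 3

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


def build_lora_config(setup: TrainingSetup):
    """Build a peft LoraConfig from the table values (requires `pip install peft`)."""
    from peft import LoraConfig  # imported lazily so the dataclass works without peft

    # target_modules and lora_dropout are not stated in the card,
    # so they are left at the library defaults here (an assumption).
    return LoraConfig(
        r=setup.lora_rank,
        lora_alpha=setup.lora_alpha,
        task_type="CAUSAL_LM",
    )


setup = TrainingSetup()
print(f"LoRA scaling (alpha / rank): {setup.lora_scaling}")
```

With rank 16 and α=32, the adapter update is scaled by a factor of 2 under the standard LoRA convention; whether the authors used a variant such as rank-stabilized scaling is not stated in the card.
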
Intended use: research and non-commercial use only, matching the consent scope of the source audio.
Companion repository (code, configs, prompts, sample QA): https://github.com/flitto/speech-to-sft-ablation-paper