CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation
Abstract
Specialized Japanese-English translation models outperform multilingual counterparts on real-world benchmarks, even though the multilingual models perform strongly on WMT.
Large multilingual translation models now demonstrate impressive capabilities on machine translation benchmarks. This raises a practical question for developers: if you only need to support one language pair, is it still worth developing a translation model specialized for that pair? To offer an anecdotal answer, we develop a family of small language models (0.8B, 1.4B, 3.3B, and 7B parameters) specialized for bidirectional Japanese-English translation. We train the models on synthetically generated parallel corpora using two-stage supervised fine-tuning followed by Multi-Objective GRPO (Ichihara et al. 2025). We evaluate them on WMT and on real-world translation benchmarks covering the business, legal, medical, financial, and patent domains. Although multilingual models achieve strong performance on the WMT benchmarks, our compact models outperform them on the real-world benchmarks, which suggests that specialized translation models remain practically useful even in the era of large multilingual models.
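The abstract names Multi-Objective GRPO but does not describe how several reward signals are combined. The following is a minimal sketch of one plausible reading, not the authors' implementation: each objective's rewards are standardized within a group of sampled translations (the usual GRPO group-relative step), and the standardized scores are then summed so that no objective dominates merely because of its scale. The function names, the objective names `quality` and `format`, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize one objective's rewards within a group of sampled outputs."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def multi_objective_advantages(reward_table, eps=1e-8):
    """Combine several objectives into one advantage per sampled output.

    reward_table maps an objective name to a list of rewards, one per output.
    Assumption: objectives are standardized separately, then summed.
    """
    per_objective = [
        group_relative_advantages(values, eps) for values in reward_table.values()
    ]
    # zip(*...) regroups the scores so each tuple belongs to one sampled output.
    return [sum(scores) for scores in zip(*per_objective)]


# Hypothetical rewards for three sampled translations of the same source.
rewards = {
    "quality": [0.2, 0.4, 0.9],     # e.g. a learned quality metric in [0, 1]
    "format": [0.0, 100.0, 100.0],  # e.g. a rule-based check on a larger scale
}
advantages = multi_objective_advantages(rewards)
```

Because each objective is standardized before summation, rescaling the `format` rewards by a constant factor leaves the combined advantages essentially unchanged, which is the property this sketch is meant to illustrate.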
Models citing this paper 4
cyberagent/CAT-Translate-7b
Datasets citing this paper 1
cyberagent/CAT-Translate-Dataset