A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Published on Sep 20, 2023 · Featured in Daily Papers on Sep 22, 2023


Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not carried over to translation, especially for models of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate-sized LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task and eliminates the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. We call the LLM developed through this strategy the Advanced Language Model-based trAnslator (ALMA). Using LLaMA-2 as the underlying model, our results show that ALMA achieves an average improvement of more than 12 BLEU and 12 COMET points over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. Its performance is significantly better than all prior work and even surpasses the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.
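The evaluation setup above (10 directions: 2 from WMT'21 plus 8 from WMT'22) and the "average improvement over zero-shot" metric can be made concrete with a small sketch. It assumes the pairs are English with German, Czech, Chinese, and Russian (WMT'22) and English with Icelandic (WMT'21), each evaluated in both directions. The function names and the toy scores are hypothetical placeholders, not results from the paper.

```python
from statistics import mean

# Assumed language set: English paired with each language, in both directions.
WMT22_LANGS = ["de", "cs", "zh", "ru"]  # 4 pairs x 2 directions = 8
WMT21_LANGS = ["is"]                    # 1 pair x 2 directions = 2


def translation_directions() -> list[tuple[str, str, str]]:
    """Return (test_set, source, target) triples for every evaluated direction."""
    directions = []
    for test_set, langs in (("wmt22", WMT22_LANGS), ("wmt21", WMT21_LANGS)):
        for lang in langs:
            directions.append((test_set, lang, "en"))  # xx -> en
            directions.append((test_set, "en", lang))  # en -> xx
    return directions


def average_improvement(baseline: dict[str, float], tuned: dict[str, float]) -> float:
    """Mean per-direction gain of `tuned` over `baseline` (keys such as 'de-en')."""
    if baseline.keys() != tuned.keys():
        raise ValueError("both score tables must cover the same directions")
    return mean(tuned[d] - baseline[d] for d in baseline)


if __name__ == "__main__":
    dirs = translation_directions()
    print(len(dirs))  # 10
    # Hypothetical toy scores, NOT numbers from the paper.
    zero_shot = {f"{s}-{t}": 20.0 for _, s, t in dirs}
    fine_tuned = {f"{s}-{t}": 33.0 for _, s, t in dirs}
    print(average_improvement(zero_shot, fine_tuned))  # 13.0
```

The same averaging applies to BLEU and COMET alike; it is an unweighted mean over directions, so each language pair counts equally regardless of test-set size.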


My highlights from the paper:

TLDR: New training approach enables smaller AI models to achieve state-of-the-art translation performance

Large AI models like GPT-3 perform well on translation tasks, but smaller models (7B to 13B parameters) still lag behind conventional supervised translation systems.

Researchers from Johns Hopkins and Microsoft propose a new two-stage fine-tuning method that unlocks stronger translation abilities in smaller models with just 7B or 13B parameters; they call the resulting model ALMA.

How it works:

  • Fine-tune on monolingual data in non-English languages to improve comprehension
  • Further fine-tune on small sets of high-quality human-translated parallel text
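The two stages above can be sketched as two data-preparation functions that feed an ordinary causal-LM training loop. This is a minimal, framework-free sketch under stated assumptions: `ToyTokenizer` is a hypothetical whitespace tokenizer standing in for the real LLaMA-2 tokenizer, the prompt template in `make_prompt` is an assumed instruction-style format, and `IGNORE_INDEX = -100` follows the common PyTorch convention for excluding positions from the cross-entropy loss. Stage 1 treats every monolingual token as a target; stage 2 masks the prompt so that only the reference translation contributes to the loss, which is also an assumption about the setup.

```python
from dataclasses import dataclass

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch convention, assumed)


@dataclass
class Example:
    input_ids: list[int]
    labels: list[int]  # aligned with input_ids; the model applies the next-token shift


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for the real LLM tokenizer."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}

    def encode(self, text: str) -> list[int]:
        # Assign a fresh id to each unseen whitespace-separated token.
        return [self.vocab.setdefault(tok, len(self.vocab)) for tok in text.split()]


def stage1_examples(texts: list[str], tok: ToyTokenizer, block_size: int) -> list[Example]:
    """Stage 1: pack monolingual text into fixed-size blocks; every token is a target."""
    stream = [i for text in texts for i in tok.encode(text)]
    blocks = [stream[k:k + block_size] for k in range(0, len(stream), block_size)]
    return [Example(input_ids=b, labels=list(b)) for b in blocks]


def make_prompt(src_lang: str, tgt_lang: str, source: str) -> str:
    """Assumed instruction-style translation prompt (placeholder template)."""
    return f"Translate this from {src_lang} to {tgt_lang}:\n{src_lang}: {source}\n{tgt_lang}:"


def stage2_example(src_lang: str, tgt_lang: str, source: str, target: str,
                   tok: ToyTokenizer) -> Example:
    """Stage 2: prompt + human reference; prompt positions are masked out of the loss."""
    prompt_ids = tok.encode(make_prompt(src_lang, tgt_lang, source))
    target_ids = tok.encode(target)
    return Example(
        input_ids=prompt_ids + target_ids,
        labels=[IGNORE_INDEX] * len(prompt_ids) + target_ids,
    )


if __name__ == "__main__":
    tok = ToyTokenizer()
    # Stage 1: 7 monolingual tokens packed into blocks of 4 -> block sizes [4, 3].
    blocks = stage1_examples(["ein kleiner Test", "noch ein Satz hier"], tok, block_size=4)
    print([len(b.input_ids) for b in blocks])  # [4, 3]
    # Stage 2: one German-English pair; only the last two labels are real targets.
    ex = stage2_example("German", "English", "ein Test", "a test", tok)
    print(ex.labels)
```

Masking the prompt is a common instruction-tuning choice that keeps the model from spending capacity on reproducing the instruction; training on the full sequence in stage 2 would also work with the same data layout.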

The authors claim this achieves SOTA-level translation using far less data and compute than conventional methods:

  • Matches or exceeds GPT-3.5 (text-davinci-003) and the 54B-parameter NLLB with only 7B or 13B parameters
  • Reaches NLLB-level quality with just 1 billion monolingual tokens and 18 hours of training
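For a sense of scale, the claim of 1 billion monolingual tokens in 18 hours implies an aggregate training throughput. A back-of-the-envelope check, assuming exactly 10^9 tokens each processed once over exactly 18 hours of wall-clock time; the hardware is not specified here, so the result is a total across all devices, not a per-GPU figure:

```python
def tokens_per_second(total_tokens: float, hours: float) -> float:
    """Aggregate throughput implied by a token budget and wall-clock training time."""
    return total_tokens / (hours * 3600)


if __name__ == "__main__":
    rate = tokens_per_second(1e9, 18)
    print(f"{rate:,.0f} tokens/s")  # 15,432 tokens/s
```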

I think this shows that smaller models can reach SOTA translation with specialized fine-tuning, so we may not need endlessly bigger datasets and models to get better performance. It looks like deliberate tuning that targets key language skills could matter more.

Full summary here.



Models citing this paper 31


Datasets citing this paper 2

Spaces citing this paper 3

Collections including this paper 14