javi8979 committed on
Commit
9c79b32
1 Parent(s): 8a8fffc

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -47,7 +47,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel

 In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparable to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.

- For more details regarding the model architecture, the dataset and model interpretability take a look at the paper which is available on [arXiv](https://arxiv.org/abs/2406.09140).
+ For more details regarding the model architecture, the dataset and model interpretability take a look at the [paper](https://arxiv.org/abs/2406.09140).

 ## Intended Uses and Limitations

@@ -96,11 +96,11 @@ For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over
 | Warmup Steps | 2000 |


- More training details are specified in the [paper](). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
+ More training details are specified in the [paper](https://arxiv.org/abs/2406.09140). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).

 ## Evaluation

- Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation check out the [paper]().
+ Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation check out the [paper](https://arxiv.org/abs/2406.09140).

 | Model | FLORES BLEU | FLORES COMET | NTREX BLEU | NTREX COMET |
 |----------------------|-------------|--------------|------------|-------------|
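
The context of the second hunk mentions that the learning rate is warmed up from 1e-7 to a peak of 3e-4 over the 2,000 warm-up steps listed in the hyperparameter table. As a rough illustration only, here is a minimal sketch of such a warm-up, assuming a linear ramp; the exact schedule shape and the post-warm-up decay are described in the paper and the GitHub repository, not in this diff.

```python
# Minimal sketch of the warm-up described in the README excerpt above:
# the learning rate rises from 1e-7 to a peak of 3e-4 over the first
# 2000 optimizer steps. A linear ramp is assumed here for illustration;
# the actual schedule used for Plume is detailed in the paper.

WARMUP_STEPS = 2000
LR_INIT = 1e-7
LR_PEAK = 3e-4

def warmup_lr(step: int) -> float:
    """Return the learning rate for a given optimizer step during warm-up."""
    if step >= WARMUP_STEPS:
        return LR_PEAK  # after warm-up, a decay phase would normally take over
    fraction = step / WARMUP_STEPS
    return LR_INIT + fraction * (LR_PEAK - LR_INIT)

if __name__ == "__main__":
    for step in (0, 500, 1000, 2000):
        print(f"step {step:>4}: lr = {warmup_lr(step):.2e}")
```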
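
The evaluation section being edited reports BLEU and COMET on FLORES-200 and NTREX. Below is a generic, hedged sketch of computing corpus-level BLEU with sacrebleu; it is not necessarily the exact evaluation pipeline used for Plume, and the hypotheses and references are made-up placeholders. The COMET scores would come from a separate neural metric whose setup is given in the paper.

```python
# Generic sketch of corpus-level BLEU with sacrebleu, in the spirit of the
# FLORES-200 / NTREX numbers reported in the README. Tokenization choices,
# test splits, and the COMET model used for Plume are described in the
# paper and GitHub repository, not in this diff.
import sacrebleu

# Placeholder system outputs and references for one translation direction.
hypotheses = [
    "The cat sits on the mat.",
    "She is reading a book.",
]
references = [
    "The cat is sitting on the mat.",
    "She reads a book.",
]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```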