javi8979 committed on
Commit dc51035
1 Parent(s): 4683d08

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -46,7 +46,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel
 
 In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparable to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.
 
- For more details regarding the model architecture, the dataset and model interpretability take a look at the paper which is available on [arXiv]().
+ For more details regarding the model architecture, the dataset and model interpretability take a look at the paper which is available on [arXiv](https://arxiv.org/abs/2406.09140).
 
 ## Intended Uses and Limitations
 
@@ -114,7 +114,13 @@ Below are the evaluation results on Flores-200 and NTREX for supervised MT direc
 ## Citation
 
 ```bibtex
-
+ @misc{gilabert2024investigating,
+ title={Investigating the translation capabilities of Large Language Models trained on parallel data only},
+ author={Javier García Gilabert and Carlos Escolano and Aleix Sant Savall and Francesca De Luca Fornaciari and Audrey Mash and Xixian Liao and Maite Melero},
+ year={2024},
+ eprint={2406.09140},
+ archivePrefix={arXiv}
+ }
 ```
 
  ## Additional information
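
As a quick illustration of the model family this README documents, below is a minimal sketch of loading one of the Plume checkpoints with the Hugging Face `transformers` library and translating a single sentence. The repository id, the language-tag prompt format, and the FLORES-200 language codes are assumptions for illustration only and are not part of this commit; check the model card for the exact conventions.

```python
# Minimal sketch (not from this commit): load a Plume checkpoint and translate
# one sentence with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name; the actual repository id may differ.
model_id = "projecte-aina/Plume256k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumed prompt convention: BOS token, source language tag, source sentence,
# then the target language tag that the model should continue from.
prompt = "<s> [cat_Latn] El dia és assolellat. \n[eng_Latn]"

# add_special_tokens=False because the BOS token is written explicitly above.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)

outputs = model.generate(**inputs, max_new_tokens=64, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```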