nllb-200-distilled-600M_mustc_en-to-8

This model is a multilingually fine-tuned version of NLLB. It is based on nllb-200-distilled-600M and was fine-tuned on the text data of MuST-C v1.0 for translation from English into 8 target languages (En -> 8): De, Es, Fr, It, Nl, Pt, Ro and Ru.

It is part of the paper Pushing the Limits of Zero-shot End-to-End Speech Translation. Details of the fine-tuning process are given in Appendix D of the paper.

Usage

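import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "johntsi/nllb-200-distilled-600M_mustc_en-to-8"
# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# English is the source language (this is also the NLLB tokenizer default).
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

model.eval()
model.to(device)

text = "Translate this text to German."
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(
    **inputs,
    num_beams=5,
    # Force the first decoded token to be the target-language tag (German).
    # convert_tokens_to_ids is used here because tokenizer.lang_code_to_id
    # is not available in recent transformers releases.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)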
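
The snippet above translates into German only. As a rough sketch of how the same call could cover all 8 target languages, the example below maps the two-letter codes used in the results table to NLLB language tags and loops over them. The names `MUSTC_TO_NLLB`, `nllb_code` and `translate_to_all` are hypothetical helpers introduced for illustration, not part of this model or of transformers, and the target-language token is looked up with `convert_tokens_to_ids`, which is an assumption about the most portable way to obtain it.

```python
# Two-letter target-language codes mapped to NLLB (FLORES-200) language tags.
MUSTC_TO_NLLB = {
    "de": "deu_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "it": "ita_Latn",
    "nl": "nld_Latn",
    "pt": "por_Latn",
    "ro": "ron_Latn",
    "ru": "rus_Cyrl",
}


def nllb_code(lang: str) -> str:
    """Return the NLLB tag for a two-letter code (hypothetical helper)."""
    try:
        return MUSTC_TO_NLLB[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported target language: {lang!r}") from None


def translate_to_all(
    text: str,
    model_id: str = "johntsi/nllb-200-distilled-600M_mustc_en-to-8",
    num_beams: int = 5,
) -> dict:
    """Translate one English sentence into every language in MUSTC_TO_NLLB."""
    # Imported lazily so the mapping above can be used without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device).eval()

    # The source sentence is tokenized once and reused for every target.
    inputs = tokenizer(text, return_tensors="pt").to(device)
    translations = {}
    for lang, tag in MUSTC_TO_NLLB.items():
        outputs = model.generate(
            **inputs,
            num_beams=num_beams,
            # The forced first token selects the output language.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tag),
        )
        translations[lang] = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translations
```

Calling `translate_to_all("Translate this text.")` should return a dict keyed by language code, one translation per target language. Loading the model once and reusing the tokenized input keeps the loop cheap compared with repeating the full setup for each language.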

Results

BLEU scores on MuST-C v1.0 tst-COMMON

| Model | De | Es | Fr | It | Nl | Pt | Ro | Ru | Average |
|---|---|---|---|---|---|---|---|---|---|
| nllb-200-distilled-600M (original) | 32.7 | 36.9 | 45.2 | 32.2 | 36.0 | 37.4 | 30.3 | 21.0 | 34.0 |
| nllb-200-distilled-600M_mustc_en-to-8 | 34.4 | 38.8 | 44.6 | 34.7 | 39.0 | 41.6 | 32.1 | 22.4 | 35.9 |
| nllb-200-distilled-1.3B (original) | 34.6 | 38.6 | 46.8 | 33.7 | 38.2 | 39.6 | 31.8 | 23.2 | 35.8 |
| nllb-200-distilled-1.3B_mustc_en-to-8 | 35.3 | 39.9 | 45.8 | 36.0 | 40.6 | 43.1 | 32.6 | 23.9 | 37.2 |
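
To make the effect of fine-tuning easier to read off, the following sketch computes the per-language BLEU difference between each fine-tuned model and its original counterpart. The scores are copied from the table above; the names `BLEU` and `gains` are hypothetical and only serve this illustration. Because the inputs are already rounded to one decimal, the mean difference may deviate slightly from the difference of the reported averages.

```python
LANGS = ["De", "Es", "Fr", "It", "Nl", "Pt", "Ro", "Ru"]

# BLEU on MuST-C v1.0 tst-COMMON, copied from the results table.
BLEU = {
    "600M": {
        "original": [32.7, 36.9, 45.2, 32.2, 36.0, 37.4, 30.3, 21.0],
        "fine-tuned": [34.4, 38.8, 44.6, 34.7, 39.0, 41.6, 32.1, 22.4],
    },
    "1.3B": {
        "original": [34.6, 38.6, 46.8, 33.7, 38.2, 39.6, 31.8, 23.2],
        "fine-tuned": [35.3, 39.9, 45.8, 36.0, 40.6, 43.1, 32.6, 23.9],
    },
}


def gains(size: str) -> dict:
    """Per-language BLEU difference (fine-tuned minus original)."""
    scores = BLEU[size]
    return {
        lang: round(tuned - base, 1)
        for lang, base, tuned in zip(LANGS, scores["original"], scores["fine-tuned"])
    }


for size in BLEU:
    diff = gains(size)
    mean = sum(diff.values()) / len(diff)
    print(f"{size}: {diff} (mean {mean:+.2f})")
```

For both model sizes, French is the only direction where the fine-tuned model scores below the original; every other language improves, with Portuguese showing the largest gain.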

Citation

If you find these models useful for your research, please cite our paper :)

@inproceedings{tsiamas-etal-2024-pushing,
    title = {{Pushing the Limits of Zero-shot End-to-End Speech Translation}},
    author = "Tsiamas, Ioannis  and
      G{\'a}llego, Gerard  and
      Fonollosa, Jos{\'e}  and
      Costa-juss{\`a}, Marta",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.847",
    pages = "14245--14267",
}