---
license: apache-2.0
language:
- en
- zh
- ja
- fr
- es
- it
- pt
tags:
- generative translation
- large language model
- LLaMA
metrics:
- bleu
pipeline_tag: text-generation
datasets:
- PeacefulData/HypoTranslate
---

This repo releases the trained LLaMA-adapter weights from the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators".

**Code:** https://github.com/YUCHEN005/GenTranslate

**Data:** https://huggingface.co/datasets/PeacefulData/HypoTranslate

**Model:** This repo

***Filename format:*** `[data_source]_[src_language_code]_[tgt_language_code]_[task].pth`

e.g. `covost2_ar_en_st.pth`

***Note:***
- Language code look-up: Tables 15 and 17 in https://arxiv.org/pdf/2402.06894.pdf
- Source and target language refer to the translation task, so the N-best hypotheses and the ground-truth transcription are both in the target language.
- For the speech translation datasets (FLEURS, CoVoST-2, MuST-C), the task ID `mt` denotes a cascaded ASR+MT system.

If you find this work relevant or useful for your research, please consider citing it. Thank you.

```bib
@inproceedings{hu2024gentranslate,
    title = "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators",
    author = "Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Dong and Chen, Zhehuai and Chng, Eng Siong",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    publisher = "Association for Computational Linguistics",
    year = "2024"
}
```
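The checkpoint filename convention described in this card can be parsed mechanically, for example to pick the adapter for a given language pair and task. Below is a minimal sketch, not part of the GenTranslate code: `parse_checkpoint_name` and `AdapterCheckpoint` are hypothetical names. It assumes every file follows `[data_source]_[src_language_code]_[tgt_language_code]_[task].pth` and that only the data-source field could ever contain an underscore.

```python
from dataclasses import dataclass
from pathlib import Path


# Hypothetical helper type, not part of the GenTranslate repository.
@dataclass(frozen=True)
class AdapterCheckpoint:
    data_source: str  # e.g. "covost2"
    src_lang: str     # source language code of the translation task
    tgt_lang: str     # target language code (language of hypotheses and reference)
    task: str         # task ID, e.g. "st" or "mt"


def parse_checkpoint_name(filename: str) -> AdapterCheckpoint:
    """Split '<data_source>_<src>_<tgt>_<task>.pth' into its four fields.

    Assumption: the last three fields never contain '_', so splitting from
    the right keeps any underscores inside the data-source name intact.
    """
    path = Path(filename)  # also accepts paths such as "weights/<name>.pth"
    if path.suffix != ".pth":
        raise ValueError(f"expected a .pth file, got {filename!r}")
    parts = path.stem.rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"unexpected filename format: {filename!r}")
    data_source, src_lang, tgt_lang, task = parts
    return AdapterCheckpoint(data_source, src_lang, tgt_lang, task)


if __name__ == "__main__":
    print(parse_checkpoint_name("covost2_ar_en_st.pth"))
```

Splitting with `rsplit("_", 3)` is the design choice that matters here: it tolerates underscores in the data-source name, but it would mis-parse a file whose language code or task ID contained an underscore, so treat the naming assumption above as something to verify against the actual file list.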