Nice scores guys!
Comparison with ALMA-7B-R
https://forum.opennmt.net/t/llms-as-nmt-comparison-between-alma-7b-r-and-towerinstruct/5641
Cheers! We have noticed the same results internally: when it comes to neural metrics, both models are very competitive -- we see a slight edge for TowerInstruct -- but on lexical metrics (BLEU, chrF) there is a huge gap in performance between the two models (in favour of Tower).
Yes! Both ALMA-R and Tower include previous WMT test sets in their training data. The best test set to compare on is WMT23.
We will release the paper very soon with all those numbers there.
When I look deeper into TowerBlocks I don't see any of these WMT test sets. Please explain which field or subset I should look in.
And btw, for ALMA, the original paper says: "The training parallel data is sourced from the WMT'17 to WMT'20."
Alright, thanks for the clarification: ALMA sources its data from WMT 2017 to 2020, while we use WMT data from 2014 to 2022.
We did not use the full test sets --- we selected only a few high-quality translation samples from each test set. In TowerBlocks, they are under "general_mt_clean". We also released all translation records and their sources here: https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.1-MT-records
IMO you need to be explicit in the model card that you trained on this dataset, because you only mention TowerBlocks, not TowerBlocks-MT-records.
To clarify, TowerBlocks includes TowerBlocks-MT-records. We only created the latter because some practitioners asked us for a dataset composed exclusively of the MT records in TowerBlocks.
Okay, then back to my initial question.
In TowerBlocks, if I select task="machine translation" and split="train", I get only the following "datasets": news21_docs_filtered, opus_doc_filtered, ted_talks_doc_filtered.
So where are WMT14-22 included?
Oh I see now! That can indeed be a bit confusing -- we will make that clearer in the model card.
In TowerBlocks, the "split" column refers to the origin of the data (whether it came from a training set or a development/test set), not to how we used it to build TowerInstruct. To get access to all the sentence-level MT data, disregard the "split" column and select only by task.
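To make the distinction concrete, here is a minimal sketch of that filtering logic in Python. The rows below are mock records I made up to illustrate the assumed schema ("task", "split", "dataset" columns); in practice you would load the real dataset with the Hugging Face `datasets` library and apply the same filters.

```python
# Mock rows standing in for TowerBlocks records (schema assumed for illustration;
# column names follow the discussion above, values are hypothetical).
mock_rows = [
    {"task": "machine translation", "split": "train", "dataset": "news21_docs_filtered"},
    {"task": "machine translation", "split": "dev",   "dataset": "general_mt_clean"},
    {"task": "machine translation", "split": "test",  "dataset": "general_mt_clean"},
    {"task": "paraphrase",          "split": "train", "dataset": "other_task_data"},
]

# Filtering on task AND split == "train" misses the WMT-derived records,
# because their "split" marks where the data originally came from (dev/test sets):
train_only = [r for r in mock_rows
              if r["task"] == "machine translation" and r["split"] == "train"]

# Disregarding "split" and filtering on task alone recovers all the
# sentence-level MT data, including the "general_mt_clean" subset:
all_mt = [r for r in mock_rows if r["task"] == "machine translation"]

print(len(train_only))  # 1 -- WMT samples excluded
print(len(all_mt))      # 3 -- WMT samples included
```

The same idea applies verbatim on the real dataset, e.g. `ds.filter(lambda r: r["task"] == "machine translation")` after `load_dataset(...)`.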