---
pipeline_tag: translation
datasets:
- ascolda/ru_en_Crystallography_and_Spectroscopy
language:
- ru
- en
metrics:
- bleu
tags:
- chemistry
---

# nllb-200-distilled-600M_ru_en_finetuned_crystallography

This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) trained on the [ascolda/ru_en_Crystallography_and_Spectroscopy](https://huggingface.co/datasets/ascolda/ru_en_Crystallography_and_Spectroscopy) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5602
- BLEU: 56.5855

## Model description

The fine-tuned model yields better performance on machine translation of domain-specific scientific articles in the Crystallography and Spectroscopy domain.

## Metrics used to describe the fine-tuning effect

Below is a comparison of the translation quality metrics for the original NLLB model and my fine-tuned version. The evaluation focuses on (1) general translation quality, (2) translation quality of domain-specific terminology, and (3) uniformity of translation of domain-specific terms across different contexts.

1. **BLEU.** General translation quality was evaluated using the BLEU metric.
2. **Term Success Rate.** Machine-translated terms were compared with their dictionary equivalents by checking for the presence of the reference terminology translation in the output via a regular-expression match.
3. **Term Consistency.** This metric checks whether technical terms are translated uniformly across the entire text corpus in different contexts. We aim for high consistency, i.e. few cases where the same term receives multiple different translations within the evaluation dataset.

Sketches of how these metrics can be computed follow the results table below.

| Model | BLEU | Term Success Rate | Term Consistency |
|:--------------------------------------------------------:|:-----:|:-----------------:|:----------------:|
| nllb-200-distilled-600M | 38.19 | 0.246 | 0.199 |
| nllb-200-distilled-600M_ru_en_finetuned_crystallography | 56.59 | 0.573 | 0.740 |
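
As referenced above, here is a minimal sketch of how the BLEU and Term Success Rate scores can be computed. It assumes sentence-aligned lists of model outputs and reference translations, plus a list of expected glossary-term hits; `sacrebleu` is used for BLEU, and the term check mirrors the regular-expression match described above. The function names and data layout are illustrative, not the released evaluation code.

```python
import re

import sacrebleu  # pip install sacrebleu


def corpus_bleu(hypotheses: list[str], references: list[str]) -> float:
    # Corpus-level BLEU with one reference per hypothesis.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score


def term_success_rate(hypotheses: list[str], term_hits: list[tuple[int, str]]) -> float:
    # term_hits: (sentence index, expected English term) pairs, one per
    # occurrence of a glossary term on the source side of the test set.
    # A hit counts when the reference translation of the term appears in
    # the model output (word-boundary, case-insensitive regex match).
    hits = 0
    for idx, term in term_hits:
        if re.search(rf"\b{re.escape(term)}\b", hypotheses[idx], re.IGNORECASE):
            hits += 1
    return hits / len(term_hits)
```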
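
Term Consistency has no single standard definition; one plausible formulation consistent with the description above scores, for each source term, the share of its occurrences that use the term's most frequent translation, averaged over all terms (1.0 means every term is always translated the same way). This exact aggregation is an assumption, not a documented part of the evaluation.

```python
from collections import Counter


def term_consistency(translations_by_term: dict[str, list[str]]) -> float:
    # translations_by_term maps each Russian source term to the English
    # translation the model produced at each of its occurrences.
    # Per-term score: fraction of occurrences using the majority translation.
    scores = []
    for produced in translations_by_term.values():
        majority_count = Counter(produced).most_common(1)[0][1]
        scores.append(majority_count / len(produced))
    return sum(scores) / len(scores)
```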
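
## How to use

A minimal inference sketch using the `transformers` translation pipeline. The repo id below is an assumption based on the model name; adjust it to the actual Hub id. NLLB models use FLORES-200 language codes (`rus_Cyrl` for Russian, `eng_Latn` for English).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Assumed repo id; replace with the actual model id on the Hub.
model_id = "ascolda/nllb-200-distilled-600M_ru_en_finetuned_crystallography"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# src_lang/tgt_lang select the NLLB language tokens for the pipeline.
translator = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="rus_Cyrl",
    tgt_lang="eng_Latn",
    max_length=512,
)

# "The crystal structure was refined by the Rietveld method."
print(translator("Кристаллическая структура была уточнена методом Ритвельда.")[0]["translation_text"])
```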