ascolda committed on
Commit 1035d20
1 Parent(s): 1352967

Create README.md

Files changed (1)
  1. README.md +39 -0
README.md ADDED
@@ -0,0 +1,39 @@
+ ---
+ datasets:
+ - ascolda/ru_en_Crystallography_and_Spectroscopy
+ language:
+ - ru
+ - en
+ metrics:
+ - bleu
+ pipeline_tag: translation
+ tags:
+ - chemistry
+ ---
+ # nllb-200-distilled-600M_ru_en_finetuned_crystallography
+
+ This model is a fine-tuned version of facebook/nllb-200-distilled-600M, trained on the ascolda/ru_en_Crystallography_and_Spectroscopy dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5602
+ - BLEU: 56.5855
+
+ ## Model description
+
+ The fine-tuned model yielded better performance than the original NLLB model on machine translation of domain-specific scientific articles in crystallography and spectroscopy.
+
+ ## Metrics used to describe the fine-tuning effect
+
+ Below is a comparison of translation quality metrics for the original NLLB model and the fine-tuned version. The evaluation focuses on (1) general translation quality, (2) quality of translation of domain-specific terminology, and (3) uniformity of translation of domain-specific terms across different contexts.
+
+ (1) General translation quality was evaluated using the BLEU metric.
+
+ (2) Term Success Rate. Machine-translated terms were compared with their dictionary equivalents by checking, with a regular expression match, whether the reference translation of each term is present in the output.
+
+ (3) Term Consistency. This metric looks at whether technical terms are translated uniformly across the entire text corpus in different contexts. We aim for high consistency, measured by a low occurrence of multiple translations for the same term within the evaluation dataset.
+
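The two terminology metrics described above can be sketched roughly as follows. The README does not give exact formulas, so this is only an illustration under stated assumptions: the function names, the `glossary` and `variants` structures, and the toy sentences are all hypothetical. In particular, Term Consistency is assumed here to be the share of terms that were always rendered with a single variant, which may differ from the definition used for the reported scores.

```python
import re
from collections import Counter, defaultdict


def contains(text, phrase):
    """True if `phrase` occurs in `text` as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def term_success_rate(sources, outputs, glossary):
    """Share of source-term occurrences whose reference translation is in the output."""
    hits = total = 0
    for src, out in zip(sources, outputs):
        for term, reference in glossary.items():
            if contains(src, term):
                total += 1
                hits += contains(out, reference)  # True counts as 1
    return hits / total if total else 0.0


def term_consistency(sources, outputs, variants):
    """Share of observed terms rendered with one and the same variant everywhere.

    ASSUMPTION: this formula is a guess at the metric, not the author's definition.
    """
    seen = defaultdict(Counter)
    for src, out in zip(sources, outputs):
        for term, candidates in variants.items():
            if not contains(src, term):
                continue
            for candidate in candidates:
                if contains(out, candidate):
                    seen[term][candidate] += 1
                    break  # count at most one rendering per sentence
    if not seen:
        return 0.0
    return sum(len(counts) == 1 for counts in seen.values()) / len(seen)


# Toy data, invented for illustration only (not taken from the dataset).
sources = [
    "Элементарная ячейка содержит четыре атома.",
    "Элементарная ячейка является кубической.",
    "Пространственная группа определена однозначно.",
]
outputs = [
    "The unit cell contains four atoms.",
    "The elementary cell is cubic.",
    "The space group was determined unambiguously.",
]
# Hypothetical dictionary: source term -> reference translation.
glossary = {
    "элементарная ячейка": "unit cell",
    "пространственная группа": "space group",
}
# Hypothetical list of renderings to look for when checking consistency.
variants = {
    "элементарная ячейка": ["unit cell", "elementary cell"],
    "пространственная группа": ["space group", "spatial group"],
}

success = term_success_rate(sources, outputs, glossary)
consistency = term_consistency(sources, outputs, variants)
print(f"Term Success Rate: {success:.3f}")
print(f"Term Consistency: {consistency:.3f}")
```

Exact matching on the source side ignores Russian inflection, so a real evaluation would likely need stems or lemmatized text to find every occurrence of a term.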
+ | Model | BLEU | Term Success Rate | Term Consistency |
+ |:--------------------------------------------------------------:|:-------:|:-------------------:|:----------------:|
+ | nllb-200-distilled-600M | 38.19 | 0.246 | 0.199 |
+ | nllb-200-distilled-600M_ru_en_finetuned_crystallography | 56.59 | 0.573 | 0.740 |
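To make the size of the fine-tuning effect easier to read off, the scores from the table above can be turned into absolute and relative gains. This is a small arithmetic sketch; the `improvement` helper and the dictionary layout are illustrative names, not part of the model repository.

```python
# Scores copied from the comparison table above.
base = {"BLEU": 38.19, "Term Success Rate": 0.246, "Term Consistency": 0.199}
finetuned = {"BLEU": 56.59, "Term Success Rate": 0.573, "Term Consistency": 0.740}


def improvement(before, after):
    """Return the absolute gain and the after/before ratio for each metric."""
    return {
        name: {
            "absolute": after[name] - before[name],
            "ratio": after[name] / before[name],
        }
        for name in before
    }


result = improvement(base, finetuned)
for name, gain in result.items():
    print(f"{name}: +{gain['absolute']:.3f} (x{gain['ratio']:.2f})")
```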