vinid committed on
Commit
555732f
1 Parent(s): 6350928

update text for IRR

Files changed (1)
  1. introduction.md +3 -4
introduction.md CHANGED
@@ -79,12 +79,12 @@ Instead of relying on open-source translators, we decided to use DeepL. **Transl
  reason of this choice. With the few images (wrt OpenAI) that we have, we cannot risk polluting our own data. CC is a great resource
  but the captions have to be handled accordingly. We translated 700K captions and we evaluated their quality:

- Two of us looked at a sample of 100 of the translations and rated them with scores from 1 to 4.
+ Three of us looked at a sample of 100 of the translations and rated them with scores from 1 to 4.
  1: the sentence has lost is meaning or it's not possible to understand it; 2: it is possible to get the idea
  but there something wrong; 3: good, however a native speaker might complain about some translations; 4: good translation.

- The average score was of 3.8 and the two annotators had an inter-rater agreement - computed with [Gwet's AC1](https://bpspsychub.onlinelibrary.wiley.com/doi/full/10.1348/000711006X126600) using ordinal
- weighting - of 0.86 (great agreement!).
+ The average score was of 3.78 and the two annotators had an inter-rater agreement - computed with [Gwet's AC1](https://bpspsychub.onlinelibrary.wiley.com/doi/full/10.1348/000711006X126600) using ordinal
+ weighting - of 0.858 (great agreement!).

  | English Captions | Italian Captions |
  | ----------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
@@ -93,7 +93,6 @@ weighting - of 0.86 (great agreement!).
  | popular rides at night at the county fair | giostre popolari di notte alla fiera della contea |

-
  We know that we annotated our own data; in the spirit of fairness we also share the annotations and the captions so
  that those interested can check the quality. The Google Sheet is [here](https://docs.google.com/spreadsheets/d/1m6TkcpJbmJlEygL7SXURIq2w8ZHuVvsmdEuCIH0VENk/edit?usp=sharing).
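
The changed text reports an agreement score computed with Gwet's AC1 under ordinal weighting, but the commit does not show the computation itself. The following is a minimal sketch of how such a coefficient could be computed for any number of raters, using the standard multi-rater formulation (the weighted variant is often called AC2). The names `ordinal_weights` and `gwet_ac` and the sample ratings are hypothetical; they are not taken from the repository or from the shared Google Sheet, and the sketch assumes every item has at least one rating and that there are at least two categories.

```python
from math import comb


def ordinal_weights(q):
    """Ordinal agreement weights for q ordered categories.

    Identical categories get weight 1, the two extreme categories get 0,
    and near misses get partial credit.
    """
    m_max = comb(q, 2)
    return [[1.0 - comb(abs(k - l) + 1, 2) / m_max for l in range(q)]
            for k in range(q)]


def gwet_ac(ratings, categories, weights=None):
    """Gwet's agreement coefficient for multiple raters (sketch).

    ratings:    one list of scores per item, one score per rater.
    categories: ordered list of all possible scores.
    weights:    q x q weight matrix; None means identity weights (AC1).
    """
    q = len(categories)
    index = {c: k for k, c in enumerate(categories)}
    if weights is None:
        weights = [[1.0 if k == l else 0.0 for l in range(q)] for k in range(q)]

    # counts[i][k] = number of raters who put item i in category k
    counts = [[0] * q for _ in ratings]
    for i, row in enumerate(ratings):
        for score in row:
            counts[i][index[score]] += 1

    # Observed (weighted) agreement, averaged over items with >= 2 ratings.
    p_a, n_multi = 0.0, 0
    for r in counts:
        r_i = sum(r)
        if r_i < 2:
            continue
        n_multi += 1
        for k in range(q):
            r_star = sum(weights[k][l] * r[l] for l in range(q))
            p_a += r[k] * (r_star - 1) / (r_i * (r_i - 1))
    p_a /= n_multi

    # Chance agreement from the average share of ratings per category.
    n = len(counts)
    pi = [sum(r[k] / sum(r) for r in counts) / n for k in range(q)]
    t_w = sum(map(sum, weights))
    p_e = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings: one row per caption, one score per annotator.
sample = [[4, 4, 4], [4, 3, 4], [3, 3, 4], [4, 4, 4]]
print(gwet_ac(sample, [1, 2, 3, 4], ordinal_weights(4)))
```

With `weights=None` the function reduces to unweighted AC1; passing `ordinal_weights(4)` gives partial credit to near misses on the 1 to 4 scale, so a 3 versus 4 disagreement is penalised less than a 1 versus 4 disagreement.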