ai-forever committed
Commit ab7cd5a
Parent: c724bd3

Update README.md

Files changed (1):
  1. README.md +39 -18
README.md CHANGED
@@ -23,15 +23,15 @@ model-index:
  metrics:
  - name: Precision
  type: precision
- value: 88.4
+ value: 56.2
  verified: false
  - name: Recall
  type: recall
- value: 71.6
+ value: 65.8
  verified: false
  - name: F1
  type: f1
- value: 79.1
+ value: 60.6
  verified: false
  - task:
  type: text-generation
@@ -41,15 +41,15 @@ model-index:
  metrics:
  - name: Precision
  type: precision
- value: 65.3
+ value: 42.1
  verified: false
  - name: Recall
  type: recall
- value: 62.7
+ value: 47.5
  verified: false
  - name: F1
  type: f1
- value: 63.9
+ value: 44.6
  verified: false
  - task:
  type: text-generation
@@ -59,15 +59,15 @@ model-index:
  metrics:
  - name: Precision
  type: precision
- value: 77.7
+ value: 38.6
  verified: false
  - name: Recall
  type: recall
- value: 77.5
+ value: 56.0
  verified: false
  - name: F1
  type: f1
- value: 77.6
+ value: 45.7
  verified: false
  - task:
  type: text-generation
@@ -77,15 +77,15 @@ model-index:
  metrics:
  - name: Precision
  type: precision
- value: 69.5
+ value: 52.8
  verified: false
  - name: Recall
  type: recall
- value: 46.0
+ value: 49.8
  verified: false
  - name: F1
  type: f1
- value: 55.3
+ value: 51.2
  verified: false
  - task:
  type: text-generation
@@ -131,7 +131,7 @@ model-index:
  ## Summary

  The model corrects spelling errors and typos in both Russian and English languages by bringing all the words in the text to the norm of the language.
- Corrector had been trained based on the model [FRED-T5-1.7B](https://huggingface.co/google/mt5-large) architecture.
+ The corrector was trained on the [mT5-large](https://huggingface.co/google/mt5-large) architecture.
  An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the library [SAGE](https://github.com/ai-forever/sage).

  ## Public references
@@ -164,7 +164,8 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
  **RUSpellRU**
  | Model | Precision | Recall | F1 |
  | --- | --- | --- | --- |
- | sage-mt5-large | 88.4 | 71.6 | 79.1 |
+ | sage-mt5-large | 56.2 | 65.8 | 60.6 |
+ | sage-mt5-large (ft.) | 88.4 | 71.6 | 79.1 |
  | sage-ai-service | 93.5 | 82.4 | 87.6 |
  | gpt-3.5-turbo | 39.6 | 62.3 | 48.5 |
  | gpt-4 | 69.5 | 81.0 | 74.8 |
@@ -172,7 +173,8 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
  **MultidomainGold**
  | Model | Precision | Recall | F1 |
  | --- | --- | --- | --- |
- | sage-mt5-large | 65.3 | 62.7 | 63.9 |
+ | sage-mt5-large | 42.1 | 47.5 | 44.6 |
+ | sage-mt5-large (ft.) | 65.3 | 62.7 | 63.9 |
  | sage-ai-service | 70.9 | 68.8 | 69.9 |
  | gpt-3.5-turbo | 17.8 | 56.1 | 27.0 |
  | gpt-4 | 31.1 | 78.1 | 44.5 |
@@ -180,20 +182,39 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
  **MedSpellChecker**
  | Model | Precision | Recall | F1 |
  | --- | --- | --- | --- |
- | sage-mt5-large | 77.7 | 77.5 | 77.6 |
+ | sage-mt5-large | 38.6 | 56.0 | 45.7 |
+ | sage-mt5-large (ft.) | 77.7 | 77.5 | 77.6 |
  | sage-ai-service | 73.4 | 76.2 | 74.9 |
  | gpt-3.5-turbo | 15.1 | 53.6 | 23.5 |
  | gpt-4 | 48.9 | 88.7 | 63.1 |

-
  **GitHubTypoCorpusRu**
  | Model | Precision | Recall | F1 |
  | --- | --- | --- | --- |
- | sage-mt5-large | 69.5 | 46.0 | 55.3 |
+ | sage-mt5-large | 52.8 | 49.8 | 51.2 |
+ | sage-mt5-large (ft.) | 69.5 | 46.0 | 55.3 |
  | sage-ai-service | 76.1 | 51.2 | 61.2 |
  | gpt-3.5-turbo | 23.7 | 43.9 | 30.8 |
  | gpt-4 | 34.7 | 60.5 | 44.1|

+ **BEA60K**
+ | Model | Precision | Recall | F1 |
+ | --- | --- | --- | --- |
+ | sage-mt5-large | 64.7 | 83.8 | 73.0 |
+ | gpt-3.5-turbo | 66.9 | 84.1 | 74.5 |
+ | gpt-4 | 68.6 | 85.2 | 76.0 |
+ | Bert (https://github.com/neuspell/neuspell) | 65.8 | 79.6 | 72.0 |
+ | SC-LSTM (https://github.com/neuspell/neuspell) | 62.2 | 80.3 | 72.0 |
+
+ **JFLEG**
+ | Model | Precision | Recall | F1 |
+ | --- | --- | --- | --- |
+ | sage-mt5-large | 74.9 | 88.4 | 81.1 |
+ | gpt-3.5-turbo | 77.8 | 88.6 | 82.9 |
+ | gpt-4 | 77.9 | 88.3 | 82.8 |
+ | Bert (https://github.com/neuspell/neuspell) | 78.5 | 85.4 | 81.8 |
+ | SC-LSTM (https://github.com/neuspell/neuspell) | 80.6 | 86.1 | 83.2 |
+
 
  ## How to use
  ```python
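# NOTE: the README's own usage snippet is truncated in this diff view; below is only an
# illustrative sketch, not the original code. It assumes the checkpoint is published as
# "ai-forever/sage-mt5-large" and loads through the standard Hugging Face seq2seq API;
# the repo id, the absence of a task prefix, and the generation settings are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("ai-forever/sage-mt5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("ai-forever/sage-mt5-large")

# A sentence with typos; the corrector should return it with normalized spelling.
text = "Перпиши этот текст без ашибок"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))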