ai-forever committed
Commit • f80d0a7 • Parent(s): 39a25d0
Update README.md

README.md CHANGED
@@ -17,10 +17,12 @@ The model corrects spelling errors and typos by bringing all the words in the te
 Corrector was trained based on the model [M2M100-1.2B](https://huggingface.co/facebook/m2m100_1.2B).
 An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the library [SAGE](https://github.com/orgs/ai-forever/sage).
 
-###
-- [
-- [
-- [
+### Public references
+- [SAGE library announcement](https://youtu.be/yFfkV0Qjuu0), DataFest 2023
+- [Paper about synthetic error generation methods](https://www.dialog-21.ru/media/5914/martynovnplusetal056.pdf), Dialogue 2023
+- [Paper about SAGE and our best solution](https://arxiv.org/abs/2308.09435), Review EACL 2024
+- [Path_to_model](https://huggingface.co/ai-forever/RuM2M100-1.2B)
+
 
 ### Examples
 | Input | Output |

@@ -104,7 +106,7 @@ print(answer)
 ```
 
 ## Resources
-- [SAGE library
+- [SAGE library](https://github.com/orgs/ai-forever/sage), GitHub
 - [ruM2M100-1.2B](https://huggingface.co/ai-forever/RuM2M100-1.2B), HuggingFace
 - [ruM2M100-418M](https://huggingface.co/ai-forever/RuM2M100-420M), HuggingFace
 - [FredT5-large-spell](https://huggingface.co/ai-forever/FRED-T5-large-spell), HuggingFace

@@ -112,7 +114,7 @@ print(answer)
 
 ## License
 Model [M2M100-1.2B](https://huggingface.co/facebook/m2m100_1.2B), on the basis of which our solution is made, and its source code are supplied under the MIT open license.
-Our solution also comes with
+Our solution also comes with MIT license.
 
 ## Specifications
 - File size: 5 Gb;

@@ -122,4 +124,4 @@ Our solution also comes with an MIT license.
 - Developer: SberDevices, AGI NLP
 
 ## Contacts
-
+nikita.martynov.98@list.ru
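The README describes a training corpus built by automatically injecting typos and spelling errors into clean text with the SAGE library. A minimal toy sketch of that idea is below; this is not SAGE's API — the function name and the three corruption operations are invented for illustration, and SAGE itself uses far richer, statistically grounded corruption strategies:

```python
import random

def introduce_typos(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Toy synthetic-error generator: randomly delete, duplicate,
    or swap adjacent characters at the given per-character rate.
    Hypothetical illustration only, not the SAGE library's interface."""
    rng = random.Random(seed)  # seeded so corpora are reproducible
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                i += 1          # drop this character entirely
                continue
            if op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # transpose with the next char
                out.append(c)
                i += 2
                continue
            if op == "duplicate":
                out.append(c)   # emit the character twice
            out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)

clean = "привет, как дела"
noisy = introduce_typos(clean, rate=0.3, seed=42)
print(noisy)  # a corrupted variant of the clean sentence
```

Pairs of (noisy, clean) sentences produced this way form the kind of parallel corpus on which a sequence-to-sequence corrector such as this M2M100-based model can be trained.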