ai-forever committed
Commit 6dfaa60 • Parent(s): 4b6a820
Update README.md

README.md CHANGED
@@ -17,10 +17,11 @@ The model corrects spelling errors and typos by bringing all the words in the te
 The proofreader was trained based on the [M2M100-418M](https://huggingface.co/facebook/m2m100_418M) model.
 An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the functionality of the [SAGE](https://github.com/orgs/ai-forever/sage) library.
 
-###
-- [
-- [
-- [
 
 ### Examples
 | Input | Output |
@@ -103,7 +104,7 @@ print(answer)
 ```
 
 ## Resources
-- [SAGE library
 - [ruM2M100-1.2B](https://huggingface.co/ai-forever/RuM2M100-1.2B), HuggingFace
 - [ruM2M100-418M](https://huggingface.co/ai-forever/RuM2M100-420M), HuggingFace
 - [FredT5-large-spell](https://huggingface.co/ai-forever/FRED-T5-large-spell), HuggingFace
@@ -111,7 +112,7 @@ print(answer)
 
 ## License
 The [M2M100-418M](https://huggingface.co/facebook/m2m100_418M) model, on which our solution is based, and its source code are supplied under the open MIT license.
-Our solution also comes with
 
 ## Specifications
 - File size: 2 GB;
@@ -121,4 +122,4 @@ Our solution also comes with an MIT license.
 - Developer: SberDevices, AGI NLP
 
 ## Contacts
-
 The proofreader was trained based on the [M2M100-418M](https://huggingface.co/facebook/m2m100_418M) model.
 An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the functionality of the [SAGE](https://github.com/orgs/ai-forever/sage) library.
 
+### Public references
+- [SAGE library announcement](https://youtu.be/yFfkV0Qjuu0), DataFest 2023
+- [Paper about synthetic error generation methods](https://www.dialog-21.ru/media/5914/martynovnplusetal056.pdf), Dialogue 2023
+- [Paper about SAGE and our best solution](https://arxiv.org/abs/2308.09435), Review EACL 2024
+- [Path_to_model](https://huggingface.co/ai-forever/RuM2M100-418M)
 
 ### Examples
 | Input | Output |
 ```
 
 ## Resources
+- [SAGE library](https://github.com/orgs/ai-forever/sage), GitHub
 - [ruM2M100-1.2B](https://huggingface.co/ai-forever/RuM2M100-1.2B), HuggingFace
 - [ruM2M100-418M](https://huggingface.co/ai-forever/RuM2M100-420M), HuggingFace
 - [FredT5-large-spell](https://huggingface.co/ai-forever/FRED-T5-large-spell), HuggingFace
 
 ## License
 The [M2M100-418M](https://huggingface.co/facebook/m2m100_418M) model, on which our solution is based, and its source code are supplied under the open MIT license.
+Our solution also comes with an MIT license.
 
 ## Specifications
 - File size: 2 GB;
 - Developer: SberDevices, AGI NLP
 
 ## Contacts
+nikita.martynov.98@list.ru
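The README text in this diff says the training corpus was built by automatically injecting typos into clean text with the SAGE library. As a rough illustration of that idea only — this is not SAGE's actual API; the function names, alphabet, and error rate below are invented for the sketch — random character-level edits can turn a clean sentence into a (noisy, clean) training pair for a seq2seq proofreader:

```python
import random

# Hypothetical alphabet used for substitutions/insertions in this toy sketch.
RUSSIAN_LOWER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

def corrupt_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: substitution, deletion, or insertion."""
    if len(word) < 2:
        return word
    op = rng.choice(["substitute", "delete", "insert"])
    i = rng.randrange(len(word))
    if op == "substitute":
        return word[:i] + rng.choice(RUSSIAN_LOWER) + word[i + 1:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice(RUSSIAN_LOWER) + word[i:]

def corrupt_sentence(sentence: str, error_rate: float = 0.15, seed: int = 0) -> str:
    """Corrupt roughly `error_rate` of the words; pair the result with `sentence` for training."""
    rng = random.Random(seed)
    noisy = [corrupt_word(w, rng) if rng.random() < error_rate else w
             for w in sentence.split()]
    return " ".join(noisy)

clean = "съешь же ещё этих мягких французских булок"
noisy = corrupt_sentence(clean, error_rate=0.5, seed=42)
print(noisy, "->", clean)
```

The fixed seed makes the corruption reproducible, which helps when regenerating a training corpus; SAGE's real generators model realistic human error statistics rather than uniform random edits.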