ai-forever committed
Commit 1056360 · Parent(s): 6b1add2

Update README.md

Files changed (1)
  1. README.md +21 -3
README.md CHANGED
@@ -56,7 +56,7 @@ We compare our solution with both open automatic spell checkers and the ChatGPT
 | HunSpell | 16.2 | 40.1 | 23.0 |
 
 **MedSpellChecker**
-| Модель | Precision | Recall | F1 |
+| Model | Precision | Recall | F1 |
 | --- | --- | --- | --- |
 | M2M100-1.2B | 63.7 | 57.8 | 60.6 |
 | ChatGPT gpt-3.5-turbo-0301 | 53.2 | 67.6 | 59.6 |
@@ -67,7 +67,7 @@ We compare our solution with both open automatic spell checkers and the ChatGPT
 | HunSpell | 10.3 | 40.2 | 16.4 |
 
 **GitHubTypoCorpusRu**
-| Модель | Precision | Recall | F1 |
+| Model | Precision | Recall | F1 |
 | --- | --- | --- | --- |
 | M2M100-1.2B | 45.7 | 41.4 | 43.5 |
 | ChatGPT gpt-3.5-turbo-0301 | 43.8 | 57.0 | 49.6 |
@@ -75,4 +75,22 @@ We compare our solution with both open automatic spell checkers and the ChatGPT
 | ChatGPT text-davinci-003 | 46.5 | 58.1 | 51.7 |
 | Yandex.Speller | 67.7 | 37.5 | 48.3 |
 | JamSpell | 49.5 | 29.9 | 37.3 |
-| HunSpell | 28.5 | 30.7 | 29.6 |
+| HunSpell | 28.5 | 30.7 | 29.6 |
+
+## How to use
+from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+path_to_model = "<path_to_model>"
+
+model = M2M100ForConditionalGeneration.from_pretrained(path_to_model)
+tokenizer = M2M100Tokenizer.from_pretrained(path_to_model)
+
+sentence = "прийдя в МГТУ я был удивлен никого необноружив там…"
+
+encodings = tokenizer(sentence, return_tensors="pt")
+generated_tokens = model.generate(
+    **encodings, forced_bos_token_id=tokenizer.get_lang_id("ru"))
+answer = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+print(answer)
+
+# ["прийдя в МГТУ я был удивлен никого не обнаружив там..."]