ai-forever committed on
Commit 0c8f4e6
1 Parent(s): f1f37bb

Update README.md

Files changed (1)
  1. README.md +27 -17
README.md CHANGED
@@ -155,26 +155,36 @@ We compare our solution with both open automatic spell checkers and the ChatGPT
 | Model | Pr. (spell) | Rec. (spell) | F1 (spell) | Pr. (punc) | Rec. (punc) | F1 (punc) | Pr. (case) | Rec. (case) | F1 (case) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | sage-fredt5-large | x | x | x | x | x | x | x | x | x |
-| sage-fredt5-large (ft) | 88.4 | 80.9 | 84.5 | 88.2 | 85.3 | 86.8 | 95.5 | 94.0 | 94.7 |
-| sage-ai-service | 90.3 | 86.3 | 88.2 | 90.3 | 86.6 | 88.4 | 95.2 | 95.9 | 95.6 |
-| gpt-3.5-turbo | 33.6 | 58.5 | 42.7 | 85.9 | 64.6 | 73.7 | 84.9 | 73.9 | 79.0 |
-| gpt-4 | 54.9 | 76.7 | 64.0 | 84.0 | 82.3 | 83.2 | 91.5 | 90.2 | 90.9 |
-
-& \textbf{70.8} & \textbf{56.3} & \textbf{62.7} & \textbf{48.9} & 35.8 & 41.4 & 32.9 & 45.3 & 38.1
 
 ## How to use
 ```python
-from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
-path_to_model = "ai-forever/RuM2M100-1.2B"
-model = M2M100ForConditionalGeneration.from_pretrained(path_to_model)
-tokenizer = M2M100Tokenizer.from_pretrained(path_to_model, src_lang="ru", tgt_lang="ru")
-sentence = "прийдя в МГТУ я был удивлен никого необноружив там…"
-encodings = tokenizer(sentence, return_tensors="pt")
-generated_tokens = model.generate(
-    **encodings, forced_bos_token_id=tokenizer.get_lang_id("ru"))
-answer = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
-print(answer)
-# ["прийдя в МГТУ я был удивлен никого не обнаружив там..."]
 ```
 
 ## Resources
 
 | Model | Pr. (spell) | Rec. (spell) | F1 (spell) | Pr. (punc) | Rec. (punc) | F1 (punc) | Pr. (case) | Rec. (case) | F1 (case) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | sage-fredt5-large | x | x | x | x | x | x | x | x | x |
+| sage-fredt5-large (ft) | 67.5 | 53.2 | 59.5 | 48.5 | 38.0 | 42.6 | 37.3 | 50.0 | 42.7 |
+| sage-ai-service | 70.8 | 56.3 | 62.7 | 48.9 | 35.8 | 41.4 | 32.9 | 45.3 | 38.1 |
+| gpt-3.5-turbo | 23.7 | 38.7 | 29.4 | 37.6 | 23.3 | 28.7 | 19.6 | 35.9 | 25.3 |
+| gpt-4 | 27.0 | 52.8 | 35.7 | 45.9 | 32.6 | 38.2 | 25.7 | 36.8 | 30.2 |
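For reference, each F1 column in the table is the harmonic mean of the matching precision and recall columns, rounded to one decimal as in the table; a minimal check in Python:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, rounded to one decimal."""
    return round(2 * precision * recall / (precision + recall), 1)

# Spot-check the spelling columns of the table above.
print(f1(67.5, 53.2))  # sage-fredt5-large (ft): 59.5
print(f1(70.8, 56.3))  # sage-ai-service: 62.7
```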
 
 
 
 ## How to use
 ```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+tokenizer = AutoTokenizer.from_pretrained("ai-forever/sage-fredt5-large")
+model = AutoModelForSeq2SeqLM.from_pretrained("ai-forever/sage-fredt5-large")
+
+model.to("cuda:0")
+
+sentence = "И не чсно прохожим в этот день непогожйи почему я веселый такйо"
+text = "<LM>" + sentence
+with torch.inference_mode():
+    encodings = tokenizer(text, max_length=None, padding="longest", truncation=False, return_tensors="pt")
+    for k, v in encodings.items():
+        encodings[k] = v.to("cuda:0")
+    res = model.generate(
+        **encodings,
+        use_cache=True,
+        max_length=int(encodings["input_ids"].size(1) * 1.5),
+    )
+res = res.cpu().tolist()
+res = tokenizer.batch_decode(res, skip_special_tokens=True)
+print(res)
+
+# ["И не ясно прохожим в этот день непогожий, почему я веселый такой."]
 ```
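The snippet above budgets generation at 1.5× the input length; since `generate` expects an integer `max_length`, the product should be floored. A tiny sketch of the intended arithmetic, using a hypothetical helper name `gen_budget`:

```python
def gen_budget(input_len: int, ratio: float = 1.5) -> int:
    # Token budget for generate(): ratio x the input length, floored to an int.
    return int(input_len * ratio)

print(gen_budget(20))  # 30
print(gen_budget(7))   # 10
```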
 
  ## Resources