ai-forever committed on
Commit 0c8f4e6
1 Parent(s): f1f37bb

Update README.md

Files changed (1)
  1. README.md +27 -17
README.md CHANGED
@@ -155,26 +155,36 @@ We compare our solution with both open automatic spell checkers and the ChatGPT
 | Model | Pr. (spell) | Rec. (spell) | F1 (spell) | Pr. (punc) | Rec. (punc) | F1 (punc) | Pr. (case) | Rec. (case) | F1 (case) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | sage-fredt5-large | x | x | x | x | x | x | x | x | x |
-| sage-fredt5-large (ft) | 88.4 | 80.9 | 84.5 | 88.2 | 85.3 | 86.8 | 95.5 | 94.0 | 94.7 |
-| sage-ai-service | 90.3 | 86.3 | 88.2 | 90.3 | 86.6 | 88.4 | 95.2 | 95.9 | 95.6 |
-| gpt-3.5-turbo | 33.6 | 58.5 | 42.7 | 85.9 | 64.6 | 73.7 | 84.9 | 73.9 | 79.0 |
-| gpt-4 | 54.9 | 76.7 | 64.0 | 84.0 | 82.3 | 83.2 | 91.5 | 90.2 | 90.9 |
-
-& \textbf{70.8} & \textbf{56.3} & \textbf{62.7} & \textbf{48.9} & 35.8 & 41.4 & 32.9 & 45.3 & 38.1
 
 ## How to use
 ```python
-from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
-path_to_model = "ai-forever/RuM2M100-1.2B"
-model = M2M100ForConditionalGeneration.from_pretrained(path_to_model)
-tokenizer = M2M100Tokenizer.from_pretrained(path_to_model, src_lang="ru", tgt_lang="ru")
-sentence = "прийдя в МГТУ я был удивлен никого необноружив там…"
-encodings = tokenizer(sentence, return_tensors="pt")
-generated_tokens = model.generate(
-    **encodings, forced_bos_token_id=tokenizer.get_lang_id("ru"))
-answer = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
-print(answer)
-# ["прийдя в МГТУ я был удивлен никого не обнаружив там..."]
 ```
 
 ## Resources
 
 | Model | Pr. (spell) | Rec. (spell) | F1 (spell) | Pr. (punc) | Rec. (punc) | F1 (punc) | Pr. (case) | Rec. (case) | F1 (case) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | sage-fredt5-large | x | x | x | x | x | x | x | x | x |
+| sage-fredt5-large (ft) | 67.5 | 53.2 | 59.5 | 48.5 | 38.0 | 42.6 | 37.3 | 50.0 | 42.7 |
+| sage-ai-service | 70.8 | 56.3 | 62.7 | 48.9 | 35.8 | 41.4 | 32.9 | 45.3 | 38.1 |
+| gpt-3.5-turbo | 23.7 | 38.7 | 29.4 | 37.6 | 23.3 | 28.7 | 19.6 | 35.9 | 25.3 |
+| gpt-4 | 27.0 | 52.8 | 35.7 | 45.9 | 32.6 | 38.2 | 25.7 | 36.8 | 30.2 |
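For reference, each F1 column in the table is the harmonic mean of the matching precision and recall columns, rounded to one decimal as in the table; a minimal check in Python:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, rounded to one decimal."""
    return round(2 * precision * recall / (precision + recall), 1)

# Spot-check the spelling columns of the table above.
print(f1(67.5, 53.2))  # sage-fredt5-large (ft): 59.5
print(f1(70.8, 56.3))  # sage-ai-service: 62.7
```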
 
 
 
 ## How to use
 ```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+tokenizer = AutoTokenizer.from_pretrained("ai-forever/sage-fredt5-large")
+model = AutoModelForSeq2SeqLM.from_pretrained("ai-forever/sage-fredt5-large")
+
+model.to("cuda:0")
+
+sentence = "И не чсно прохожим в этот день непогожйи почему я веселый такйо"
+text = "<LM>" + sentence
+with torch.inference_mode():
+    encodings = tokenizer(text, max_length=None, padding="longest", truncation=False, return_tensors="pt")
+    for k, v in encodings.items():
+        encodings[k] = v.to("cuda:0")
+    res = model.generate(
+        **encodings,
+        use_cache=True,
+        max_length=int(encodings["input_ids"].size(1) * 1.5),
+    )
+res = res.cpu().tolist()
+res = tokenizer.batch_decode(res, skip_special_tokens=True)
+print(res)
+
+# ["И не ясно прохожим в этот день непогожий, почему я веселый такой."]
 ```
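The snippet above budgets generation at 1.5× the input length; since `generate` expects an integer `max_length`, the product should be floored. A tiny sketch of the intended arithmetic, using a hypothetical helper name `gen_budget`:

```python
def gen_budget(input_len: int, ratio: float = 1.5) -> int:
    # Token budget for generate(): ratio x the input length, floored to an int.
    return int(input_len * ratio)

print(gen_budget(20))  # 30
print(gen_budget(7))   # 10
```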
 
  ## Resources