IlyaGusev committed on
Commit 2127fb1
1 parent: 4edcbe9

Create README.md

Files changed (1)
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ ---
+
+ language:
+ - ru
+ tags:
+ - summarization
+ license: apache-2.0
+
+ ---
+
+ # RuT5TelegramHeadlines
+
+ ## Model description
+
+ Generates headlines for Russian-language articles. Based on the [rut5-base](https://huggingface.co/cointegrated/rut5-base) model.
+
+ ## Intended uses & limitations
+
+ #### How to use
+
+ ```python
+ from transformers import AutoTokenizer, T5ForConditionalGeneration
+
+ model_name = "IlyaGusev/rut5_telegram_headlines"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = T5ForConditionalGeneration.from_pretrained(model_name)
+
+ # Placeholder: replace with the text of the article.
+ article_text = "..."
+
+ # Tokenize the article, truncating or padding it to exactly 600 tokens.
+ input_ids = tokenizer(
+     [article_text],
+     max_length=600,
+     add_special_tokens=True,
+     padding="max_length",
+     truncation=True,
+     return_tensors="pt"
+ )["input_ids"]
+
+ # Generate the headline; no 4-gram may appear twice in the output.
+ output_ids = model.generate(
+     input_ids=input_ids,
+     no_repeat_ngram_size=4
+ )[0]
+
+ # Convert the generated token IDs back to text.
+ headline = tokenizer.decode(output_ids, skip_special_tokens=True)
+ print(headline)
+ ```
+
48
+ ## Training data
49
+
50
+ - Dataset: [ru_all_split.tar.gz](https://www.dropbox.com/s/ykqk49a8avlmnaf/ru_all_split.tar.gz)
51
+
52
+ ## Training procedure
53
+
54
+ - Training script: [train.py](https://github.com/IlyaGusev/summarus/blob/master/external/hf_scripts/train.py)
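
The "How to use" snippet in the README handles a single article. A minimal sketch of extending it to a list of articles could look like the following. The function name `generate_headlines` and the `batch_size` parameter are hypothetical and do not appear in the README. The sketch also assumes that padding each batch to its longest article and passing the attention mask is an acceptable substitute for the README's fixed `padding="max_length"`; this has not been checked against the model's training setup.

```python
from typing import List, Sequence


def generate_headlines(
    texts: Sequence[str],
    tokenizer,
    model,
    batch_size: int = 8,    # hypothetical default, not from the README
    max_length: int = 600,  # same input limit as the README snippet
) -> List[str]:
    """Generate one headline per article, processing the articles in batches."""
    headlines: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        # Assumption: pad only to the longest article in the batch and pass
        # the attention mask, instead of padding every input to max_length.
        inputs = tokenizer(
            batch,
            max_length=max_length,
            add_special_tokens=True,
            padding=True,
            truncation=True,
            return_tensors="pt",
        )
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            no_repeat_ngram_size=4,
        )
        # Decode the whole batch of generated token IDs back to text.
        headlines.extend(
            tokenizer.batch_decode(output_ids, skip_special_tokens=True)
        )
    return headlines
```

With `tokenizer` and `model` loaded as in the README, `generate_headlines([article_text], tokenizer, model)` would return a list containing one headline per input article, in input order.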