JulesBelveze committed on
Commit
ab66b01
1 Parent(s): 47147db

Update README.md

Files changed (1)
  1. README.md +65 -1
README.md CHANGED
@@ -1,3 +1,67 @@
  ---
- license: mit
+ language:
+ - en
+ tags:
+ - summarization
+ - headline-generation
+ - text-generation
+ datasets:
+ - JulesBelveze/tldr_news
+ - dataset2
+ metrics:
+ - rouge1
+ - rouge2
+ - rougeL
+ - rougeLsum
+
  ---
+
+ # t5-small for headline generation
+
+ This model is a [t5-small](https://huggingface.co/t5-small) fine-tuned for headline generation using
+ the [JulesBelveze/tldr_news](https://huggingface.co/datasets/JulesBelveze/tldr_news) dataset.
+
+ ## Using this model
+ ```python
+ import re
+ from transformers import AutoTokenizer, T5ForConditionalGeneration
+
+ # Collapse every run of whitespace (newlines included) into a single space.
+ WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', k.strip())
+
+ article_text = """US FCC commissioner Brendan Carr has asked Apple and Google to remove TikTok from their app stores. The video app is owned by Chinese company ByteDance. Carr claims that TikTok functions as a surveillance tool that harvests extensive amounts of personal and sensitive data from US citizens. TikTok says its data access approval process is overseen by a US-based security team and that data is only accessed on an as-needed basis under strict controls."""
+ model_name = "JulesBelveze/t5-small-headline-generator"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = T5ForConditionalGeneration.from_pretrained(model_name)
+
+ # Tokenize the cleaned article, truncating or padding it to 384 tokens.
+ inputs = tokenizer(
+     [WHITESPACE_HANDLER(article_text)],
+     return_tensors="pt",
+     padding="max_length",
+     truncation=True,
+     max_length=384
+ )
+
+ # Beam search; the attention mask tells the model to ignore padding tokens.
+ output_ids = model.generate(
+     input_ids=inputs["input_ids"],
+     attention_mask=inputs["attention_mask"],
+     max_length=84,
+     no_repeat_ngram_size=2,
+     num_beams=4
+ )[0]
+
+ # Decode the best beam into text, dropping special tokens such as padding.
+ summary = tokenizer.decode(
+     output_ids,
+     skip_special_tokens=True,
+     clean_up_tokenization_spaces=False
+ )
+ print(summary)
+ ```
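
The whitespace cleanup applied to the article before tokenization can be checked on its own, without downloading the model. The following is a minimal sketch: `normalize_whitespace` is a hypothetical name for the same regex idea as `WHITESPACE_HANDLER`, and the sample string is made up for illustration.

```python
import re


def normalize_whitespace(text: str) -> str:
    # Hypothetical helper: strip both ends, then collapse each run of
    # whitespace (spaces, tabs, newlines) into a single space.
    return re.sub(r"\s+", " ", text.strip())


# Made-up input with leading/trailing spaces, a tab, and blank lines.
sample = "  US FCC commissioner\n\nBrendan Carr\thas asked   Apple and Google.  "
cleaned = normalize_whitespace(sample)
print(cleaned)
```

Since `\s` already matches newlines, a single substitution is enough here; a separate pass for `\n+` would not change the result.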
+
+ ## Evaluation
+
+ | Metric     | Score   |
+ |------------|---------|
+ | ROUGE 1    | 44.2379 |
+ | ROUGE 2    | 17.4961 |
+ | ROUGE L    | 41.1119 |
+ | ROUGE Lsum | 41.1256 |
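
The card does not say how these scores were computed. To give a feel for what the metrics measure, here is a simplified sketch of ROUGE-1, ROUGE-2 and ROUGE-L F1 scores in plain Python. It assumes lowercased whitespace tokenization and no stemming, so it will not reproduce the numbers above; the function names and the two example headlines are hypothetical, and the 0 to 100 scaling is an assumption based on the magnitude of the reported values.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    # Harmonic mean of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(reference, candidate, n):
    # Clipped n-gram overlap between reference and candidate.
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence, via row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    # ROUGE-L uses the longest common subsequence instead of fixed-size n-grams.
    ref, cand = reference.lower().split(), candidate.lower().split()
    return f1(lcs_length(ref, cand), len(cand), len(ref))


# Made-up headlines, for illustration only.
reference = "fcc commissioner asks apple and google to remove tiktok"
candidate = "fcc asks apple and google to ban tiktok"

for name, score in [
    ("ROUGE 1", rouge_n_f1(reference, candidate, 1)),
    ("ROUGE 2", rouge_n_f1(reference, candidate, 2)),
    ("ROUGE L", rouge_l_f1(reference, candidate)),
]:
    print(f"{name}: {100 * score:.2f}")
```

For real evaluation, an established ROUGE implementation is the safer choice, since tokenization and stemming choices noticeably affect the scores.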