Binaryy committed on
Commit
e723bbd
1 Parent(s): 3a3619d

Update README.md

Files changed (1)
  1. README.md +0 -149
README.md CHANGED
@@ -1,149 +0,0 @@
- ---
- language:
- - en
- tags:
- - summarization
- datasets:
- - xsum
- metrics:
- - rouge
- widget:
- - text: National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed
- to buy rival Samba Financial Group for $15 billion in the biggest banking takeover
- this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to
- a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer
- 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio
- the banks set when they signed an initial framework agreement in June.The offer
- is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24%
- higher than the level the shares traded at before the talks were made public.
- Bloomberg News first reported the merger discussions.The new bank will have total
- assets of more than $220 billion, creating the Gulf region’s third-largest lender.
- The entity’s $46 billion market capitalization nearly matches that of Qatar National
- Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion
- of assets.
- model-index:
- - name: human-centered-summarization/financial-summarization-pegasus
- results:
- - task:
- type: summarization
- name: Summarization
- dataset:
- name: xsum
- type: xsum
- config: default
- split: test
- metrics:
- - type: rouge
- value: 35.2055
- name: ROUGE-1
- verified: true
- verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTA5OTZkY2YxMDU1YzE3NGJlMmE1OTg1NjlmNzcxOTg4YzY2OThlOTlkNGFhMGFjZWY4YjdiMjU5NDdmMWYzNSIsInZlcnNpb24iOjF9.ufBRoV2JoX4UlEfAUOYq7F3tZougwngdpKlnaC37tYXJU3omsR5hTsWM69hSdYO-k0cKUbAWCAMzjmoGwIaPAw
- - type: rouge
- value: 16.5689
- name: ROUGE-2
- verified: true
- verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWQwMmM2NjJjNzM1N2Y3NjZmMmE5NzNlNjRjNjEwNzNhNjcyZTRiMGRlODY3NWUyMGQ0YzZmMGFhODYzOTRmOSIsInZlcnNpb24iOjF9.AZZkbaYBZG6rw6-QHYjRlSl-p0gBT2EtJxwjIP7QYH5XIQjeoiQsTnDPIq25dSMDbmQLSZnpHC104ZctX0f_Dg
- - type: rouge
- value: 30.1285
- name: ROUGE-L
- verified: true
- verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTRjYThlMTllZjI4MGFiMDZhZTVkYmRjMTNhZDUzNTQ0OWQyNDQxMmQ5ODJiMmJiNGI3OTAzYjhiMzc2MTI4NCIsInZlcnNpb24iOjF9.zTHd3F4ZlgS-azl-ZVjOckcTrtrJmDOGWVaC3qQsvvn2UW9TnseNkmo7KBc3DJU7_NmlxWZArl1BdSetED0NCg
- - type: rouge
- value: 30.1706
- name: ROUGE-LSUM
- verified: true
- verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGMzZGFjNzVkYWI0NTJkMmZjZDQ0YjhiYjIxN2VkNmJjMTgwZTk1NjFlOGU2NjNjM2VjYTNlYTBhNTQ5MGZkNSIsInZlcnNpb24iOjF9.xQ2LoI3PwlEiXo1OT2o4Pq9o2thYCd9lSCKCWlLmZdxI5GxdsjcASBKmHKopzUcwCGBPR7zF95MHSAPyszOODA
- - type: loss
- value: 2.7092134952545166
- name: loss
- verified: true
- verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzQzODE0NDc5YTYzYjJlMWU2YTVjOGRjN2JmYWVkOWNkNTRlMTZlOWIyN2NiODJkMDljMjI3YzZmYzM3N2JjYSIsInZlcnNpb24iOjF9.Vv_pdeFuRMoKK3cPr5P6n7D6_18ChJX-2qcT0y4is3XX3mS98fk3U1AYEuy9nBHOwYR3o0U8WBgQ-Ya_FqefBg
- - type: gen_len
- value: 15.1414
- name: gen_len
- verified: true
- verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjk5OTk3NWRiNjZlZmQzMmYwOTU2MmQwOWE1MDNlNTg3YWVkOTgwOTc2ZTQ0MTBiZjliOWMyZTYwMDI2MDUzYiIsInZlcnNpb24iOjF9.Zvj84JzIhM50rWTQ2GrEeOU7HrS8KsILH-8ApTcSWSI6kVnucY0MyW2ODxvRAa_zHeCygFW6Q13TFGrT5kLNAA
- ---
-
- ### PEGASUS for Financial Summarization
-
- This model was fine-tuned on a novel financial news dataset consisting of 2K articles from [Bloomberg](https://www.bloomberg.com/europe) on topics such as stocks, markets, currencies, rates and cryptocurrencies.
-
- It is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model, specifically the PEGASUS checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum model](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
-
- ### How to use
- The snippet below shows how to use this model for financial summarization in PyTorch.
-
- ```Python
- from transformers import PegasusTokenizer, PegasusForConditionalGeneration
-
- # Load the tokenizer and the model
- model_name = "human-centered-summarization/financial-summarization-pegasus"
- tokenizer = PegasusTokenizer.from_pretrained(model_name)
- # For TensorFlow, import and use TFPegasusForConditionalGeneration here instead
- model = PegasusForConditionalGeneration.from_pretrained(model_name)
-
-
- # Some text to summarize here
- text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."
-
- # Tokenize the text
- # For TensorFlow, request TensorFlow tensors instead by passing return_tensors="tf"
- input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
-
- # Generate the output (beam search here, but any other decoding strategy works too)
- output = model.generate(
-     input_ids,
-     max_length=32,
-     num_beams=5,
-     early_stopping=True
- )
-
- # Finally, print the generated summary
- print(tokenizer.decode(output[0], skip_special_tokens=True))
- # Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
- ```
-
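The snippet above summarizes a single article. As a rough sketch of how the same steps might be wrapped for several articles at once, the helper below batches the inputs. The function name `summarize` and its defaults are hypothetical and not part of the model card; `padding=True` and `truncation=True` are assumptions added so that articles of different lengths, or longer than the model's input limit, can share one batch.

```python
from typing import List


def summarize(texts: List[str], tokenizer, model,
              max_length: int = 32, num_beams: int = 5) -> List[str]:
    """Summarize a batch of articles (hypothetical helper, not from the model card).

    `tokenizer` and `model` are expected to behave like the PegasusTokenizer and
    PegasusForConditionalGeneration objects loaded in the snippet above.
    """
    # Pad to the longest article in the batch and truncate over-long inputs
    # (both are assumptions; the original snippet tokenizes one text as-is).
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    # Same beam-search settings as the single-article example.
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )

    # Decode every sequence in the batch and trim surrounding whitespace.
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return [summary.strip() for summary in decoded]


# Usage, assuming `tokenizer`, `model` and `text_to_summarize` from the snippet above:
# summaries = summarize([text_to_summarize], tokenizer, model)
# print(summaries[0])
```

Passing the tokenizer and model in as arguments keeps the helper free of any download at import time, so the model is loaded once and reused across calls.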
- ## Evaluation Results
- The results before and after fine-tuning on our dataset are shown below:
-
-
- | Fine-tuning | R-1 | R-2 | R-L | R-S |
- |:-----------:|:-----:|:-----:|:------:|:-----:|
- | Yes | 23.55 | 6.99 | 18.14 | 21.36 |
- | No | 13.8 | 2.4 | 10.63 | 12.03 |
-
-
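To give a sense of what the R-1 and R-2 columns measure, here is a minimal sketch of an n-gram overlap F1 score in plain Python. It is a simplified illustration only: it assumes lowercased whitespace tokenization with no stemming, the function name `rouge_n_f1` is hypothetical, and it is not the scorer used to produce the numbers in the table.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap between two texts."""

    def ngrams(text: str) -> Counter:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the bank agreed to buy its rival"
candidate = "the bank will buy its rival"
print(round(rouge_n_f1(reference, candidate, n=1), 4))  # 0.7692
print(round(rouge_n_f1(reference, candidate, n=2), 4))  # 0.5455
```

Scores in the table are on a 0 to 100 scale, so a value such as 23.55 corresponds to 0.2355 in this sketch's units.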
- ## Citation
-
- You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing it:
-
- > T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
- > Towards Human-Centered Summarization: A Case Study on Financial News.
- > In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.
-
- BibTeX entry:
-
- ```
- @inproceedings{passali-etal-2021-towards,
-     title = "Towards Human-Centered Summarization: A Case Study on Financial News",
-     author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
-     booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
-     month = apr,
-     year = "2021",
-     address = "Online",
-     publisher = "Association for Computational Linguistics",
-     url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
-     pages = "21--27",
- }
- ```
-
- ## Support
-
- Contact us at [info@medoid.ai](mailto:info@medoid.ai) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!
-
- More information about Medoid AI:
- - Website: [https://www.medoid.ai](https://www.medoid.ai)
- - LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)
-
-