---
language:
- en
tags: summarization
datasets:
- xsum
metrics:
- rouge
widget:
- text: "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."
---
### Pegasus for Financial Summarization
This model was trained on a novel financial dataset of 2K financial and economic articles from the [Bloomberg](https://www.bloomberg.com/europe) website, covering categories such as stocks, markets, currencies, rates and cryptocurrencies, using [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html). It was fine-tuned starting from the [google/pegasus-xsum model](https://huggingface.co/google/pegasus-xsum).
The PEGASUS model was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
### How to use
We provide a simple snippet showing how to use this model for financial summarization in PyTorch.
```Python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration, TFPegasusForConditionalGeneration

# Load the model and the tokenizer
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
# To use the TensorFlow model, replace the class below with TFPegasusForConditionalGeneration
model = PegasusForConditionalGeneration.from_pretrained(model_name)
# Some text to summarize here
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."
# Tokenize the text
# To run the code in TensorFlow, request TensorFlow tensors with return_tensors="tf"
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
# Generate the output (beam search is used here, but any other decoding strategy also works)
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True
)
# Finally, we can print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
```
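To summarize several articles at once, the snippet above can be wrapped in a small helper. The following is a minimal sketch, not part of the original model card: the function name `summarize_batch` and its default values are hypothetical, and it assumes `tokenizer` and `model` were loaded as shown above. `padding=True` and `truncation=True` are standard tokenizer arguments that pad shorter inputs and cut inputs that exceed the model's maximum input length.

```python
from typing import List


def summarize_batch(texts: List[str], tokenizer, model,
                    max_length: int = 32, num_beams: int = 5) -> List[str]:
    """Summarize several articles in one call (hypothetical helper)."""
    # Pad to the longest article in the batch and truncate over-long inputs
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # The attention mask tells the model which positions are padding
    output = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )
    # Decode every generated sequence back into text
    return tokenizer.batch_decode(output, skip_special_tokens=True)
```

With the objects from the snippet above, `summarize_batch([text_to_summarize], tokenizer, model)` should return a list containing one summary per input text.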
## Evaluation Results
The results before and after the fine-tuning on our dataset are shown below:
| Fine-tuning | R-1 | R-2 | R-L | R-S |
|:-----------:|:-----:|:-----:|:------:|:-----:|
| Yes | 23.55 | 6.99 | 18.14 | 21.36 |
| No | 13.8 | 2.4 | 10.63 | 12.03 |
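As a rough illustration of what the R-1 and R-2 columns measure, ROUGE-N scores the n-gram overlap between a generated summary and a reference summary. The sketch below is a simplified version that assumes lowercased whitespace tokenization and reports an F1 score; it is not the scorer used to produce the table, and the names `ngrams` and `rouge_n` as well as the two example sentences are made up for illustration. Established ROUGE packages add details such as stemming and more careful tokenization, so their numbers will differ.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap between two texts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up sentences
reference = "the bank agreed to buy its rival"
candidate = "the bank will buy its rival"
r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
print(f"ROUGE-1 F1: {r1:.3f}, ROUGE-2 F1: {r2:.3f}")
```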
## Citation
You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:
> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas.
> Towards Human-Centered Summarization: A Case Study on Financial News.
> In Proceedings of the Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP) Workshop at EACL (to appear). 2021.
BibTeX entry:
```
@inproceedings{humancentered2021,
title={Towards Human-Centered Summarization: A Case Study on Financial News},
author={Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios},
booktitle={Proceedings of the Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP) Workshop at EACL},
pages={N/A},
year={2021}
}
```