human-centered-summarization's picture
Update README.md
e3b7f9d
|
raw
history blame
4.87 kB
metadata
language:
  - en
tags: summarization
license: mit
datasets:
  - xsum
metrics:
  - rouge
widget:
  - text: Which name is also used to describe the Amazon rainforest in English?
    context: >-
      National Commercial BankNational Commercial Bank (NCB), Saudi Arabia’s
      largest lender by assets, agreed to buy rival Samba Financial Group for
      $15 billion in the biggest banking takeover this year.NCB will pay 28.45
      riyals ($7.58) for each Samba share, according to a statement on Sunday,
      valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares
      for each Samba share, at the lower end of the 0.736-0.787 ratio the banks
      set when they signed an initial framework agreement in June.The offer is a
      3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24%
      higher than the level the shares traded at before the talks were made
      public. Bloomberg News first reported the merger discussions.The new bank
      will have total assets of more than $220 billion, creating the Gulf
      region’s third-largest lender. The entity’s $46 billion market
      capitalization nearly matches that of Qatar National Bank QPSC, which is
      still the Middle East’s biggest lender with about $268 billion of assets.

Pegasus for Financial Summarization

This model was trained on a novel financial dataset which consists of 2K financial and economic articles from the Bloomberg website of different categories such as stock, markets, currencies, rate and cryptocurrences, using PEGASUS. This model is fine-tuned on the google/pegasus-xsum model.

PEGASUS model was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.

Installing

In order to use this model, you have to install Transformers as follows:

pip install transformers

How to use

We provide a simple snippet of how to use this model for the task of financial summarization in Pytorch. Don't worry if you prefer Tensorflow, you can also run the following code only with minor changes.

from transformers import PegasusTokenizer, PegasusForConditionalGeneration, TFPegasusForConditionalGeneration

# Let's load the model and the tokenizer 
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name) # If you want to use the Tensorflow model 
                                                                    # just replace with TFPegasusForConditionalGeneration


# Some text to summarize here
text_to_summarize = "National Commercial BankNational Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize our text
# If you want to run the code in Tensorflow, please remember to return the particular tensors as simply as using return_tensors = 'tf'
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the output (Here, we use beam search but you can also use any other strategy you like)
output = model.generate(
    input_ids, 
    max_length=32, 
    num_beams=5, 
    early_stopping=True
)

# Finally, we can print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion

Evaluation Results

The results before and after the fine-tuning on our dataset are shown below:

Fine-tuning R-1 R-2 R-L R-S
Yes 23.55 6.99 18.14 21.36
No 13.8 2.4 10.63 12.03