
bigbird pegasus on the booksum dataset

This is the "latest" version of the model, i.e. the one that has been trained the longest: currently 70k steps.

  • GOAL: a summarization model that 1) summarizes the source content accurately and 2) more importantly (IMO), produces summaries that are easy to read and understand (*cough* unlike arXiv *cough*)
    • This model attempts to help with that by using the booksum dataset to provide explanatory summarization
    • Explanatory Summary - a summary that both consolidates information and explains why that consolidated information is important.
  • This model was trained for seven epochs in total (approx. 70,000 steps) and is closer to finished than earlier checkpoints.
    • It will continue to improve (slowly, now that it has been trained for a long time) based on results and feedback.
  • The starting checkpoint was google/bigbird-pegasus-large-bigpatent

example usage

An extended example, including a demo of batch summarization, is here.

  • create the summarizer object:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
    low_cpu_mem_usage=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
)


summarizer = pipeline(
    "summarization",
    model=model,
    tokenizer=tokenizer,
)
  • define the text to be summarized, and pass it through the pipeline. Boom, done.
wall_of_text = "your text to be summarized goes here."

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    clean_up_tokenization_spaces=True,
)

print(result[0]["summary_text"])
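
The pipeline can also take a list of texts, which is the basic idea behind batch summarization. Below is a minimal sketch, assuming the summarizer object created above. summarize_batch is a hypothetical helper written for this card, not part of transformers, and it is not necessarily how the extended example does it.

```python
from typing import Callable, Dict, List


def summarize_batch(
    summarizer: Callable[..., List[Dict[str, str]]],
    texts: List[str],
    **generate_kwargs,
) -> List[str]:
    """Summarize several documents in one call and return plain strings.

    `summarizer` is assumed to behave like a transformers summarization
    pipeline: given a list of strings, it returns one
    {"summary_text": ...} dict per input, in the same order.
    """
    # Drop empty or whitespace-only inputs so the model never sees them.
    cleaned = [t.strip() for t in texts if t and t.strip()]
    if not cleaned:
        return []
    # Generation settings (min_length, max_length, ...) are passed through.
    results = summarizer(cleaned, **generate_kwargs)
    return [r["summary_text"] for r in results]
```

With the summarizer from above, a call could look like summarize_batch(summarizer, [text_a, text_b], min_length=16, max_length=256, no_repeat_ngram_size=3), where text_a and text_b are placeholder strings. Note that the helper silently skips blank inputs, so the output list can be shorter than the input list.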

Alternate Checkpoint

  • if you run into runtime/memory issues, try this earlier checkpoint at 40,000 steps, which is almost as good at the explanatory summarization task but runs faster.
  • see similar summarization models fine-tuned on booksum but using different architectures: long-t5 base and LED-Large
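
For a rough sense of the memory involved: this checkpoint has about 577M parameters stored as F32. The sketch below estimates the size of the weights alone, which comes to a little over 2 GiB at F32. It is back-of-the-envelope arithmetic only: activations, attention buffers, and generation state add more on top. weight_memory_gib and BYTES_PER_PARAM are illustrative names, not part of any library, and the half-precision rows are shown for comparison rather than as a tested recommendation for this model.

```python
# Bytes used to store one parameter for some common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_memory_gib(num_params: int, tensor_type: str = "F32") -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 1024**3


if __name__ == "__main__":
    num_params = 577_000_000  # approximate parameter count of this checkpoint
    for tensor_type in ("F32", "F16"):
        gib = weight_memory_gib(num_params, tensor_type)
        print(f"{tensor_type}: ~{gib:.2f} GiB for weights only")
```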

Safetensors · Model size: 577M params · Tensor type: F32
