
Summarization

Summarization is the task of producing a shorter version of a document while preserving its important information. Some models extract text directly from the original input (extractive summarization), while others generate entirely new text (abstractive summarization).

Input

The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. It was the first structure to reach a height of 300 metres. Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.

Output

The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. It was the first structure to reach a height of 300 metres.

About Summarization

Use Cases

Research Paper Summarization 🧐

Research papers can be summarized to allow researchers to spend less time selecting which articles to read. There are several approaches you can take for a task like this:

  1. Use an existing extractive summarization model on the Hub to do inference.
  2. Pick an existing language model pre-trained on academic papers, then adapt it to the summarization task through a process called fine-tuning.
  3. Use a sequence-to-sequence model like T5 for abstractive text summarization (see the sketch after this list).
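
For the third approach, a minimal sketch with the 🤗 Transformers library might look like the following; the t5-small checkpoint, the input text, and the generation settings are illustrative choices rather than recommendations.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# t5-small is an illustrative checkpoint; any T5-style model works the same way
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# T5 was trained with task prefixes, so "summarize: " selects the summarization task
text = "summarize: " + "Research papers can be long and dense, so automatic summaries help readers decide what to read in full."
inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)

summary_ids = model.generate(
    inputs["input_ids"],
    max_length=60,   # illustrative bounds on the generated summary length
    min_length=10,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))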

Inference

You can use the 🤗 Transformers library summarization pipeline to infer with existing summarization models. If no model name is provided, the pipeline will be initialized with sshleifer/distilbart-cnn-12-6.

from transformers import pipeline

# Without an explicit model name, this loads sshleifer/distilbart-cnn-12-6 by default
summarizer = pipeline("summarization")
summarizer("Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.")
## [{ "summary_text": " Paris is the capital and most populous city of France..." }]
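
The pipeline also accepts generation parameters such as min_length and max_length to bound the size of the summary; the values below are purely illustrative.

# Hypothetical bounds; tune them for your own documents
summarizer(
    "Paris is the capital and most populous city of France...",
    min_length=20,
    max_length=60,
)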

You can also use huggingface.js to run inference with summarization models on the Hugging Face Hub.

import { HfInference } from "@huggingface/inference";

const inference = new HfInference(HF_TOKEN);
const inputs =
    "Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles). The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.";

await inference.summarization({
    model: "sshleifer/distilbart-cnn-12-6",
    inputs,
});

Useful Resources

Would you like to learn more about the topic? Awesome! Here you can find some curated resources that you may find helpful!

Notebooks

Scripts for training

Documentation

Compatible libraries

Summarization demo
Models for Summarization

A strong summarization model trained on English news articles. Excels at generating factual summaries.

Datasets for Summarization

News articles in five different languages along with their summaries. Widely used for benchmarking multilingual summarization models.

English conversations and their summaries. Useful for benchmarking conversational agents.
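
To experiment with a summarization dataset from the Hub, you can load one with the 🤗 Datasets library. The cnn_dailymail corpus below is only an illustrative choice; any dataset with document/summary pairs works the same way.

from datasets import load_dataset

# cnn_dailymail is an illustrative English news summarization corpus; "3.0.0" selects its config
dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

# Each example pairs a full article with its reference highlights (the summary)
print(dataset[0]["article"][:200])
print(dataset[0]["highlights"])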

Spaces using Summarization

An application that can summarize long paragraphs.

A much-needed summarization application for terms and conditions.

An application that summarizes long documents.

An application that can detect errors in abstractive summarization.

Metrics for Summarization
rouge
The generated summary is compared against a reference summary, and the overlapping tokens are counted. ROUGE-N refers to the overlap of N consecutive tokens: ROUGE-1 measures the overlap of single tokens (unigrams) and ROUGE-2 the overlap of pairs of consecutive tokens (bigrams).
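
As a minimal sketch, ROUGE can be computed with the 🤗 Evaluate library; the prediction and reference strings below are illustrative.

import evaluate

# Load the ROUGE metric
rouge = evaluate.load("rouge")

predictions = ["The tower is 324 metres tall and was the first structure to reach 300 metres."]
references = ["The tower is 324 metres (1,063 ft) tall. It was the first structure to reach a height of 300 metres."]

# Returns a dict with rouge1, rouge2, rougeL and rougeLsum scores
results = rouge.compute(predictions=predictions, references=references)
print(results)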