T5 for Automatic Podcast Summarisation
Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Intended uses & limitations
This model is intended for automatic podcast summarisation. Because creator-provided descriptions were used for training, the model also learned to generate promotional material (links, hashtags, etc.) in its summaries, so some post-processing of the model's outputs may be required.
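One possible form of that post-processing is sketched below. It is not part of the model or its training code: the function name `strip_promotional` and both regular expressions are assumptions chosen for illustration, and they only cover plain URLs and `#hashtags`.

```python
import re

# Hypothetical patterns (not from the model card): plain URLs and #hashtags.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")


def strip_promotional(summary: str) -> str:
    """Remove URLs and hashtags from a generated summary, then tidy whitespace."""
    cleaned = URL_RE.sub("", summary)
    cleaned = HASHTAG_RE.sub("", cleaned)
    # Collapse the gaps left behind by the removed tokens.
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_promotional("A chat about AI. More at https://example.com #podcast #ai"))
```

Other promotional patterns (social media handles, sponsor reads, calls to subscribe) would need their own rules, so treat this as a starting point only.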
On Colab, the instance crashes if the transcript exceeds 7000 tokens. I found that the model still generated reasonable summaries when the podcast transcript was truncated to reduce the number of tokens.
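A minimal sketch of such truncation follows. The helper name `truncate_transcript` is hypothetical, and the default budget of 7000 simply mirrors the limit mentioned above. It assumes a Hugging Face style tokenizer object exposing `encode` and `decode`; decoding may not reproduce the original text character for character.

```python
def truncate_transcript(transcript, tokenizer, max_tokens=7000):
    """Return the transcript cut down to at most max_tokens tokens.

    `tokenizer` is assumed to offer Hugging Face style encode/decode methods.
    """
    token_ids = tokenizer.encode(transcript, add_special_tokens=False)
    if len(token_ids) <= max_tokens:
        # Short enough already: return the text untouched.
        return transcript
    # Keep only the first max_tokens tokens and turn them back into text.
    return tokenizer.decode(token_ids[:max_tokens], skip_special_tokens=True)
```

With the `transformers` library, the tokenizer could be loaded with `AutoTokenizer.from_pretrained("paulowoicho/t5-podcast-summarisation")` and passed in as the second argument. Keeping the start of the transcript is a design choice made here for simplicity; the source does not say which part of the transcript was kept.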
How to use
The model can be used with the summarisation pipeline as follows:
from transformers import pipeline

# podcast_transcript is a string holding the transcript text
summarizer = pipeline("summarization", model="paulowoicho/t5-podcast-summarisation", tokenizer="paulowoicho/t5-podcast-summarisation")
summary = summarizer(podcast_transcript, min_length=5, max_length=20)
# The pipeline returns a list with one dict per input
print(summary[0]['summary_text'])