Contributed by

paulowoicho Paul Owoicho

How to use this model directly from the 🤗/transformers library:

			
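from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# T5 is an encoder-decoder model, so it is loaded with the seq2seq auto class
# (AutoModelWithLMHead is deprecated in recent transformers releases).
tokenizer = AutoTokenizer.from_pretrained("paulowoicho/t5-podcast-summarisation")
model = AutoModelForSeq2SeqLM.from_pretrained("paulowoicho/t5-podcast-summarisation")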

T5 for Automatic Podcast Summarisation

This model is the result of fine-tuning t5-base on the Spotify Podcast Dataset.

It is based on Google's T5 which was pretrained on the C4 dataset.

Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Intended uses & limitations

This model is intended for automatic podcast summarisation. Because creator-provided descriptions were used for training, the model also learned to generate promotional material (links, hashtags, etc.) in its summaries, so some post-processing of its outputs may be required.
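
One possible form of that post-processing is sketched below. It is an illustration rather than part of the original model card: the function name `strip_promotional` and the two regular expressions are assumptions, and they only catch plain URLs and `#hashtags`, not every kind of promotional text.

```python
import re

# Hypothetical patterns: plain http(s)/www links and #hashtags only.
_URL = re.compile(r"(?:https?://|www\.)\S+")
_HASHTAG = re.compile(r"#\w+")


def strip_promotional(summary: str) -> str:
    """Remove links and hashtags from a generated summary."""
    cleaned = _URL.sub("", summary)
    cleaned = _HASHTAG.sub("", cleaned)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()
```

The function can be applied to the `summary_text` string returned by the summarisation pipeline before the summary is displayed.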

On Colab, the instance crashes if the transcript exceeds 7000 tokens. In my experience, the model still generated reasonable summaries when the podcast transcript was truncated to fewer tokens.
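
A minimal sketch of such truncation follows, assuming a Hugging Face tokenizer is passed in. The helper name `truncate_transcript` is hypothetical, and `max_tokens` is left to the caller: the only figure given above is the 7000-token crash threshold, so any smaller value is the caller's own choice.

```python
def truncate_transcript(transcript, tokenizer, max_tokens):
    """Return the transcript cut down to at most max_tokens tokens.

    `tokenizer` is assumed to expose the Hugging Face tokenizer methods
    encode(text, truncation=..., max_length=...) and
    decode(ids, skip_special_tokens=...).
    """
    # Tokenise and keep only the first max_tokens token ids.
    token_ids = tokenizer.encode(transcript, truncation=True, max_length=max_tokens)
    # Convert the kept ids back into text for the summarisation pipeline.
    return tokenizer.decode(token_ids, skip_special_tokens=True)
```

With the tokenizer loaded via `AutoTokenizer.from_pretrained("paulowoicho/t5-podcast-summarisation")`, the shortened text returned by this helper can be passed to the summariser in place of the full transcript.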

How to use

The model can be used with the summarization pipeline as follows:

from transformers import pipeline

# podcast_transcript is assumed to be a string holding the episode transcript
summarizer = pipeline("summarization", model="paulowoicho/t5-podcast-summarisation", tokenizer="paulowoicho/t5-podcast-summarisation")
summary = summarizer(podcast_transcript, min_length=5, max_length=20)

# The pipeline returns a list with one dict per input
print(summary[0]['summary_text'])

Training data

The model was fine-tuned on the Spotify Podcast Dataset. The original data was pre-processed before fine-tuning.

Training procedure

Training was largely based on "Fine-tune T5 for Summarization" by Abhishek Kumar Mishra.