General Information
This is a bert-base-cased binary classification model, fine-tuned to classify a given sentence as containing advertising content or not. It leverages previous-sentence context to make more accurate predictions.
The model is used in the paper 'Leveraging multimodal content for podcast summarization' published at ACM SAC 2022.
Usage:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('morenolq/spotify-podcast-advertising-classification')
tokenizer = AutoTokenizer.from_pretrained('morenolq/spotify-podcast-advertising-classification')

desc_sentences = ["Sentence 1", "Sentence 2", "Sentence 3"]
for i, s in enumerate(desc_sentences):
    # The first sentence has no predecessor, so "__START__" is used as its context
    if i == 0:
        context = "__START__"
    else:
        context = desc_sentences[i - 1]
    # Encode (previous sentence, current sentence) as a sentence pair
    out = tokenizer(context, s,
                    padding="max_length",
                    max_length=256,
                    truncation=True,
                    return_attention_mask=True,
                    return_tensors='pt')
    outputs = model(**out)
    print(f"{s},{outputs}")
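The usage snippet prints the raw model output, which contains one logit per class. As a minimal sketch, the helper below turns a pair of logits into a label and a probability with a softmax. The function name `logits_to_prediction` and the label mapping are hypothetical: the sketch assumes that class 1 means "advertising" and class 0 means "not advertising", which should be checked against `model.config.id2label` before relying on it.

```python
import math

# Assumption: index 1 is the "advertising" class, index 0 is "not advertising".
# Verify this against model.config.id2label before relying on it.
ASSUMED_LABELS = {0: "not advertising", 1: "advertising"}


def logits_to_prediction(logits):
    """Turn a list of raw logits into (label, probability) using a softmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return ASSUMED_LABELS[best], probs[best]


# Made-up logits for illustration only, not real model output
label, prob = logits_to_prediction([-1.2, 2.3])
print(label, round(prob, 3))
```

With the model loaded as in the usage snippet, the input would be `outputs.logits[0].tolist()` for each sentence.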
The manually annotated data used for model fine-tuning are available here.
The classification report for the model evaluation on the test split is shown below:
              precision    recall  f1-score   support
           0       0.95      0.93      0.94       256
           1       0.88      0.91      0.89       140
    accuracy                           0.92       396
   macro avg       0.91      0.92      0.92       396
weighted avg       0.92      0.92      0.92       396
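As a quick consistency check on the report, overall accuracy equals the support-weighted average of the per-class recall values. The sketch below recomputes it from the rounded numbers in the report, so the result is approximate; the variable names are illustrative.

```python
# Per-class recall and support, copied from the classification report.
# The recall values are rounded to two decimals, so the result is approximate.
recall = {0: 0.93, 1: 0.91}
support = {0: 256, 1: 140}

total = sum(support.values())
# Correctly classified examples per class = recall * support
accuracy = sum(recall[c] * support[c] for c in recall) / total

print(total, round(accuracy, 2))
```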
If you find this model useful, please cite the following paper:
@inproceedings{10.1145/3477314.3507106,
author = {Vaiani, Lorenzo and La Quatra, Moreno and Cagliero, Luca and Garza, Paolo},
title = {Leveraging Multimodal Content for Podcast Summarization},
year = {2022},
isbn = {9781450387132},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3477314.3507106},
doi = {10.1145/3477314.3507106},
booktitle = {Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing},
pages = {863--870},
numpages = {8},
keywords = {multimodal learning, multimodal features fusion, extractive summarization, deep learning, podcast summarization},
location = {Virtual Event},
series = {SAC '22}
}