File size: 1,424 Bytes
05e85ad 4da6182 05e85ad db5798c 66be958 db5798c da963e1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
---
language: "en"
datasets:
- Spotify Podcasts Dataset
tags:
- bert
- classification
- pytorch
pipeline:
- text-classification
---
**General Information**
This is a `bert-base-cased`, binary classification model, fine-tuned to classify a given sentence as containing advertising content or not. It leverages previous-sentence context to make more accurate predictions.
The model is used in the paper 'Leveraging multimodal content for podcast summarization' published at ACM SAC 2022.
**Usage:**
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained('morenolq/spotify-podcast-advertising-classification')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
desc_sentences = ["Sentence 1", "Sentence 2", "Sentence 3"]
for i, s in enumerate(desc_sentences):
if i==0:
context = "__START__"
else:
context = desc_sentences[i-1]
out = tokenizer(context, text, padding = "max_length",
max_length = 256,
truncation=True,
return_attention_mask=True,
return_tensors = 'pt')
outputs = model(**out)
print (f"{s},{outputs}")
```
The manually annotated data, used for model fine-tuning are available [here](https://github.com/MorenoLaQuatra/MATeR/blob/main/description_sentences_classification.tsv) |