---
license: other
datasets:
- raicrits/YouTube_RAI_dataset
language:
- it
pipeline_tag: text-classification
tags:
- LLM
- Italian
- Classification
- BERT
- Topics
library_name: transformers
---


# Model Card raicrits/BERT_ChangeOfTopic


[bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) fine-tuned to detect a change of topic in a given text.


### Model Description


The model is fine-tuned for the specific task of detecting a change of topic in a given text: it answers "1" if it detects a change of topic and "0" otherwise.
The training was done using the chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).


- **Developed by:** Stefano Scotta (stefano.scotta@rai.it)
- **Model type:** LLM fine-tuned on the specific task of detecting a change of topic in a given text
- **Language(s) (NLP):** Italian
- **License:** unknown
- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)


## Uses

The model can be used to check whether a change of topic occurs in a given text.



## How to Get Started with the Model

Use the code below to get started with the model.

**Usage:**
```python
import numpy as np
import torch
from transformers import AutoTokenizer

# Use a GPU if one is available
device_bert = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the fine-tuned model (assumes the raicrits/BERT_ChangeOfTopic checkpoint
# is available at this local path, e.g. after cloning the model repository)
model_bert = torch.load('raicrits/BERT_ChangeOfTopic')
model_bert = model_bert.to(device_bert)
model_bert.eval()

tokenizer_bert = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')

# Tokenize the text to classify (replace '<text>' with the actual text)
encoded_dict = tokenizer_bert.encode_plus(
    '<text>',
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)
input_ids = encoded_dict['input_ids'].to(device_bert)
input_mask = encoded_dict['attention_mask'].to(device_bert)

with torch.no_grad():
    output = model_bert(input_ids,
                        token_type_ids=None,
                        attention_mask=input_mask)
    logits = output.logits.detach().cpu().numpy()
    pred_flat = np.argmax(logits, axis=1).flatten()

# 1 -> change of topic detected, 0 -> no change of topic
print(pred_flat[0])
```

## Training Details

### Training Data

Chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).

### Training Procedure


**Training setting** (see the illustrative fine-tuning sketch below):
- train epochs: 18
- learning rate: 2e-05
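
As a rough illustration of these settings, the sketch below fine-tunes [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) for binary change-of-topic classification with the `transformers` `Trainer` API. The dataset column names (`text`, `label`), the batch size, and the way chapter boundaries are turned into training examples are assumptions made for illustration, not the released training code.

```python
# Hedged sketch: epochs and learning rate are taken from this card;
# dataset columns and batch size are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-multilingual-cased', num_labels=2)  # 0 = no change, 1 = change of topic

# Assumed to expose a 'text' column and a binary 'label' column
dataset = load_dataset('raicrits/YouTube_RAI_dataset')

def tokenize(batch):
    return tokenizer(batch['text'], truncation=True,
                     padding='max_length', max_length=256)

train_set = dataset['train'].map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir='BERT_ChangeOfTopic',
    num_train_epochs=18,             # train epochs reported above
    learning_rate=2e-5,              # learning rate reported above
    per_device_train_batch_size=16,  # assumed, not reported
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_set)
trainer.train()
```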


## Environmental Impact


Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1 NVIDIA A100 (40 GB)
- **Hours used:** 20
- **Cloud Provider:** Private Infrastructure
- **Carbon Emitted:** 2.38 kg CO2 eq.

## Model Card Authors

Stefano Scotta (stefano.scotta@rai.it)

## Model Card Contact

stefano.scotta@rai.it