---
license: other
datasets:
- raicrits/YouTube_RAI_dataset
language:
- it
pipeline_tag: text-classification
tags:
- LLM
- Italian
- Classification
- BERT
- Topics
library_name: transformers
---

---

# Model Card raicrits/BERT_ChangeOfTopic

[bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) fine-tuned to detect a change of topic in a given text.

### Model Description

The model is fine-tuned for the specific task of detecting a change of topic in a given text. Given a text, the model answers "1" if it detects a change of topic and "0" otherwise. It was trained on the chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/meta-llama/raicrits/YouTube_RAI_dataset).

- **Developed by:** Stefano Scotta (stefano.scotta@rai.it)
- **Model type:** LLM fine-tuned on the specific task of detecting a change of topic in a given text
- **Language(s) (NLP):** Italian
- **License:** unknown
- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)

## Uses

The model can be used to check whether a change of topic occurs in a given text.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import numpy as np
import torch
from transformers import AutoTokenizer

# Run on GPU when available, otherwise on CPU.
device_bert = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# The card loads the model with torch.load, so the path is assumed to point to
# a local copy of the serialized model. Recent PyTorch versions may also
# require weights_only=False to unpickle a full model object.
model_bert = torch.load('raicrits/BERT_ChangeOfTopic', map_location=device_bert)
model_bert = model_bert.to(device_bert)
model_bert.eval()

tokenizer_bert = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')

text = ''  # replace with the text to classify

encoded_dict = tokenizer_bert(
    text,
    add_special_tokens=True,
    max_length=256,  # longer inputs are truncated, shorter ones padded
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)

input_ids = encoded_dict['input_ids'].to(device_bert)
input_mask = encoded_dict['attention_mask'].to(device_bert)

with torch.no_grad():
    output = model_bert(input_ids, token_type_ids=None, attention_mask=input_mask)

# Pick the class with the highest logit: 1 = change of topic, 0 = no change.
logits = output.logits.detach().cpu().numpy()
pred_flat = np.argmax(logits, axis=1).flatten()
print(pred_flat[0])
```

## Training Details

### Training Data

Chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/meta-llama/raicrits/YouTube_RAI_dataset)

### Training Procedure

**Training setting:**

- train epochs=18
- learning_rate=2e-05

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1 NVIDIA A100 (40 GB)
- **Hours used:** 20
- **Cloud Provider:** Private Infrastructure
- **Carbon Emitted:** 2.38 kg CO2 eq.

## Model Card Authors

Stefano Scotta (stefano.scotta@rai.it)

## Model Card Contact

stefano.scotta@rai.it
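
The usage snippet in "How to Get Started with the Model" prints a bare class index. As an illustration of the label convention from the Model Description (1 = change of topic, 0 = no change), the sketch below turns one pair of logits into a readable label and a softmax score. The helper `decode_logits`, the `LABELS` mapping, and the example logit values are hypothetical and not part of the released code; the softmax score should not be read as a calibrated probability.

```python
import math

# Label convention taken from the Model Description; the wording is illustrative.
LABELS = {0: "no change of topic", 1: "change of topic"}


def decode_logits(logits):
    """Map one example's two logits to (class_index, label, softmax_score)."""
    # Numerically stable softmax over the class logits.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Same decision rule as the argmax in the usage snippet
    # (ties resolve to the lower class index).
    class_index = max(range(len(probs)), key=probs.__getitem__)
    return class_index, LABELS[class_index], probs[class_index]


# Made-up logits for a single example, e.g. what logits[0].tolist()
# from the usage snippet might look like.
class_index, label, score = decode_logits([-1.0, 2.0])
print(class_index, label, round(score, 3))
```

With real model output, the list passed to `decode_logits` would be one row of the `logits` array from the usage snippet.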