---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
model-index:
- name: results
  results: []
---

# Dataset Used

The model was trained on the IMDB dataset, a standard benchmark for text classification and, in particular, sentiment analysis. The dataset contains 50,000 labeled movie reviews, split evenly between positive and negative reviews, with 25,000 examples for training and 25,000 for testing.

To load the dataset, use the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("imdb")
```
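
As a quick sanity check, you can inspect the splits and a single labeled example; note that `load_dataset("imdb")` also returns an unlabeled `unsupervised` split, which is not used here:

```python
# Show the available splits ('train', 'test', 'unsupervised') and one labeled review.
print(dataset)
print(dataset["train"][0])  # {'text': "...", 'label': 0 or 1}
```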

# How to Train the Model

1. Load the dataset:

   ```python
   from datasets import load_dataset

   dataset = load_dataset("imdb")
   ```

2. Preprocess the data by tokenizing the reviews:

   ```python
   from transformers import AutoTokenizer

   tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

   # Tokenize every review, padding/truncating to the model's maximum length.
   tokenized_datasets = dataset.map(
       lambda x: tokenizer(x["text"], padding="max_length", truncation=True),
       batched=True,
   )
   ```

3. Define the model, training arguments, and evaluation metric:

   ```python
   from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
   import numpy as np

   # DistilBERT with a 2-class classification head (negative / positive).
   model = AutoModelForSequenceClassification.from_pretrained(
       "distilbert-base-uncased", num_labels=2
   )

   training_args = TrainingArguments(
       output_dir="./results",
       learning_rate=2e-5,
       per_device_train_batch_size=32,
       per_device_eval_batch_size=32,
       num_train_epochs=1,
       weight_decay=0.01,
       evaluation_strategy="epoch",
       push_to_hub=True,
   )

   def compute_metrics(eval_pred):
       logits, labels = eval_pred
       predictions = np.argmax(logits, axis=-1)
       return {"accuracy": (predictions == labels).mean()}
   ```

4. Train on a small subset (1,000 training and 100 evaluation examples) to keep the run short:

   ```python
   # Shuffle and subsample the tokenized splits.
   small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
   small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(100))

   trainer = Trainer(
       model=model,
       args=training_args,
       train_dataset=small_train_dataset,
       eval_dataset=small_eval_dataset,
       compute_metrics=compute_metrics,
   )

   trainer.train()
   ```
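
After training, the model can be evaluated on the held-out subset and, since `push_to_hub=True` is set in the training arguments above, the checkpoint can be uploaded to the Hugging Face Hub. This is a minimal sketch and assumes you are already authenticated (for example via `huggingface-cli login`):

```python
# Evaluate on small_eval_dataset and report the metrics from compute_metrics.
metrics = trainer.evaluate()
print(metrics)

# Upload the fine-tuned checkpoint to the Hub (requires prior authentication).
trainer.push_to_hub()
```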

# How to Use the Model

Using a pipeline:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="pedro123483/results")

result = pipe("I loved this movie! It was fantastic and thrilling.")
print(result)
```

Loading the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("pedro123483/results")
model = AutoModelForSequenceClassification.from_pretrained("pedro123483/results")

inputs = tokenizer("I loved this movie! It was fantastic and thrilling.", return_tensors="pt")
outputs = model(**inputs)

# Pick the highest-scoring class index (0 or 1).
predictions = np.argmax(outputs.logits.detach().numpy(), axis=-1)
print(predictions)
```
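
The printed value is the raw class index. Because the training code above does not set `id2label`, mapping it to a readable label is left to the caller; a minimal sketch, assuming the standard IMDB convention of 0 = negative and 1 = positive:

```python
# Assumed mapping: IMDB uses 0 = negative, 1 = positive.
label_map = {0: "negative", 1: "positive"}
print(label_map[int(predictions[0])])
```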


# results

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on a small subset of the IMDB dataset (see the training steps above).

## Model description

A fine-tuned `distilbert-base-uncased` checkpoint for binary sentiment classification (positive vs. negative) of English movie reviews, trained as described in the sections above.

## Intended uses & limitations

The model is intended for sentiment analysis of English movie reviews. It was fine-tuned on only 1,000 examples for a single epoch, so its accuracy is limited (about 0.70 on a 100-example evaluation subset) and it may not generalize well to text from other domains.

## Training and evaluation data

A 1,000-example training subset and a 100-example evaluation subset, drawn from the IMDB dataset with `shuffle(seed=42)` as shown in the training steps above.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 32   | 0.6623          | 0.7      |
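
The 32 steps correspond to one epoch over the 1,000-example training subset with a per-device batch size of 32 (1000 / 32, rounded up, gives 32 optimizer steps), assuming training on a single device.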


### Framework versions

- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1