
Dataset Used

The model was trained on the IMDB dataset, which is widely used for text classification tasks, especially sentiment analysis. The dataset contains 50,000 labeled movie reviews, split evenly between positive and negative reviews, with 25,000 examples for training and 25,000 for testing.

To load the dataset, use the Hugging Face datasets library:

from datasets import load_dataset

dataset = load_dataset("imdb")
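As a quick sanity check, the splits and a sample record can be inspected right after loading. This is a minimal sketch; text and label are the standard IMDB column names, with 0 = negative and 1 = positive:

# Show the available splits and their sizes (train and test each hold 25,000
# labeled examples, plus an additional unlabeled "unsupervised" split).
print(dataset)

# Look at one training example: the review text and its sentiment label.
print(dataset["train"][0]["text"][:200])
print(dataset["train"][0]["label"])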

How to Train the Model

  1. Load the dataset:

    from datasets import load_dataset

    dataset = load_dataset("imdb")

  2. Preprocessing:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    tokenized_datasets = dataset.map(
        lambda x: tokenizer(x['text'], padding='max_length', truncation=True),
        batched=True
    )

  3. Define the Model and Training Arguments:

    from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
    import numpy as np

    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

    training_args = TrainingArguments(
        output_dir="./results",
        learning_rate=2e-5,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=32,
        num_train_epochs=1,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        push_to_hub=True
    )

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return {"accuracy": (predictions == labels).mean()}

  4. Training (a sketch for evaluating and pushing the model to the Hub follows these steps):

    small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
    small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(100))

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics
    )

    trainer.train()
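Because push_to_hub=True was set in the TrainingArguments, the trained model can be evaluated and then uploaded to the Hugging Face Hub once training finishes. The sketch below is a hedged follow-up example; it assumes you are authenticated with a write token, and the repository name is derived from output_dir, which is how a repo such as pedro123483/results ends up on the Hub:

from huggingface_hub import login

login()  # prompts for a Hugging Face access token with write permission

metrics = trainer.evaluate()  # evaluates on small_eval_dataset and returns a dict of metrics
print(metrics)

trainer.push_to_hub()  # uploads the fine-tuned model and an auto-generated model card to the Hub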

How to Use the Model

Using a pipeline:

from transformers import pipeline

pipe = pipeline("text-classification", model="pedro123483/results")

result = pipe("I loved this movie! It was fantastic and thrilling.")
print(result)

Loading the model directly:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("pedro123483/results")
model = AutoModelForSequenceClassification.from_pretrained("pedro123483/results")

inputs = tokenizer("I loved this movie! It was fantastic and thrilling.", return_tensors="pt")
outputs = model(**inputs)
predictions = np.argmax(outputs.logits.detach().numpy(), axis=-1)
print(predictions)
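The snippet above prints only the raw class index. The following addition maps that index to a readable label, assuming the usual IMDB convention of 0 = negative and 1 = positive; it also shows the model's own id2label mapping, which defaults to the generic LABEL_0/LABEL_1 names unless it was customized during fine-tuning:

predicted_index = int(predictions[0])

# Label names stored in the model config (defaults to LABEL_0 / LABEL_1 if not customized).
print(model.config.id2label[predicted_index])

# Explicit mapping under the IMDB convention (assumption: 0 = negative, 1 = positive).
label_names = {0: "negative", 1: "positive"}
print(label_names[predicted_index])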

results

This model is a fine-tuned version of distilbert-base-uncased on the IMDB dataset.

Model description

DistilBERT (distilbert-base-uncased) fine-tuned for binary sentiment classification (positive/negative) of English movie reviews.

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned on a 1,000-example subset of the IMDB training split and evaluated on a 100-example subset of the IMDB test split, as described in the training steps above.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1

Training results

Training Loss   Epoch   Step   Validation Loss   Accuracy
No log          1.0     32     0.6623            0.7

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1