pedro123483 committed
Commit 30edfc6
Parent(s): 6a85c42
Update README.md
Updating the README with information about the dataset used to train the model, how to train it, and how to use it.
README.md CHANGED
@@ -11,6 +11,89 @@ model-index:
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Dataset Used

The model was trained on the IMDB dataset, which is widely used for text-classification tasks, especially sentiment analysis. The dataset contains 50,000 labeled movie reviews, split evenly between positive and negative, with 25,000 examples for training and 25,000 for testing.

To load the dataset, use the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("imdb")
```
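
As a quick sanity check (an addition, not part of the original card), you can inspect the splits and a sample record; the `text` and `label` field names are what the preprocessing step below relies on:

```python
# The IMDB splits expose "text" (the review) and "label" (0 = negative, 1 = positive).
print(dataset)                       # DatasetDict with train/test/unsupervised splits
print(dataset["train"][0]["label"])  # 0 or 1
print(dataset["train"][0]["text"][:200])
```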

# How to Train the Model

1. Load the dataset:

```python
from datasets import load_dataset

dataset = load_dataset("imdb")
```

2. Preprocess the text:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize every review, padding/truncating to the model's maximum length.
tokenized_datasets = dataset.map(
    lambda x: tokenizer(x["text"], padding="max_length", truncation=True),
    batched=True,
)
```
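
For reference (also an addition): the `map` call appends the tokenizer outputs as new columns alongside the existing fields, which you can confirm with:

```python
# DistilBERT's tokenizer adds input_ids and attention_mask
# (no token_type_ids, unlike BERT).
print(tokenized_datasets["train"].column_names)
# ['text', 'label', 'input_ids', 'attention_mask']
```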

3. Define the model and training arguments:

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    push_to_hub=True,
)

# Accuracy over the eval set: argmax of the logits vs. the gold labels.
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}
```

4. Training:

```python
# Fine-tune on a small subsample to keep training fast.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(100))

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()
```
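
After training, a natural follow-up (a sketch using standard `Trainer` methods, not shown in the original card) is to evaluate and, since `push_to_hub=True` was set, upload the checkpoint:

```python
# Evaluate on the eval subsample; returns eval_loss, eval_accuracy, etc.
metrics = trainer.evaluate()
print(metrics)

# Upload the fine-tuned model to the Hub
# (requires prior authentication, e.g. `huggingface-cli login`).
trainer.push_to_hub()
```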

# How to Use the Model

Using a pipeline:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="pedro123483/results")

result = pipe("I loved this movie! It was fantastic and thrilling.")
print(result)
```
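
A note on the output: since the training script above never sets `id2label`, the pipeline reports the generic class names `LABEL_0` and `LABEL_1`. Assuming the IMDB label order (0 = negative, 1 = positive), a minimal sketch for mapping them back:

```python
# Hypothetical mapping: assumes LABEL_0 = negative, LABEL_1 = positive.
label_map = {"LABEL_0": "negative", "LABEL_1": "positive"}
print(label_map[result[0]["label"]], result[0]["score"])
```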

Loading the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("pedro123483/results")
model = AutoModelForSequenceClassification.from_pretrained("pedro123483/results")

inputs = tokenizer("I loved this movie! It was fantastic and thrilling.", return_tensors="pt")
outputs = model(**inputs)

# Argmax over the two class logits (0 = negative, 1 = positive).
predictions = np.argmax(outputs.logits.detach().numpy(), axis=-1)
print(predictions)
```
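
If you want class probabilities rather than just the predicted class, a small sketch using PyTorch's softmax (same 0 = negative, 1 = positive assumption):

```python
import torch

# Convert the raw logits into probabilities over the two classes.
probs = torch.softmax(outputs.logits, dim=-1)
print(probs)  # shape (1, 2): [P(negative), P(positive)]
```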

# results

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the IMDB dataset described above.