PeroVazPT-BR Classifier

Model Description

The PeroVazPT-BR Classifier is designed to classify a text as either European Portuguese (PT) or Brazilian Portuguese (BR).

This model is a fine-tuned version of prajjwal1/bert-tiny on the VeraCruz Dataset, a collection of text samples in both variants. It was trained on a total of 500,000 examples, evenly split between European Portuguese and Brazilian Portuguese, so that both variants are equally represented.
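
A minimal usage sketch, assuming the model is published on the Hugging Face Hub under the repo id bastao/PeroVaz_PT-BR_Classifier and that it loads with the standard Transformers text-classification pipeline. The helper names (load_classifier, classify, demo) are illustrative, and the label strings the model returns are not stated in this card, so the code passes them through unchanged.

```python
from typing import Callable, Dict, List

# Repo id as it appears on the model page; treat it as an assumption.
MODEL_ID = "bastao/PeroVaz_PT-BR_Classifier"


def load_classifier():
    """Build a text-classification pipeline (downloads the weights on first use)."""
    from transformers import pipeline  # imported lazily so the helpers below stay lightweight

    return pipeline("text-classification", model=MODEL_ID)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on a batch and pair each text with its top label and score."""
    results = classifier(texts)
    return [
        {"text": text, "label": res["label"], "score": float(res["score"])}
        for text, res in zip(texts, results)
    ]


def demo() -> None:
    """Classify one sentence in each variant and print the raw predictions."""
    samples = [
        "Estou a ler um livro no autocarro.",  # typical European Portuguese phrasing
        "Estou lendo um livro no ônibus.",     # typical Brazilian Portuguese phrasing
    ]
    for row in classify(samples, load_classifier()):
        print(row)
```

Calling demo() requires the transformers package and network access. The mapping from label strings to PT or BR should be checked against the model's config before relying on it.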

It achieves the following results on an evaluation set of 50,000 examples:

  • Loss: 0.1791
  • Accuracy: 0.9461
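
For reference, accuracy here is the fraction of evaluation examples whose highest-scoring class matches the gold label, so 0.9461 on 50,000 examples corresponds to roughly 47,300 correct predictions. The sketch below shows that computation in plain Python; it is an illustration of the metric, not the evaluation code used for this model, and the function names are hypothetical.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row of class scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits: List[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the gold label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(argmax(row) == gold for row, gold in zip(logits, labels))
    return correct / len(labels)


# Toy example with two classes (made-up scores, not model output).
toy_logits = [[2.0, -1.0], [0.1, 0.3], [1.5, 1.0], [-0.2, 0.4]]
toy_labels = [0, 1, 1, 1]
toy_accuracy = accuracy(toy_logits, toy_labels)  # 3 of 4 predictions match
```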

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 256
  • eval_batch_size: 256
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • steps: 2500
  • mixed_precision_training: Native AMP
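
The settings above can be written as keyword arguments for the Transformers TrainingArguments class, roughly as sketched below. This is a reconstruction under assumptions, not the original training script: output_dir is a hypothetical path, the batch size is assumed to be per device on a single device, and no warmup is assumed because none is listed. The linear_lr helper only illustrates what a linear schedule without warmup does to the learning rate.

```python
# Keyword arguments mirroring the listed hyperparameters.
TRAINING_KWARGS = {
    "output_dir": "perovaz-bert-tiny",   # hypothetical output path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 256,  # assumes a single device
    "per_device_eval_batch_size": 256,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "max_steps": 2500,
    "fp16": True,                        # Native AMP; needs a CUDA device
}


def build_training_arguments():
    """Create a TrainingArguments object from the settings above."""
    from transformers import TrainingArguments  # lazy import; requires transformers

    return TrainingArguments(**TRAINING_KWARGS)


def linear_lr(step: int, base_lr: float = 5e-5, total_steps: int = 2500) -> float:
    """Learning rate under linear decay to zero, assuming zero warmup steps."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps
```

Under these assumptions the learning rate starts at 5e-05, is halved by step 1250, and reaches zero at step 2500.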

Training results

Training Loss   Epoch   Step   Validation Loss   Accuracy
0.4772          0.06    500    0.2501            0.9080
0.3412          0.13    1000   0.2275            0.9135
0.3122          0.19    1500   0.2578            0.9014
0.2975          0.25    2000   0.1992            0.9396
0.2877          0.31    2500   0.1791            0.9461

Framework versions

  • Transformers 4.40.0.dev0
  • PyTorch 2.2.1
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size: 4.39M params (Safetensors, tensor type F32)

Dataset used to train bastao/PeroVaz_PT-BR_Classifier