IndoBERTNusa (IndoBERT Adapted for Balinese, Buginese, and Minangkabau)

This repository contains a language-adapted and fine-tuned version of the Indobenchmark IndoBERT language model for three languages: Balinese, Buginese, and Minangkabau. The adaptation was performed using the nusa-translation dataset.
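A minimal loading sketch follows. It assumes the repository id is `prosa-text/indobert-nusa`, that the `transformers` and `torch` packages are installed, and that you have accepted the access conditions and logged in (for example with `huggingface-cli login`). The helper names `load_indobert_nusa` and `encode_sentence` are hypothetical and are not part of this repository.

```python
MODEL_ID = "prosa-text/indobert-nusa"  # assumed repository id


def load_indobert_nusa(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the adapted encoder.

    The import is done lazily so that this module can be imported even
    when transformers is not installed.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def encode_sentence(text: str, model_id: str = MODEL_ID):
    """Return the last hidden states for one input sentence."""
    tokenizer, model = load_indobert_nusa(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    # Shape: (1, sequence_length, hidden_size)
    return outputs.last_hidden_state
```

Calling `encode_sentence` with a Balinese, Buginese, or Minangkabau sentence downloads the weights on first use, so it needs network access and valid credentials.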

Model Details

Performance Comparison / Benchmark

Topic Classification

We evaluated the model after fine-tuning it for topic classification on the nusa-dialogue dataset.

Language    | indobert-large-p2 (F1) | indobert-nusa (F1)
Balinese    | 82.37                  | 84.23
Buginese    | 80.53                  | 82.03
Minangkabau | 84.49                  | 86.30
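To make the comparison easier to read, the absolute F1 gain per language can be computed directly from the topic classification scores above. This is only arithmetic on the reported numbers; the variable names are illustrative.

```python
# F1 scores copied from the topic classification results above:
# (indobert-large-p2, indobert-nusa)
TOPIC_F1 = {
    "Balinese": (82.37, 84.23),
    "Buginese": (80.53, 82.03),
    "Minangkabau": (84.49, 86.30),
}

# Absolute gain in F1 points, rounded to two decimals to hide float noise.
gains = {
    language: round(adapted - baseline, 2)
    for language, (baseline, adapted) in TOPIC_F1.items()
}

for language, gain in gains.items():
    print(f"{language}: +{gain:.2f} F1")
```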

Language Identification

We also evaluated the model after fine-tuning it for language identification on the nusaX dataset.

Model             | F1-score
indobert-large-p2 | 98.21
indobert-nusa     | 98.45

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
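As a rough illustration of how these settings interact, the sketch below computes the number of optimizer steps and the learning rate under a linear schedule. It assumes a single device, no gradient accumulation, and no warmup steps, none of which are stated above. The dataset size used in the usage note is hypothetical.

```python
import math

# Hyperparameters copied from the list above.
HPARAMS = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,
    "eval_batch_size": 8,
    "seed": 42,
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_epochs": 3.0,
}


def total_training_steps(num_examples: int) -> int:
    """Optimizer steps, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / HPARAMS["train_batch_size"])
    return int(steps_per_epoch * HPARAMS["num_epochs"])


def linear_lr(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    base_lr = HPARAMS["learning_rate"]
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

For a hypothetical training set of 8,000 examples, `total_training_steps(8000)` gives 3,000 steps, and `linear_lr(1500, 3000)` is half of the initial learning rate.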

Framework versions

  • Transformers 4.33.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.13.3

Additional Information

Licensing Information

The dataset is released under the terms of CC-BY-SA 4.0. By using this model, you are also bound by the respective Terms of Use and License of the dataset. For commercial use in small businesses and startups, please contact us (business@prosa.ai) for permission to use the datasets, providing your company profile and purpose of usage.

Acknowledgement

This research work is funded and supported by The Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH and FAIR Forward - Artificial Intelligence for all. We thank Direktorat Jenderal Pendidikan Tinggi, Riset, dan Teknologi Kementerian Pendidikan, Kebudayaan, Riset, dan Teknologi (Ditjen DIKTI) for providing the computing resources for this project.

Contact Us

If you have any questions, please contact our support team at business@prosa.ai.
