---
license: mit
base_model: indobenchmark/indobert-lite-base-p1
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
language:
- ind
datasets:
- indonli
widget:
- text: Andi tersenyum karena mendapat hasil baik. </s></s> Andi sedih.
model-index:
- name: indobert-lite-base-p1-indonli-distil-mdeberta
  results: []
---

# IndoBERT Lite Base IndoNLI Distil mDeBERTa

IndoBERT Lite Base IndoNLI Distil mDeBERTa is a natural language inference (NLI) model based on the [ALBERT](https://arxiv.org/abs/1909.11942) architecture. It starts from the pre-trained [indobenchmark/indobert-lite-base-p1](https://huggingface.co/indobenchmark/indobert-lite-base-p1) model and is fine-tuned on the [`IndoNLI`](https://github.com/ir-nlp-csui/indonli) dataset, which consists of Indonesian Wikipedia, news, and web articles [1], while being distilled from [MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7).
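
## How to Use

A minimal inference sketch with 🤗 Transformers. The repo id below is assumed from this card's model name and may need the owning organization prepended; the label mapping is read from the model config rather than hard-coded.

```python
# Minimal NLI inference sketch; the repo id is assumed from the model name
# in this card and may need an org prefix (e.g. "<org>/indobert-lite-...").
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "indobert-lite-base-p1-indonli-distil-mdeberta"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Andi tersenyum karena mendapat hasil baik."
hypothesis = "Andi sedih."

# Encode the premise/hypothesis pair; the tokenizer inserts the separator
# tokens that the widget example above spells out literally.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])  # e.g. "contradiction"
```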

## Evaluation Results

|           | `dev` Acc. (%) | `test_lay` Acc. (%) | `test_expert` Acc. (%) |
| --------- | :------------: | :-----------------: | :---------------------: |
| `IndoNLI` |     77.19      |        74.42        |          61.22          |
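
A hedged sketch for reproducing these numbers with the `indonli` dataset on the Hub. It assumes the dataset's `premise`/`hypothesis`/`label` columns and that the model's label ids match the dataset's (entailment/neutral/contradiction); verify via `model.config.id2label` before trusting the score.

```python
# Sketch: recompute accuracy on one IndoNLI split. Assumes the model's
# label ids align with the dataset's ClassLabel ids; check id2label first.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "indobert-lite-base-p1-indonli-distil-mdeberta"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

ds = load_dataset("indonli", split="test_lay")

correct = 0
for ex in ds:
    inputs = tokenizer(ex["premise"], ex["hypothesis"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == ex["label"])

print(f"test_lay accuracy: {correct / len(ds):.4f}")
```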

## Model

| Model                                           | #params | Arch.       | Training/Validation data (text) |
| ----------------------------------------------- | ------- | ----------- | ------------------------------- |
| `indobert-lite-base-p1-indonli-distil-mdeberta` | 11.7M   | ALBERT Base | `IndoNLI`                       |
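
The parameter count in the table can be checked directly (same assumed repo id as above):

```python
# Quick parameter-count check for the table above; repo id assumed as before.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "indobert-lite-base-p1-indonli-distil-mdeberta"  # assumed Hub repo id
)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```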

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- `learning_rate`: `2e-05`
- `train_batch_size`: `16`
- `eval_batch_size`: `16`
- `seed`: `42`
- `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
- `lr_scheduler_type`: linear
- `num_epochs`: `5`
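
These settings map onto 🤗 Transformers `TrainingArguments` roughly as follows. This is a sketch, not the actual training script; in particular, the distillation loss against the mDeBERTa teacher requires a custom `Trainer` that is not reproduced here.

```python
# Sketch of the hyperparameters above as TrainingArguments. Note that the
# Trainer's default optimizer is AdamW, configured via the adam_* fields.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="indobert-lite-base-p1-indonli-distil-mdeberta",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=5,
)
```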

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |   F1   | Precision | Recall |
| :-----------: | :---: | :---: | :-------------: | :------: | :----: | :-------: | :----: |
|    0.5053     |  1.0  |  646  |     0.4511      |  0.7506  | 0.7462 |  0.7530   | 0.7445 |
|    0.4516     |  2.0  | 1292  |     0.4458      |  0.7692  | 0.7683 |  0.7684   | 0.7697 |
|    0.4192     |  3.0  | 1938  |     0.4433      |  0.7701  | 0.7677 |  0.7685   | 0.7673 |
|    0.3647     |  4.0  | 2584  |     0.4497      |  0.7720  | 0.7699 |  0.7697   | 0.7701 |
|    0.3502     |  5.0  | 3230  |     0.4530      |  0.7679  | 0.7661 |  0.7658   | 0.7668 |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0

## References

[1] Mahendra, R., Aji, A. F., Louvan, S., Rahman, F., & Vania, C. (2021, November). [IndoNLI: A Natural Language Inference Dataset for Indonesian](https://arxiv.org/abs/2110.14566). _Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing_. Association for Computational Linguistics.