Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ tags:
 
 # NusaBERT Large
 
-NusaBERT Large is a multilingual encoder-based language model based on the [BERT](https://arxiv.org/abs/1810.04805) architecture. We conducted continued pre-training on open-source corpora of [sabilmakbar/indo_wiki](https://huggingface.co/datasets/sabilmakbar/indo_wiki), [acul3/KoPI-NLLB](https://huggingface.co/datasets/acul3/KoPI-NLLB), and [uonlp/CulturaX](https://huggingface.co/datasets/uonlp/CulturaX). On a held-out subset of the corpus, our model achieved:
+[NusaBERT](https://arxiv.org/abs/2403.01817) Large is a multilingual encoder-based language model based on the [BERT](https://arxiv.org/abs/1810.04805) architecture. We conducted continued pre-training on open-source corpora of [sabilmakbar/indo_wiki](https://huggingface.co/datasets/sabilmakbar/indo_wiki), [acul3/KoPI-NLLB](https://huggingface.co/datasets/acul3/KoPI-NLLB), and [uonlp/CulturaX](https://huggingface.co/datasets/uonlp/CulturaX). On a held-out subset of the corpus, our model achieved:
 
 - `eval_accuracy`: 0.7117
 - `eval_loss`: 1.3268
@@ -101,4 +101,17 @@ NusaBERT Large is developed with love by:
 <a href="https://github.com/w11wo">
     <img src="https://github.com/w11wo.png" alt="GitHub Profile" style="border-radius: 50%;width: 64px;margin:0 4px;">
 </a>
-</div>
+</div>
+
+## Citation
+
+```bib
+@misc{wongso2024nusabert,
+    title={NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural},
+    author={Wilson Wongso and David Samuel Setiawan and Steven Limcorn and Ananto Joyoadikusumo},
+    year={2024},
+    eprint={2403.01817},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
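Since the updated card describes NusaBERT Large as a BERT-style encoder trained with masked-language modelling but does not show a usage snippet, here is a minimal sketch of masked-token prediction with the 🤗 Transformers `fill-mask` pipeline. The checkpoint ID `LazarusNLP/NusaBERT-large` is an assumption (the repository name is not stated in this diff); substitute the actual model ID if it differs.

```python
# Minimal sketch: masked-token prediction with a NusaBERT-style checkpoint.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="LazarusNLP/NusaBERT-large",  # assumed repository name, not given in this diff
)

# BERT-style models use the [MASK] token for the position to be predicted.
predictions = fill_mask("Ibu kota Indonesia adalah [MASK].")
for p in predictions:
    print(f"{p['token_str']}\t{p['score']:.4f}")
```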