Commit af43d88 (1 parent: 0b8284a), committed by w11wo

Updated README according to new model's results

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -12,7 +12,7 @@ widget:
 ## Javanese BERT Small
 Javanese BERT Small is a masked language model based on the [BERT model](https://arxiv.org/abs/1810.04805). It was trained on the latest (late December 2020) Javanese Wikipedia articles.
 
-The model was originally HuggingFace's pretrained [English BERT model](https://huggingface.co/bert-base-uncased) and is later fine-tuned on the Javanese dataset. It achieved a perplexity of 49.43 on the validation dataset (20% of the articles). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).
+The model was originally HuggingFace's pretrained [English BERT model](https://huggingface.co/bert-base-uncased) and is later fine-tuned on the Javanese dataset. It achieved a perplexity of 22.00 on the validation dataset (20% of the articles). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).
 
 Hugging Face's [Transformers]((https://huggingface.co/transformers)) library was used to train the model -- utilizing the base BERT model and their `Trainer` class. PyTorch was used as the backend framework during training, but the model remains compatible with TensorFlow nonetheless.
 
@@ -22,11 +22,11 @@ Hugging Face's [Transformers]((https://huggingface.co/transformers)) library was
 | `javanese-bert-small` | 110M | BERT Small | Javanese Wikipedia (319 MB of text) |
 
 ## Evaluation Results
-The model was trained for 15 epochs and the following is the final result once the training ended.
+The model was trained for 5 epochs and the following is the final result once the training ended.
 
 | train loss | valid loss | perplexity | total time |
 |------------|------------|------------|------------|
-| 3.918 | 3.900 | 49.43 | 5:19:36 |
+| 3.116 | 3.091 | 22.00 | 2:7:42 |
 
 ## How to Use
 ### As Masked Language Model
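
The diff describes the training setup only in prose: the `bert-base-uncased` checkpoint fine-tuned as a masked language model on a Javanese Wikipedia dump, using the Transformers `Trainer` class with a PyTorch backend. Below is a minimal sketch of that recipe, not the author's actual training script; the dataset path, tokenization settings, and batch size are assumptions, while the 5 epochs and the 80/20 split come from the README text.

```python
# Sketch of the MLM fine-tuning recipe described in the README diff.
# Hypothetical data file and hyperparameters except where noted.
from datasets import load_dataset
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Assumed local text dump of Javanese Wikipedia, split 80/20 as in the README.
raw = load_dataset("text", data_files={"train": "jv_wiki.txt"})["train"]
raw = raw.train_test_split(test_size=0.2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking for BERT-style MLM training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="javanese-bert-small",
    num_train_epochs=5,               # per the updated README
    per_device_train_batch_size=16,   # assumed
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
)
trainer.train()
```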
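The README also notes that the PyTorch-trained model "remains compatible with TensorFlow". In Transformers that amounts to loading the PyTorch weights into the corresponding TF model class; a minimal sketch, assuming the checkpoint is published under the `w11wo/javanese-bert-small` Hub id (an assumption, not stated in the diff):

```python
from transformers import TFBertForMaskedLM

# Assumed Hub id; from_pt=True converts the PyTorch weights on load and can
# be dropped if TensorFlow weights were also pushed to the Hub.
tf_model = TFBertForMaskedLM.from_pretrained("w11wo/javanese-bert-small", from_pt=True)
```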
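The updated evaluation table is internally consistent: for `Trainer`-based MLM runs, perplexity is conventionally reported as the exponential of the validation loss, and that relation reproduces both the old and the new figures up to rounding of the reported losses.

```python
import math

# Perplexity as exp(validation loss); matches the table up to rounding.
print(math.exp(3.900))  # ~49.4, the old result (reported as 49.43)
print(math.exp(3.091))  # ~22.0, the new result (reported as 22.00)
```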
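The diff cuts off right after the "As Masked Language Model" heading, so the README's actual usage snippet is not shown here. A typical fill-mask example for a BERT MLM checkpoint would look like the sketch below; the Hub id and the Javanese prompt are illustrative assumptions.

```python
from transformers import pipeline

# Assumed Hub id for the checkpoint described in the diff.
fill_mask = pipeline("fill-mask", model="w11wo/javanese-bert-small")

# Illustrative Javanese prompt; [MASK] is BERT's mask token.
for prediction in fill_mask("Aku lagi mangan sega ing [MASK]."):
    print(prediction["token_str"], prediction["score"])
```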