indiejoseph committed on
Commit
a473699
1 Parent(s): 998e451

Update README.md

  1. README.md +10 -4
README.md CHANGED
@@ -2,9 +2,15 @@
 base_model: /notebooks/cantonese/bert-base-cantonese
 tags:
 - generated_from_trainer
+- Cantonese
+- bert
 model-index:
 - name: bert-base-cantonese
 results: []
+license: cc-by-4.0
+language:
+- yue
+pipeline_tag: fill-mask
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -12,11 +18,11 @@ should probably proofread and complete it, then remove this comment. -->
 
 # bert-base-cantonese
 
-This model is a fine-tuned version of [/notebooks/cantonese/bert-base-cantonese](https://huggingface.co//notebooks/cantonese/bert-base-cantonese) on an unknown dataset.
+This model is a continued pre-trained version of [indiejoseph/cantonese/bert-base-cantonese](https://huggingface.co//notebooks/cantonese/bert-base-cantonese) on [indiejoseph/wikipedia-zh-yue-filtered](https://huggingface.co/datasets/indiejoseph/wikipedia-zh-yue-filtered).
 
 ## Model description
 
-More information needed
+This model's vocabulary has been extended with 500 additional Chinese characters that are very common in Cantonese, such as `冧`, `噉`, `麪`, `笪`, `冚`, and `乸`, and the model was then further pre-trained on [indiejoseph/wikipedia-zh-yue-filtered](https://huggingface.co/datasets/indiejoseph/wikipedia-zh-yue-filtered).
 
 ## Intended uses & limitations
 
@@ -39,7 +45,7 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 192
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 3.0
+- num_epochs: 10
 
 ### Training results
 
@@ -50,4 +56,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.0.dev0
 - Pytorch 2.0.1+cu118
 - Datasets 2.14.5
-- Tokenizers 0.14.0
+- Tokenizers 0.14.0
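
The "Model description" line added in this commit says the vocabulary was extended with Cantonese-specific characters before further pre-training. The commit does not show how that was done, so the following is only a library-free toy sketch of the general idea: new characters get their own ids instead of falling back to `[UNK]`, and the embedding table grows by one freshly initialised row per new character. The names `extend_vocab` and `encode`, the three-entry base vocabulary, and the 4-dimensional embeddings are all hypothetical. In a real Hugging Face workflow the equivalent step is usually done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, though whether the author used those calls is an assumption.

```python
import random


def extend_vocab(vocab, embeddings, new_tokens, seed=0):
    """Return copies of vocab and embeddings with unseen tokens appended.

    Each new token receives the next free id and a small random embedding row,
    which continued pre-training would then be expected to refine.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    vocab = dict(vocab)                      # do not mutate the caller's vocab
    embeddings = [row[:] for row in embeddings]
    added = 0
    for tok in new_tokens:
        if tok in vocab:                     # skip tokens that already exist
            continue
        vocab[tok] = len(vocab)              # ids are assumed to be contiguous
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        added += 1
    return vocab, embeddings, added


def encode(text, vocab):
    """Character-level lookup that falls back to [UNK] for unknown characters."""
    unk = vocab["[UNK]"]
    return [vocab.get(ch, unk) for ch in text]


# Hypothetical tiny base vocabulary and zero-initialised 4-dim embeddings.
base_vocab = {"[UNK]": 0, "我": 1, "食": 2}
base_emb = [[0.0] * 4 for _ in base_vocab]

# Example characters taken from the model description in the diff.
cantonese_chars = ["冧", "噉", "麪", "笪", "冚", "乸"]
new_vocab, new_emb, added = extend_vocab(base_vocab, base_emb, cantonese_chars)

before = encode("我食麪", base_vocab)   # 麪 is unknown to the base vocabulary
after = encode("我食麪", new_vocab)     # 麪 now has its own id
```

Before the extension the character `麪` is encoded as the `[UNK]` id; afterwards it has a dedicated id and a matching embedding row, which is what lets further pre-training learn a representation for it.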