indiejoseph/bert-base-cantonese

indiejoseph committed on Oct 13, 2023

Commit a473699 (1 parent: 998e451)

Update README.md

Files changed (1)

README.md +10 -4

@@ -2,9 +2,15 @@
 base_model: /notebooks/cantonese/bert-base-cantonese
 tags:
 - generated_from_trainer
+- Cantonese
+- bert
 model-index:
 - name: bert-base-cantonese
   results: []
+license: cc-by-4.0
+language:
+- yue
+pipeline_tag: fill-mask
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -12,11 +18,11 @@ should probably proofread and complete it, then remove this comment. -->
 # bert-base-cantonese
-This model is a fine-tuned version of [/notebooks/cantonese/bert-base-cantonese](https://huggingface.co//notebooks/cantonese/bert-base-cantonese) on an unknown dataset.
+This model is a version of [indiejoseph/cantonese/bert-base-cantonese](https://huggingface.co//notebooks/cantonese/bert-base-cantonese) with continued pre-training on [indiejoseph/wikipedia-zh-yue-filtered](https://huggingface.co/datasets/indiejoseph/wikipedia-zh-yue-filtered).
 ## Model description
-More information needed
+This model extends the vocabulary with 500 additional Chinese characters that are very common in Cantonese, such as `冧`, `噉`, `麪`, `笪`, `冚`, `乸`, etc., and was further pre-trained on [indiejoseph/wikipedia-zh-yue-filtered](https://huggingface.co/datasets/indiejoseph/wikipedia-zh-yue-filtered).
 ## Intended uses & limitations
@@ -39,7 +45,7 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 192
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 3.0
+- num_epochs: 10
 ### Training results
@@ -50,4 +56,4 @@ The following hyperparameters were used during training:
 - Transformers 4.34.0.dev0
 - Pytorch 2.0.1+cu118
 - Datasets 2.14.5
-- Tokenizers 0.14.0
+- Tokenizers 0.14.0
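The updated model description says that 500 characters common in Cantonese (`冧`, `噉`, `麪`, ...) were added to the vocabulary before continued pre-training. The toy pure-Python sketch below is only a rough illustration of why that matters, not the author's code. BERT-style Chinese tokenizers treat each CJK character as its own token, so any character missing from the vocabulary collapses to `[UNK]`. Extending the vocabulary therefore means assigning new ids and adding matching rows to the embedding table. The vocabulary, the embedding size, the random initialization, and the helper names `tokenize` and `extend_vocab` are all hypothetical.

```python
import random
from typing import Dict, List

UNK = "[UNK]"


def tokenize(text: str, vocab: Dict[str, int]) -> List[str]:
    """Toy character-level tokenizer for CJK-only text.

    Each non-whitespace character becomes one token; characters missing
    from the vocabulary become [UNK]. (A simplification: a real BERT
    tokenizer also applies WordPiece to non-CJK text.)
    """
    return [ch if ch in vocab else UNK for ch in text if not ch.isspace()]


def extend_vocab(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_chars: List[str],
    dim: int,
    seed: int = 0,
) -> List[str]:
    """Append unseen characters to the vocab and grow the embedding table.

    Assumes ids are contiguous (0..len(vocab)-1), so each new id is len(vocab).
    New rows get small random values; this initialization is an assumption,
    not something the model card specifies.
    """
    rng = random.Random(seed)
    added = []
    for ch in new_chars:
        if ch in vocab:  # skip characters the vocab already covers
            continue
        vocab[ch] = len(vocab)
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        added.append(ch)
    return added


# Hypothetical toy vocabulary and zero-initialized embedding table.
DIM = 4
toy_vocab = {"[PAD]": 0, UNK: 1, "[CLS]": 2, "[SEP]": 3, "[MASK]": 4,
             "我": 5, "去": 6, "食": 7}
toy_embeddings = [[0.0] * DIM for _ in toy_vocab]

print(tokenize("我去食麪", toy_vocab))  # -> ['我', '去', '食', '[UNK]']
extend_vocab(toy_vocab, toy_embeddings, ["冧", "噉", "麪", "笪", "冚", "乸"], DIM)
print(tokenize("我去食麪", toy_vocab))  # -> ['我', '去', '食', '麪']
assert len(toy_embeddings) == len(toy_vocab)  # one embedding row per token id
```

With Hugging Face Transformers, the analogous steps are commonly `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The card does not say whether this model's 500 characters were added that way or by editing the vocabulary file directly.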