## Fine-tune Configuration

We fine-tune `5CD-AI/viso-twhin-bert-large` on 4 downstream tasks with the `transformers` library, using the following configuration:

- train_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- weight_decay: 0.01
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- training_epochs: 30
- model_max_length: 128
- learning_rate: 1e-5
- metric_for_best_model: wf1
- strategy: epoch

And different additional configurations for each task:

| Emotion Recognition | Hate Speech Detection | Spam Reviews Detection | Hate Speech Spans Detection |
| ------------------- | --------------------- | ---------------------- | --------------------------- |
| \- learning_rate: 1e-5 | \- learning_rate: 5e-6 | \- learning_rate: 1e-5 | \- learning_rate: 5e-6 |
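The shared configuration above can be expressed as Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch, not the authors' exact training script: the `output_dir` is hypothetical, `model_max_length` is set on the tokenizer rather than here, and the AdamW betas/epsilon listed above are already the `Trainer` defaults.

```python
from transformers import TrainingArguments

# Shared fine-tune configuration from the list above.
# Swap learning_rate per task as in the table
# (e.g. 5e-6 for Hate Speech Detection).
args = TrainingArguments(
    output_dir="viso-twhin-bert-large-finetuned",  # hypothetical path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,   # effective batch size 16 * 4 = 64
    seed=42,
    weight_decay=0.01,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=30,
    evaluation_strategy="epoch",     # "strategy: epoch" above
    save_strategy="epoch",
    metric_for_best_model="wf1",     # weighted F1 from a compute_metrics fn
    load_best_model_at_end=True,
)
```

Note that with `gradient_accumulation_steps: 4` the effective batch size per update is 16 × 4 = 64, and `metric_for_best_model: wf1` assumes a `compute_metrics` function that reports a metric named `wf1`.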