Commit 0e0abea by Cyrile
1 parent: b147de9

Update README.md

  1. README.md +13 -0
README.md CHANGED
@@ -50,6 +50,19 @@ Here is the table summarizing the architecture used for training, along with the
  | [bloomz-3b-sft-chat](https://huggingface.co/cmarkea/bloomz-3b-sft-chat) | 1 x A100 40GB | 140 | 13 |
  | [bloomz-7b1-mt-sft-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-sft-chat) | 4 x A100 40GB | 268 | 8 |
 
+ | Hyperparameter | Value |
+ |:---------------------:|:----------:|
+ | label smoothing | 0.05 |
+ | optimize | AdamW |
+ | betas | 0.9, 0.999 |
+ | learning rate | 5e-6 |
+ | anneal strategy | cos |
+ | div factor | 100 |
+ | final div factor | 0.1 |
+ | batch size | 2 |
+ | gradient accumulation | 200 |
+ | max length | 2048 |
+
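The scheduler rows in the added table (anneal strategy, div factor, final div factor) match the parameter names of PyTorch's `OneCycleLR`, although the diff does not name the scheduler. As a rough sketch under that assumption, the plain-Python snippet below reproduces the two-phase cosine schedule those values would give, treating the listed learning rate as the peak rate. `PCT_START` and `TOTAL_STEPS` are not in the table and are purely illustrative, and the effective batch size assumes the batch size is per optimizer micro-step on a single device.

```python
import math

# Values copied from the hyperparameter table.
MAX_LR = 5e-6            # "learning rate", assumed here to be the peak rate
DIV_FACTOR = 100         # initial_lr = MAX_LR / DIV_FACTOR
FINAL_DIV_FACTOR = 0.1   # min_lr = initial_lr / FINAL_DIV_FACTOR
BATCH_SIZE = 2
GRAD_ACCUMULATION = 200

# Assumptions, NOT from the table: warm-up fraction and number of optimizer steps.
PCT_START = 0.3
TOTAL_STEPS = 1000


def cosine_anneal(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))


def one_cycle_lr(step, total_steps=TOTAL_STEPS):
    """Learning rate at 0-based optimizer step `step` of a two-phase one-cycle schedule."""
    initial_lr = MAX_LR / DIV_FACTOR
    min_lr = initial_lr / FINAL_DIV_FACTOR
    warmup_end = PCT_START * total_steps - 1
    if step <= warmup_end:
        # Phase 1: rise from initial_lr to MAX_LR.
        return cosine_anneal(initial_lr, MAX_LR, step / warmup_end)
    # Phase 2: anneal from MAX_LR down to min_lr.
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cosine_anneal(MAX_LR, min_lr, pct)


# Sequences contributing to each optimizer step (single-device assumption).
effective_batch_size = BATCH_SIZE * GRAD_ACCUMULATION

if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 299, 999):
        print(step, one_cycle_lr(step))
```

Under these assumptions the rate warms up from about 5e-8 to 5e-6 and then anneals to about 5e-7. Because the final div factor is below 1, the final rate ends up above the initial one.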
  Experimentations
  ----------------
  Since the model is trained only on English and French corpora, the performance of the model cannot be guaranteed in other languages. This degradation in performance in other languages is also due to the change in the model's data type from float16 to bfloat16. The conversation example below illustrates this point: