yuli-aisg committed
Commit ab3664c
1 Parent(s): 0c5276c

updated infrastructure and configuration

Files changed (1)
  1. README.md +5 -23
README.md CHANGED
@@ -53,24 +53,6 @@ For more details on Gemma2 9B CPT SEA-LIONv3 base benchmark performance, please
 
 Gemma2 9B CPT SEA-LIONv3 base model was continued pre-trained on 200B tokens of the following data:
 
-| Data Source               | Unique Tokens (B) | Multiplier | Total Tokens (B) | Percentage (%) |
-|---------------------------|:-----------------:|:----------:|:----------------:|:--------------:|
-| FineWebEdu                | 7.650             | 1          | 7.650            | 15.90          |
-| Stackv2                   | 1.160             | 1          | 1.16             | 9.21           |
-| Dolma Reddit - English    | 1.339             | 1          | 1.339            | 2.42           |
-| Dolma Semantic Scholar    | 0.959             | 1          | 0.959            | 2.79           |
-| Dolma arXiv               | 0.469             | 1          | 0.469            | 1.99           |
-| Dolma StarCoder           | 4.422             | 1          | 4.422            | 0.98           |
-| SEA-LION Pile - Indonesian| 3.4               | 2          | 6.8              | 14.17          |
-| Wiki* - Indonesian        | 0.3               | 4          | 1.2              | 2.50           |
-| SEA-LION Pile - Tamil     | 5.6               | 1          | 5.6              | 11.67          |
-| Wiki* + News - Tamil      | 0.6               | 4          | 2.4              | 5.00           |
-| SEA-LION Pile - Thai      | 2.28              | 1          | 2.28             | 4.75           |
-| WangChanBERTa - Thai      | 5                 | 1          | 5                | 10.42          |
-| Wiki* - Thai              | 0.18              | 4          | 0.72             | 1.50           |
-| SEA-LION Pile - Vietnamese| 6.76              | 1          | 6.76             | 14.08          |
-| Wiki* - Vietnamese        | 0.31              | 4          | 1.24             | 2.58           |
-
 | Data Source                            | Unique Tokens (B) | Multiplier | Total Tokens (B) | Percentage (%)|
 |---------------------------------------|:-----------------:|:----------:|:----------------:|:-------------:|
 | StackV2                                | 40.0              | 1          | 40.0             | 20.00         |
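A side note on the mixture tables in this hunk: the column arithmetic is Total Tokens = Unique Tokens × Multiplier, with Percentage being each source's share of the overall mixture. The sketch below (a few rows from the removed table copied in purely as illustrative input, not the full mixture) shows the check; shares computed over a subset will not match the published percentages, which are taken over the whole run.

```python
# Minimal sketch of the mixture-table arithmetic:
#   total_tokens = unique_tokens * multiplier
#   percentage   = total_tokens / sum(all total_tokens)
# The rows below are a small illustrative subset, not the full mixture.
rows = [
    # (data source, unique tokens in billions, multiplier)
    ("SEA-LION Pile - Indonesian", 3.4, 2),
    ("Wiki* - Indonesian", 0.3, 4),
    ("SEA-LION Pile - Tamil", 5.6, 1),
]

totals = {name: unique * mult for name, unique, mult in rows}
grand_total = sum(totals.values())

for name, total in totals.items():
    print(f"{name:28s} {total:5.2f}B  {100 * total / grand_total:5.2f}%")
```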
@@ -122,21 +104,21 @@ on the following hardware:
 
 | Training Details     | Gemma2 9B CPT SEA-LIONv3 |
 |----------------------|:--------------------:|
-| AWS EC2 p5d.24xlarge | 8 instances          |
-| Nvidia H100 80GB GPU | 64                   |
-| Training Duration    | 2 days               |
+| SingTel HGX-100      | 8+1 instances        |
+| Nvidia H100 80GB GPU | 64+8                 |
+| Training Duration    | 10 days              |
 
 
 ### Configuration
 
-| HyperParameter    | Gemma2 9B CPT SEA-LIONv32 |
+| HyperParameter    | Gemma2 9B CPT SEA-LIONv3 |
 |-------------------|:--------------------:|
 | Precision         | bfloat16             |
 | Optimizer         | decoupled_adamw      |
 | Scheduler         | weight_stable_decay  |
 | Learning Rate     | 1.0e-5               |
 | Global Batch Size | 512                  |
-| Micro Batch Size  | 2                    |
+| Micro Batch Size  | 1                    |
 
 
 ## The Team
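The optimizer and scheduler identifiers in the Configuration table (decoupled_adamw, weight_stable_decay) read like string keys from a Composer/LLM Foundry-style training config, so the table plausibly maps onto something like the sketch below. This is a hedged reconstruction, not the actual SEA-LION config: every field not present in the table (betas, weight decay, warmup) is an assumption.

```python
# Hedged sketch only: how the Configuration table *might* map onto a trainer config.
# Fields marked "assumption" are not taken from the model card.
train_config = {
    "precision": "bfloat16",            # table: Precision
    "optimizer": {
        "name": "decoupled_adamw",      # table: Optimizer
        "lr": 1.0e-5,                   # table: Learning Rate
        "betas": (0.9, 0.95),           # assumption
        "weight_decay": 0.0,            # assumption
    },
    "scheduler": {
        "name": "weight_stable_decay",  # table: Scheduler
        "t_warmup": "2000ba",           # assumption
    },
    "global_train_batch_size": 512,     # table: Global Batch Size
    "device_train_microbatch_size": 1,  # table: Micro Batch Size (new value in this commit)
}
```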
 
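One practical consequence of the values changed in this commit: with the global batch size fixed at 512 and 64 data-parallel GPUs (reading the "+8" in the hardware table as standby capacity, which is an assumption), dropping the micro batch size from 2 to 1 doubles the per-rank gradient-accumulation factor. A quick arithmetic check:

```python
# Standard data-parallel batch arithmetic (assumption: 64 data-parallel ranks):
#   global_batch = dp_ranks * micro_batch * accumulation_steps
global_batch = 512
dp_ranks = 64

for micro_batch in (2, 1):  # old value vs. the value set in this commit
    accum = global_batch // (dp_ranks * micro_batch)
    print(f"micro_batch={micro_batch}: {accum} gradient-accumulation steps per rank")
```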