updated infrastructure and configuration
README.md CHANGED
@@ -53,24 +53,6 @@ For more details on Gemma2 9B CPT SEA-LIONv3 base benchmark performance, please

Gemma2 9B CPT SEA-LIONv3 base model was continued pre-trained on 200B tokens of the following data:

-| Data Source                | Unique Tokens (B) | Multiplier | Total Tokens (B) | Percentage (%) |
-|----------------------------|:-----------------:|:----------:|:----------------:|:--------------:|
-| FineWebEdu                 | 7.650             | 1          | 7.650            | 15.90          |
-| Stackv2                    | 1.160             | 1          | 1.16             | 9.21           |
-| Dolma Reddit - English     | 1.339             | 1          | 1.339            | 2.42           |
-| Dolma Semantic Scholar     | 0.959             | 1          | 0.959            | 2.79           |
-| Dolma arXiv                | 0.469             | 1          | 0.469            | 1.99           |
-| Dolma StarCoder            | 4.422             | 1          | 4.422            | 0.98           |
-| SEA-LION Pile - Indonesian | 3.4               | 2          | 6.8              | 14.17          |
-| Wiki* - Indonesian         | 0.3               | 4          | 1.2              | 2.50           |
-| SEA-LION Pile - Tamil      | 5.6               | 1          | 5.6              | 11.67          |
-| Wiki* + News - Tamil       | 0.6               | 4          | 2.4              | 5.00           |
-| SEA-LION Pile - Thai       | 2.28              | 1          | 2.28             | 4.75           |
-| WangChanBERTa - Thai       | 5                 | 1          | 5                | 10.42          |
-| Wiki* - Thai               | 0.18              | 4          | 0.72             | 1.50           |
-| SEA-LION Pile - Vietnamese | 6.76              | 1          | 6.76             | 14.08          |
-| Wiki* - Vietnamese         | 0.31              | 4          | 1.24             | 2.58           |
-
| Data Source                            | Unique Tokens (B) | Multiplier | Total Tokens (B) | Percentage (%)|
|---------------------------------------|:-----------------:|:----------:|:----------------:|:-------------:|
| StackV2                                | 40.0              | 1          | 40.0             | 20.00         |
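In both data-mix tables, Total Tokens is the Unique Tokens figure multiplied by the Multiplier (how many times a source is repeated), and Percentage is that total's share of the overall mix; for the updated mix the stated budget is 200B tokens, which the visible StackV2 row matches (40.0 / 200 = 20.00%). Below is a minimal sketch of that arithmetic; it is illustrative only and not taken from the SEA-LION data pipeline.

```python
# Minimal sketch of the data-mix arithmetic behind the tables above.
# Assumption: Percentage is the source's share of the overall token budget
# (200B for the updated mix, per the sentence above the table).

BUDGET_B = 200.0  # continued pre-training budget, in billions of tokens


def total_tokens_b(unique_b: float, multiplier: int) -> float:
    """Tokens contributed by a source after repeating it `multiplier` times."""
    return unique_b * multiplier


def percentage(total_b: float, budget_b: float = BUDGET_B) -> float:
    """Share of the overall token budget, in percent."""
    return 100.0 * total_b / budget_b


# StackV2 row of the new table: 40.0B unique tokens, multiplier 1.
stackv2 = total_tokens_b(40.0, 1)
print(stackv2, percentage(stackv2))  # 40.0 20.0 -> matches "40.0 | 20.00"

# A multiplier > 1 oversamples a source, e.g. 0.3B seen 4 times contributes 1.2B
# (as in the "Wiki* - Indonesian" row of the removed table).
print(total_tokens_b(0.3, 4))        # 1.2
```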
@@ -122,21 +104,21 @@ on the following hardware:

| Training Details     | Gemma2 9B CPT SEA-LIONv3 |
|----------------------|:--------------------:|
-|                      |                      |
-| Nvidia H100 80GB GPU |                      |
-| Training Duration    |                      |
+| SingTel HGX-100      | 8+1 instances        |
+| Nvidia H100 80GB GPU | 64+8                 |
+| Training Duration    | 10 days              |

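As a rough cross-check of the hardware table, the 200B-token budget and the 10-day duration imply a throughput on the order of a few thousand tokens per second per GPU. The back-of-envelope below assumes the 64 H100s (8 instances x 8 GPUs) do the training and the "+8" are spares, and it ignores downtime and restarts; none of that is stated in the card.

```python
# Back-of-envelope throughput implied by the table above (assumptions:
# 64 training GPUs, the "+8" are spares, no downtime; not stated in the card).

TOKENS = 200e9            # continued pre-training budget
DAYS = 10                 # "Training Duration" row
TRAIN_GPUS = 8 * 8        # 8 SingTel HGX-100 instances x 8 H100 80GB GPUs

seconds = DAYS * 24 * 3600
aggregate_tps = TOKENS / seconds          # ~2.3e5 tokens/s across the cluster
per_gpu_tps = aggregate_tps / TRAIN_GPUS  # ~3.6e3 tokens/s per GPU
print(f"{aggregate_tps:,.0f} tok/s total, {per_gpu_tps:,.0f} tok/s per GPU")
```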
### Configuration

-| HyperParameter    | Gemma2 9B CPT SEA-
+| HyperParameter    | Gemma2 9B CPT SEA-LIONv3 |
|-------------------|:--------------------:|
| Precision         | bfloat16             |
| Optimizer         | decoupled_adamw      |
| Scheduler         | weight_stable_decay  |
| Learning Rate     | 1.0e-5               |
| Global Batch Size | 512                  |
-| Micro Batch Size  |
+| Micro Batch Size  | 1                    |

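One way to read the Global and Micro Batch Size rows together with the hardware table: global batch = micro batch x data-parallel GPUs x gradient-accumulation steps. The sketch below assumes the 64 H100s (8 instances x 8 GPUs) form the data-parallel group and the "+8" are spares; that split is an assumption, not something the card states.

```python
# Hedged sketch of the batch-size bookkeeping implied by the tables above.
# Assumption (not stated in the card): 64 GPUs train, the "+8" are spares.

GLOBAL_BATCH = 512   # "Global Batch Size" row
MICRO_BATCH = 1      # "Micro Batch Size" row (sequences per GPU per forward pass)
TRAIN_GPUS = 8 * 8   # 8 SingTel HGX-100 instances x 8 Nvidia H100 80GB GPUs

# global batch = micro batch * data-parallel GPUs * gradient-accumulation steps
grad_accum, remainder = divmod(GLOBAL_BATCH, MICRO_BATCH * TRAIN_GPUS)
assert remainder == 0, "global batch should divide evenly across the GPUs"
print(grad_accum)  # 8 accumulation steps per optimizer update
```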
## The Team