Update README.md
README.md
CHANGED
@@ -38,8 +38,22 @@ widget:
38 |
- text: In the context of computer programming, an algorithm is
|
39 |
example_title: Algorithm Definition
|
40 |
---
|
41 |
+
# Mixsmol-4x400M-v0.1
|
42 |
+
This is the first checkpoint (Epoch 1) of Mixsmol-4x400M-v0.1.
|
43 |
+
Note that this is an experiment in data mixing. We therefore trained the model on only 50B tokens (95% English and 5% Vietnamese) to test the following:
|
44 |
+
- Reasoning capabilities through pretraining on high-quality synthetic textbook data
|
45 |
+
- Cross-lingual understanding through machine translation and multilingual, multi-task pretraining
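
As a quick sanity check on the stated mix, the sketch below splits the 50B-token budget by the 95% / 5% language ratio given above. The function and variable names are illustrative assumptions, not part of the training code, and the split is pure arithmetic on the figures quoted in this README.

```python
# Token budget implied by the stated mix: 50B tokens, 95% English, 5% Vietnamese.
# Names here are hypothetical; only the numbers come from the README.
TOTAL_TOKENS = 50_000_000_000
LANGUAGE_SHARE_PERCENT = {"English": 95, "Vietnamese": 5}


def split_token_budget(total_tokens, share_percent):
    """Return the integer token count per language for a percentage mix."""
    if sum(share_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {lang: total_tokens * pct // 100 for lang, pct in share_percent.items()}


token_budget = split_token_budget(TOTAL_TOKENS, LANGUAGE_SHARE_PERCENT)
for lang, tokens in token_budget.items():
    print(f"{lang}: {tokens / 1e9:.1f}B tokens")
```

Under these figures the run sees roughly 47.5B English tokens and 2.5B Vietnamese tokens.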
|
46 |
|
47 |
+
After verifying these hypotheses with this run, we will schedule a second run with more data and compute so that the model can reach its full capability.
|
48 |
|
49 |
+
## Data
|
50 |
+
- Synthetic Textbooks: 8M samples
|
51 |
+
- RefinedWeb: 1M samples
|
52 |
+
- RedPajama-v2: 500K samples
|
53 |
+
- MathPile: full dataset
|
54 |
+
- The Pile: MiniPile subset
|
55 |
+
- GoodWiki
|
56 |
+
- Instruction Pretraining: 250K samples
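
To get a feel for how the listed sources compare, the sketch below computes each source's share of samples. It covers only the four sources with a stated sample count. MathPile, the MiniPile subset, and GoodWiki are left out because the list gives no counts for them. These are shares of samples, not tokens, so they only loosely indicate the real training mix. The helper name is an assumption for illustration.

```python
# Sample counts as listed in the Data section. Sources without a stated count
# (MathPile, MiniPile subset of The Pile, GoodWiki) are intentionally omitted.
STATED_SAMPLE_COUNTS = {
    "Synthetic Textbooks": 8_000_000,
    "RefinedWeb": 1_000_000,
    "RedPajama-v2": 500_000,
    "Instruction Pretraining": 250_000,
}


def sample_shares(counts):
    """Return each source's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = sample_shares(STATED_SAMPLE_COUNTS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Among these four sources, synthetic textbooks account for about 82% of samples, which is consistent with the run's focus on textbook-style pretraining data.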
|
57 |
|
58 |
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|
59 |
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|