Update README.md
README.md
CHANGED
@@ -16,6 +16,11 @@ tags:
 - The original idea was to use this 1.8B model, divide the dataset based on task specific capabilities, train models and transform them into a mixture of experts.
 - Hyperparameters: adamw with eps of 1e-8, cosine decay w/ 20% warmup, lr=2e-5.
 
+## Format:
+```
+<|system|></s><|prompt|></s><|answer|>
+```
+
 ## Benchamrks:
 
 WIP
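
The `Format` template added in this change only lists the special tokens, so where the text goes is not stated. A minimal sketch, assuming the system message follows `<|system|>`, the user message follows `<|prompt|>`, and the model generates after `<|answer|>`. The helper name `build_prompt` is hypothetical and not part of the README:

```python
EOS = "</s>"  # separator token, copied from the template in the diff


def build_prompt(prompt: str, system: str = "") -> str:
    """Fill the README's prompt template for inference.

    Assumption: each message is placed directly after its role token and
    closed with EOS; the README itself only shows the empty template.
    """
    return f"<|system|>{system}{EOS}<|prompt|>{prompt}{EOS}<|answer|>"


if __name__ == "__main__":
    # With empty slots the helper reproduces the template from the README.
    print(build_prompt(""))
    print(build_prompt("What is 2 + 2?", system="You are a helpful assistant."))
```

If the model was trained with a different slot layout (for example, whitespace or newlines after the role tokens), the f-string would need to change to match.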