Update README.md
README.md
CHANGED
@@ -16,7 +16,7 @@ datasets:
 - fzmnm/TinyStoriesAdv-zh
 ---
 
-###
+### TinyStoriesAdv_215M
 
 ![alt text](README.files/79e6f31072d75ef82135302dd88859a.png)
 
@@ -27,7 +27,7 @@ keywords: grade school level, large language model, small language model, tiny l
 
 Uses an architecture similar to Qwen.
 ```python
-dim=
+dim=896;n_layers=24;n_heads=14;n_kv_heads=2;max_seq_len=1024;embedding_weight_tying=True;
 tokens_per_iteration=524288
 dropout=0.1
 warmup_iters=1000;stable_iters=9000
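
The hyperparameters added in this change imply a few derived quantities that are easy to check by hand. The sketch below is only arithmetic on the values shown in the diff. It assumes that `n_kv_heads` has its usual grouped-query-attention meaning (several query heads share one key/value head) and that every training sequence is packed to `max_seq_len` tokens. The variable names on the left-hand side of the derived values are illustrative and do not come from the repository.

```python
# Values copied from the config lines shown in the diff.
dim = 896
n_layers = 24
n_heads = 14
n_kv_heads = 2
max_seq_len = 1024
tokens_per_iteration = 524288
warmup_iters = 1000
stable_iters = 9000

# Per-head width: the model dimension is split evenly across the query heads.
head_dim, remainder = divmod(dim, n_heads)
assert remainder == 0, "dim must be divisible by n_heads"

# Assumption: grouped-query attention, so each KV head serves a group of query heads.
queries_per_kv_head = n_heads // n_kv_heads
kv_proj_width = n_kv_heads * head_dim  # output width of the key (or value) projection

# Assumption: every sequence is packed to max_seq_len tokens.
sequences_per_iteration = tokens_per_iteration // max_seq_len

# Tokens seen during the warmup and stable phases only; the diff does not
# show whether a further decay phase follows.
tokens_warmup_plus_stable = (warmup_iters + stable_iters) * tokens_per_iteration

print(f"head_dim = {head_dim}")
print(f"query heads per KV head = {queries_per_kv_head}")
print(f"KV projection width = {kv_proj_width}")
print(f"sequences per iteration = {sequences_per_iteration}")
print(f"tokens in warmup + stable = {tokens_warmup_plus_stable:,}")
```

The parameter count suggested by the `215M` in the heading cannot be reproduced from this hunk alone, because the vocabulary size and the feed-forward width are not shown.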