jingyaogong committed
Upload 2 files
- README.md +115 -95
- README_en.md +19 -5

README.md CHANGED
@@ -1,6 +1,7 @@
![logo](./images/logo.png)
<div align="center">

[![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
[![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
[![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)
@@ -19,9 +20,8 @@

</div>

-*
-* **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT-3, and aims to make fast inference and even training possible on a CPU.
* **MiniMind** improves on the DeepSeek-V2 and Llama3 architectures; the project covers every stage of data processing, pretrain, sft, and dpo, and includes a Mixture-of-Experts (MoE) model.
* It is at once an open-source project, an introductory LLM tutorial, and a nascent open model, and is meant to serve as a starting point for others.
@@ -39,14 +39,15 @@
The goal of this project is therefore to lower the barrier to getting started with LLMs as much as possible,
by training an extremely lightweight language model directly from scratch.

-(As of 2024.

-| Model (size)           | Speed (tokens/s) | Inference memory | Training memory (`batch_size=8`) |
-|------------------------|------------------|------------------|----------------------------------|
-| MiniMind-small-T (26M) | 91.9             | 0.5 GB           | 3.6 GB                           |
-| MiniMind-small (56M)   | 85.2             | 0.7 GB           | 4.5 GB                           |
-| MiniMind (218M)        | 57.6             | 2.1 GB           | 10.4 GB                          |
-| MiniMind-MoE (166M)    | 64.9             | 1.6 GB           | 7.4 GB                           |

> This analysis was run on an RTX 3090 GPU with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
@@ -65,6 +66,8 @@
👉**Recent Updates**

<details close>
<summary> <b>2024-08-27</b> </summary>
- Project open-sourced for the first time
</details>
@@ -116,30 +119,30 @@ python 2-eval.py
* 2.6 `python 4-lora_sft.py` runs LoRA fine-tuning (optional).
* 2.7 `python 5-dpo_train.py` runs DPO human-preference alignment (optional).
* 3. Test the model's inference results

🍭 [Tip] Both pretraining (pretrain) and full-parameter fine-tuning (full_sft) support DDP multi-GPU acceleration
@@ -163,8 +166,8 @@ python 2-eval.py
Because this LLM is very small, the vocabulary has to be kept fairly small to avoid a top-heavy model (where the token-embedding layer accounts for too large a share of all parameters).
The tokenizer vocabulary sizes of strong open-source models such as 01-AI, Qwen, ChatGLM, Mistral, and Llama3 are as follows:

-| Tokenizer model | Vocabulary size | Source |
-|-----------------|-----------------|--------|
| yi tokenizer    | 64,000          | 01-AI (China) |
| qwen2 tokenizer | 151,643         | Alibaba Cloud (China) |
| glm tokenizer   | 151,329         | Zhipu AI (China) |
@@ -176,11 +179,13 @@ python 2-eval.py
MiniMind, however, uses the mistral tokenizer in order to keep the overall parameter count light and avoid a top-heavy model, since mistral's vocabulary has only 32,000 entries.
In practical tests MiniMind has almost never failed to decode rare words, so it performs well.

-> For easier comparison, an additional version with a custom tokenizer, **MiniMind

---

--
is compiled and cleaned from data of many public sources (web pages, encyclopedias, blogs, open-source code, books, etc.).
It is organized into a unified JSONL format and has gone through strict filtering and deduplication to ensure comprehensiveness, scale, reliability, and high quality.
It totals roughly 10B tokens and is well suited to pretraining Chinese large language models.
@@ -253,7 +258,8 @@ MiniMind keeps the same overall structure, differing only in the RoPE computation, the inference function, and the FFN layer's
| minimind-small-T | 26M  | 6400  | 8  | 512  | 8 | 16 | - | - |
| minimind-small   | 56M  | 32000 | 8  | 640  | 8 | 16 | - | - |
| minimind         | 218M | 32000 | 16 | 1024 | 8 | 16 | - | - |
-| minimind-MoE |

For reference, GPT-3's layer and dimension settings are listed below:
![gpt3_config.png](./images/gpt3_config.png)
@@ -273,6 +279,7 @@ CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
| minimind-small | 56M  | 32000 | 24 | ≈6 hour (1 epoch)  | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |
| minimind       | 218M | 32000 | 16 | ≈15 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch)   |
| minimind-MoE   | 166M | 32000 | 16 | ≈13 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch)   |

---
@@ -324,6 +331,7 @@ CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
| minimind-small | 56M  | d_model=640<br/>n_layers=8                      | [Link](https://pan.baidu.com/s/1nJuOpnu5115FDuz6Ewbeqg?pwd=6666) | [Link](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666) | [Link](https://pan.baidu.com/s/1LzVxBpL0phtGUH267Undqw?pwd=6666) |
| minimind       | 218M | d_model=1024<br/>n_layers=16                    | [Link](https://pan.baidu.com/s/1jzA7uLEi-Jen2fW5olCmEg?pwd=6666) | [Link](https://pan.baidu.com/s/1Hvt0Q_UB_uW2sWTw6w1zRQ?pwd=6666) | [Link](https://pan.baidu.com/s/1fau9eat3lXilnrG3XNhG5Q?pwd=6666) |
| minimind-MoE   | 166M | d_model=1024<br/>n_layers=8<br/>share+route=2+4 | [Link](https://pan.baidu.com/s/11CneDVTkw2Y6lNilQX5bWw?pwd=6666) | [Link](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666) | [Link](https://pan.baidu.com/s/1HC2KSM_-RHRtgv7ZDkKI9Q?pwd=6666) |

---
@@ -350,6 +358,8 @@ MobileLLM argues that architectural depth matters more than width: a deep-and-narrow, "slender"

# 📌 Eval

[A] [minimind-small-T(0.02B)](https://pan.baidu.com/s/1_COe0FQRDmeapSsvArahCA?pwd=6666)<br/>
[B] [minimind-small(0.05B)](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666)<br/>
[C] [minimind-MoE(0.16B)](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666)<br/>
@@ -516,62 +526,62 @@ The C-Eval evaluation code is at `./eval_ceval.py`,
but instead directly compares the predicted probabilities of the tokens for the four letters `A`, `B`, `C`, `D`, takes the most likely one as the answer, and computes accuracy against the reference answers.
The minimind models themselves were not trained on a large dataset, nor instruction-tuned for answering multiple-choice questions, so the results should be treated as a rough reference.

-| Category
-| probability_and_statistics_val
-| law_val
-| middle_school_biology_val
-| high_school_chemistry_val
-| high_school_physics_val
-| legal_professional_val
-| high_school_chinese_val
-| high_school_history_val
-| tax_accountant_val
-| modern_chinese_history_val
-| middle_school_physics_val
-| middle_school_history_val
-| basic_medicine_val
-| operating_system_val
-| logic_val
-| electrical_engineer_val
-| civil_servant_val
-| chinese_language_and_literature_val
-| college_programming_val
-| accountant_val
-| plant_protection_val
-| middle_school_chemistry_val
-| metrology_engineer_val
-| veterinary_medicine_val
-| marxism_val
-| advanced_mathematics_val
-| high_school_mathematics_val
-| business_administration_val
-| mao_zedong_thought_val
-| ideological_and_moral_cultivation_val
-| college_economics_val
-| professional_tour_guide_val
-| environmental_impact_assessment_engineer_val | 7/31
-| computer_architecture_val
-| urban_and_rural_planner_val
-| college_physics_val
-| middle_school_mathematics_val
-| high_school_politics_val
-| physician_val
-| college_chemistry_val
-| high_school_biology_val
-| high_school_geography_val
-| middle_school_politics_val
-| clinical_medicine_val
-| computer_network_val
-| sports_science_val
-| art_studies_val
-| teacher_qualification_val
-| discrete_mathematics_val
-| education_science_val
-| fire_engineer_val
-| middle_school_geography_val

```text
Total questions: 1346
@@ -620,6 +630,7 @@ The minimind models themselves were not trained on a large dataset, nor instruction-tuned for answering

* [./export_model.py](./export_model.py) can export the model to the transformers format and push it to huggingface
*
MiniMind's huggingface collection: [MiniMind](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)

---
@@ -684,7 +695,16 @@ MiniMind's huggingface collection: [MiniMind](https://huggingface.co/collectio
* [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
* [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)

# 📌 Statement

This project assumes no responsibility for data-security or public-opinion risks arising from the open-source model and code, nor for any risks and liabilities arising from the model being misled, misused, disseminated, or improperly exploited.
![logo](./images/logo.png)
<div align="center">

+![visitors](https://visitor-badge.laobi.icu/badge?page_id=jingyaogong/minimind)
[![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
[![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
[![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)

</div>

+* This open-source project aims to train the tiny 26M language model **MiniMind** completely from scratch, in as little as 3 hours!
+* **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT-3, and aims to make fast inference and even training possible on the most ordinary personal GPU.
* **MiniMind** improves on the DeepSeek-V2 and Llama3 architectures; the project covers every stage of data processing, pretrain, sft, and dpo, and includes a Mixture-of-Experts (MoE) model.
* It is at once an open-source project, an introductory LLM tutorial, and a nascent open model, and is meant to serve as a starting point for others.
The goal of this project is therefore to lower the barrier to getting started with LLMs as much as possible,
by training an extremely lightweight language model directly from scratch.

+(As of 2024.09.01) MiniMind includes 5 model variants; the smallest needs only 26M (0.02B) parameters and already shows amazing conversational ability!

+| Model (size)           | Speed (tokens/s) | Inference memory | Training memory (`batch_size=8`) | Release            | Subjective score (/100) |
+|------------------------|------------------|------------------|----------------------------------|--------------------|-------------------------|
+| MiniMind-small-T (26M) | 91.9             | 0.5 GB           | 3.6 GB                           | 2024.08.28         | 55'                     |
+| MiniMind-small (56M)   | 85.2             | 0.7 GB           | 4.5 GB                           | 2024.08.28         | 55'                     |
+| MiniMind (218M)        | 57.6             | 2.1 GB           | 10.4 GB                          | 2024.08.28         | 75'                     |
+| MiniMind-MoE (166M)    | 64.9             | 1.6 GB           | 7.4 GB                           | 2024.08.28         | 40'                     |
+| MiniMind-V1 (108M)     | 78.3             | 1.0 GB           | 6.4 GB                           | 2024.09.01 (new🎉) | 80'                     |

> This analysis was run on an RTX 3090 GPU with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
👉**Recent Updates**

<details close>
+<summary> <b>2024-09-01 (new🎉)</b> </summary>
+- Added the MiniMind-V1 (108M) model, which uses minimind_tokenizer and is trained more thoroughly (3 pretraining epochs + 10 SFT epochs) for stronger performance.
<summary> <b>2024-08-27</b> </summary>
- Project open-sourced for the first time
</details>
* 2.6 `python 4-lora_sft.py` runs LoRA fine-tuning (optional).
* 2.7 `python 5-dpo_train.py` runs DPO human-preference alignment (optional).
* 3. Test the model's inference results
+* Download the weights from the [Trained model weights] section below into the `./out/` directory
+```text
+out
+├── multi_chat
+│   ├── full_sft_1024.pth
+│   ├── full_sft_512.pth
+│   ├── full_sft_640_moe.pth
+│   └── full_sft_640.pth
+├── single_chat
+│   ├── full_sft_1024.pth
+│   ├── full_sft_512.pth
+│   ├── full_sft_640_moe.pth
+│   └── full_sft_640.pth
+├── full_sft_1024.pth
+├── full_sft_512.pth
+├── full_sft_640_moe.pth
+├── full_sft_640.pth
+├── pretrain_1024.pth
+├── pretrain_640_moe.pth
+├── pretrain_640.pth
+```
+* `python 0-eval_pretrain.py` tests the pretrained model's text-continuation ability
+* `python 2-eval.py` tests the model's chat ability
+![2-eval](./images/2-eval.png)

🍭 [Tip] Both pretraining (pretrain) and full-parameter fine-tuning (full_sft) support DDP multi-GPU acceleration
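To make the DDP tip above concrete, here is a minimal sketch of the usual PyTorch DDP setup. It illustrates the general pattern only, not the project's actual training code, and the launch command and script name are placeholders:

```python
# Minimal DDP pattern behind the tip above (illustrative only, not MiniMind's code).
# Launch with torchrun, e.g.:  torchrun --nproc_per_node=2 <train_script>.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_for_ddp(model: torch.nn.Module) -> torch.nn.Module:
    """Initialize the process group and wrap the model for multi-GPU training."""
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun for each process
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])
```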
Because this LLM is very small, the vocabulary has to be kept fairly small to avoid a top-heavy model (where the token-embedding layer accounts for too large a share of all parameters).
The tokenizer vocabulary sizes of strong open-source models such as 01-AI, Qwen, ChatGLM, Mistral, and Llama3 are as follows:

+| Tokenizer model | Vocabulary size | Source |
+|-----------------|-----------------|--------|
| yi tokenizer    | 64,000          | 01-AI (China) |
| qwen2 tokenizer | 151,643         | Alibaba Cloud (China) |
| glm tokenizer   | 151,329         | Zhipu AI (China) |

MiniMind, however, uses the mistral tokenizer in order to keep the overall parameter count light and avoid a top-heavy model, since mistral's vocabulary has only 32,000 entries.
In practical tests MiniMind has almost never failed to decode rare words, so it performs well.

+> For easier comparison, an additional version with a custom tokenizer, **MiniMind-small-T**, was trained; its vocabulary is compressed to 6,400 entries, which further reduces the total LLM parameters to about 26M.
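To make the "top-heavy" concern above concrete, here is a small back-of-the-envelope illustration (my own sketch using the figures quoted in this README, not project code) of how vocabulary size drives the share of parameters spent on the embedding table:

```python
# Why a small vocabulary matters for a ~26M-parameter model (rough illustration).

def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameters in a (vocab_size x d_model) token-embedding table."""
    return vocab_size * d_model

TOTAL = 26e6  # MiniMind-small-T total parameters, per the tables in this README

share_custom = embedding_params(6_400, 512) / TOTAL    # custom 6,400-token vocab
share_large = embedding_params(151_643, 512) / TOTAL   # a 151k vocab (qwen2-sized)

print(f"6,400-token vocab: {share_custom:.1%} of the model")  # ~12.6%
print(f"151k-token vocab:  {share_large:.1%} (the embedding alone would exceed the whole model)")
```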
---

+-
+
+📙 [Pretrain data]: the [seq-monkey general-purpose text corpus](https://github.com/mobvoi/seq-monkey-data/blob/main/docs/pretrain_open_corpus.md)
is compiled and cleaned from data of many public sources (web pages, encyclopedias, blogs, open-source code, books, etc.).
It is organized into a unified JSONL format and has gone through strict filtering and deduplication to ensure comprehensiveness, scale, reliability, and high quality.
It totals roughly 10B tokens and is well suited to pretraining Chinese large language models.
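Since the corpus is distributed as JSONL with one record per line, a streaming read like the following is usually enough to inspect it (a hedged sketch: the file name and the `text` field are illustrative assumptions, not a documented schema):

```python
import json

# Stream the pretraining corpus instead of loading the ~10B-token file into memory.
# NOTE: the file name and the "text" field below are assumptions for illustration.
with open("pretrain_data.jsonl", "r", encoding="utf-8") as f:
    for i, line in enumerate(f):
        sample = json.loads(line)
        print(sample.get("text", "")[:100])  # peek at the first 100 characters
        if i >= 2:                           # only show a few records
            break
```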
| minimind-small-T | 26M  | 6400  | 8  | 512  | 8 | 16 | -   | - |
| minimind-small   | 56M  | 32000 | 8  | 640  | 8 | 16 | -   | - |
| minimind         | 218M | 32000 | 16 | 1024 | 8 | 16 | -   | - |
+| minimind-MoE    | 162M | 32000 | 8  | 640  | 8 | 16 | 2+4 | 2 |
+| minimind-V1     | 108M | 6400  | 16 | 768  | 8 | 16 | -   | - |

For reference, GPT-3's layer and dimension settings are listed below:
![gpt3_config.png](./images/gpt3_config.png)
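As a rough cross-check on the sizes in the table above, a crude dense-transformer estimate already lands in the right ballpark. This is a sketch only: it ignores the exact FFN width, grouped-query attention, and any weight tying the real configs may use:

```python
def approx_params_millions(vocab_size: int, d_model: int, n_layers: int) -> float:
    """Very rough parameter estimate: one embedding table plus ~12*d_model^2 per layer
    (attention + FFN). The real MiniMind configs will differ in the details."""
    embedding = vocab_size * d_model
    per_layer = 12 * d_model ** 2
    return (embedding + n_layers * per_layer) / 1e6

print(approx_params_millions(6_400, 512, 8))     # ~28.4M vs. 26M  listed for minimind-small-T
print(approx_params_millions(32_000, 1024, 16))  # ~234M  vs. 218M listed for minimind
```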
| minimind-small | 56M  | 32000 | 24 | ≈6 hour (1 epoch)  | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |
| minimind       | 218M | 32000 | 16 | ≈15 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch)   |
| minimind-MoE   | 166M | 32000 | 16 | ≈13 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch)   |
+| minimind-V1   | 108M | 6400  | 16 | ≈8 hour (1 epoch)  | ≈3 hour (1 epoch) | ≈1 hour (1 epoch)   |

---
| minimind-small | 56M  | d_model=640<br/>n_layers=8                      | [Link](https://pan.baidu.com/s/1nJuOpnu5115FDuz6Ewbeqg?pwd=6666) | [Link](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666) | [Link](https://pan.baidu.com/s/1LzVxBpL0phtGUH267Undqw?pwd=6666) |
| minimind       | 218M | d_model=1024<br/>n_layers=16                    | [Link](https://pan.baidu.com/s/1jzA7uLEi-Jen2fW5olCmEg?pwd=6666) | [Link](https://pan.baidu.com/s/1Hvt0Q_UB_uW2sWTw6w1zRQ?pwd=6666) | [Link](https://pan.baidu.com/s/1fau9eat3lXilnrG3XNhG5Q?pwd=6666) |
| minimind-MoE   | 166M | d_model=1024<br/>n_layers=8<br/>share+route=2+4 | [Link](https://pan.baidu.com/s/11CneDVTkw2Y6lNilQX5bWw?pwd=6666) | [Link](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666) | [Link](https://pan.baidu.com/s/1HC2KSM_-RHRtgv7ZDkKI9Q?pwd=6666) |
+| minimind-V1   | 108M | d_model=768<br/>n_layers=16                     | -                                                                | [Link](https://pan.baidu.com/s/1p713loS7EfwHQf3G9eYI3Q?pwd=6666) | [Link](https://pan.baidu.com/s/12iHGpAs6R0kqsOnGtgK6vQ?pwd=6666) |
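Each download above is a PyTorch checkpoint (`.pth`). Assuming it holds a plain `state_dict` (an assumption; the exact save format is defined by the training scripts), a quick way to inspect it before loading it into the project's model class:

```python
import torch

# Load a downloaded checkpoint onto the CPU and look at its tensors.
state_dict = torch.load("./out/full_sft_512.pth", map_location="cpu")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))

# The weights are then applied to the matching model definition from this repo, e.g.:
# model.load_state_dict(state_dict)
```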

---

# 📌 Eval

+> [Note] The evaluations below were completed on 2024-08-28; newer models released after that date (e.g. MiniMind-V1) are not included unless specifically needed.
+
[A] [minimind-small-T(0.02B)](https://pan.baidu.com/s/1_COe0FQRDmeapSsvArahCA?pwd=6666)<br/>
[B] [minimind-small(0.05B)](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666)<br/>
[C] [minimind-MoE(0.16B)](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666)<br/>
but instead directly compares the predicted probabilities of the tokens for the four letters `A`, `B`, `C`, `D`, takes the most likely one as the answer, and computes accuracy against the reference answers.
The minimind models themselves were not trained on a large dataset, nor instruction-tuned for answering multiple-choice questions, so the results should be treated as a rough reference.

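The selection rule just described can be sketched roughly as follows (an illustrative snippet assuming a transformers-style model/tokenizer interface, not the actual `./eval_ceval.py`):

```python
import torch

def pick_choice(model, tokenizer, prompt: str) -> str:
    """Answer a multiple-choice question by comparing the next-token logits
    of the four option letters and taking the most probable one."""
    option_ids = [tokenizer(letter, add_special_tokens=False).input_ids[-1]
                  for letter in "ABCD"]
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    best = torch.stack([next_token_logits[i] for i in option_ids]).argmax().item()
    return "ABCD"[best]
```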
+> For example, the detailed results for minimind-small:
+
+| Category                                     | Correct/Total | Accuracy |
+|----------------------------------------------|---------------|----------|
+| probability_and_statistics_val               | 3/18          | 16.67%   |
+| law_val                                      | 5/24          | 20.83%   |
+| middle_school_biology_val                    | 4/21          | 19.05%   |
+| high_school_chemistry_val                    | 7/19          | 36.84%   |
+| high_school_physics_val                      | 5/19          | 26.32%   |
+| legal_professional_val                       | 2/23          | 8.70%    |
+| high_school_chinese_val                      | 4/19          | 21.05%   |
+| high_school_history_val                      | 6/20          | 30.00%   |
+| tax_accountant_val                           | 10/49         | 20.41%   |
+| modern_chinese_history_val                   | 4/23          | 17.39%   |
+| middle_school_physics_val                    | 4/19          | 21.05%   |
+| middle_school_history_val                    | 4/22          | 18.18%   |
+| basic_medicine_val                           | 1/19          | 5.26%    |
+| operating_system_val                         | 3/19          | 15.79%   |
+| logic_val                                    | 4/22          | 18.18%   |
+| electrical_engineer_val                      | 7/37          | 18.92%   |
+| civil_servant_val                            | 11/47         | 23.40%   |
+| chinese_language_and_literature_val          | 5/23          | 21.74%   |
+| college_programming_val                      | 10/37         | 27.03%   |
+| accountant_val                               | 9/49          | 18.37%   |
+| plant_protection_val                         | 7/22          | 31.82%   |
+| middle_school_chemistry_val                  | 4/20          | 20.00%   |
+| metrology_engineer_val                       | 3/24          | 12.50%   |
+| veterinary_medicine_val                      | 6/23          | 26.09%   |
+| marxism_val                                  | 5/19          | 26.32%   |
+| advanced_mathematics_val                     | 5/19          | 26.32%   |
+| high_school_mathematics_val                  | 4/18          | 22.22%   |
+| business_administration_val                  | 8/33          | 24.24%   |
+| mao_zedong_thought_val                       | 8/24          | 33.33%   |
+| ideological_and_moral_cultivation_val        | 5/19          | 26.32%   |
+| college_economics_val                        | 17/55         | 30.91%   |
+| professional_tour_guide_val                  | 10/29         | 34.48%   |
+| environmental_impact_assessment_engineer_val | 7/31          | 22.58%   |
+| computer_architecture_val                    | 6/21          | 28.57%   |
+| urban_and_rural_planner_val                  | 11/46         | 23.91%   |
+| college_physics_val                          | 5/19          | 26.32%   |
+| middle_school_mathematics_val                | 3/19          | 15.79%   |
+| high_school_politics_val                     | 4/19          | 21.05%   |
+| physician_val                                | 13/49         | 26.53%   |
+| college_chemistry_val                        | 3/24          | 12.50%   |
+| high_school_biology_val                      | 5/19          | 26.32%   |
+| high_school_geography_val                    | 4/19          | 21.05%   |
+| middle_school_politics_val                   | 6/21          | 28.57%   |
+| clinical_medicine_val                        | 6/22          | 27.27%   |
+| computer_network_val                         | 2/19          | 10.53%   |
+| sports_science_val                           | 2/19          | 10.53%   |
+| art_studies_val                              | 14/33         | 42.42%   |
+| teacher_qualification_val                    | 12/44         | 27.27%   |
+| discrete_mathematics_val                     | 6/16          | 37.50%   |
+| education_science_val                        | 7/29          | 24.14%   |
+| fire_engineer_val                            | 9/31          | 29.03%   |
+| middle_school_geography_val                  | 1/12          | 8.33%    |

```text
Total questions: 1346

* [./export_model.py](./export_model.py) can export the model to the transformers format and push it to huggingface
*
+
MiniMind's huggingface collection: [MiniMind](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)
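For reference, pushing an exported transformers-format checkpoint to the Hub usually reduces to the following (a hedged sketch: the export directory and repo id are placeholders, and loading MiniMind's custom architecture may additionally require `trust_remote_code=True`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes export_model.py has already written a transformers-format checkpoint
# to ./transformers_export/ (placeholder path); run `huggingface-cli login` first.
model = AutoModelForCausalLM.from_pretrained("./transformers_export", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./transformers_export")

model.push_to_hub("your-username/minimind")      # placeholder repo id
tokenizer.push_to_hub("your-username/minimind")
```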

---
* [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
* [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)

+## ✨Top contributors
+
+<a href="https://github.com/jingyaogong/minimind/graphs/contributors">
+<img src="https://contrib.rocks/image?repo=jingyaogong/minimind" />
+</a>
+
# 📌 Statement

This project assumes no responsibility for data-security or public-opinion risks arising from the open-source model and code, nor for any risks and liabilities arising from the model being misled, misused, disseminated, or improperly exploited.

+## License
+
+This repository is licensed under the [Apache-2.0 License](LICENSE).
README_en.md CHANGED
@@ -1,6 +1,7 @@
![logo](./images/logo.png)
<div align="center">

[![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
[![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
[![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)
@@ -45,7 +46,7 @@ exacerbates the problem of finding quality content to understand LLMs, severely
Therefore, the goal of this project is to lower the barrier to entry for working with LLMs as much as possible, by
training an extremely lightweight language model from scratch.

-(As of August
26MB (0.02B) and still exhibiting amazing conversational capabilities!

| Model (Size) | Speed (Tokens/s) | Inference Memory | Training Memory (`batch_size=8`) |
@@ -73,7 +74,7 @@ We hope this open-source project helps LLM beginners get started quickly!
👉**Recent Updates**

<details close>
-<summary> <b>2024-08-
- Project first open-sourced
</details>

@@ -192,7 +193,7 @@ git clone https://github.com/jingyaogong/minimind.git
sizes:

| Tokenizer Model | Vocabulary Size | Source |
-|-----------------|-----------------|--------|
| yi tokenizer    | 64,000          | 01-AI (China) |
| qwen2 tokenizer | 151,643         | Alibaba Cloud (China) |
| glm tokenizer   | 151,329         | Zhipu AI (China) |
@@ -206,7 +207,7 @@ git clone https://github.com/jingyaogong/minimind.git
performance in practical tests, with almost no failures in decoding rare words.

> For comparison purposes, an additional custom Tokenizer version **MiniMind(-T)** was trained, reducing the
-vocabulary size to 6,400, which further decreases the total model parameters to around

---

@@ -598,7 +599,7 @@ four tokens `A`, `B`, `C`, `D`, and choose the one with the highest probability
against the standard answer. Note that minimind models were not trained on larger datasets or fine-tuned for question
answering, so results should be considered as reference only.


| category                                     | Correct/Total | Accuracy |
|----------------------------------------------|---------------|----------|
@@ -769,6 +770,19 @@ Special thanks to the following open-source projects for their inspiration and d
* [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
* [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)

# 📌 Statement

This project does not assume responsibility for data security, public opinion risks, or any risks and liabilities arising from model misguidance, misuse, dissemination, or improper use related to open-source models and code.
![logo](./images/logo.png)
<div align="center">

+![visitors](https://visitor-badge.laobi.icu/badge?page_id=jingyaogong/minimind)
[![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
[![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
[![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)

Therefore, the goal of this project is to lower the barrier to entry for working with LLMs as much as possible, by
training an extremely lightweight language model from scratch.

+(As of August 28, 2024) The initial release of MiniMind includes four model variants, with the smallest being just
26MB (0.02B) and still exhibiting amazing conversational capabilities!

| Model (Size) | Speed (Tokens/s) | Inference Memory | Training Memory (`batch_size=8`) |
👉**Recent Updates**

<details close>
+<summary> <b>2024-08-28</b> </summary>
- Project first open-sourced
</details>

sizes:

| Tokenizer Model | Vocabulary Size | Source |
+|-----------------|-----------------|--------|
| yi tokenizer    | 64,000          | 01-AI (China) |
| qwen2 tokenizer | 151,643         | Alibaba Cloud (China) |
| glm tokenizer   | 151,329         | Zhipu AI (China) |
performance in practical tests, with almost no failures in decoding rare words.

> For comparison purposes, an additional custom Tokenizer version **MiniMind(-T)** was trained, reducing the
+vocabulary size to 6,400, which further decreases the total model parameters to around 26M.

---

against the standard answer. Note that minimind models were not trained on larger datasets or fine-tuned for question
answering, so results should be considered as reference only.

+> For example, detailed results for minimind-small:

| category                                     | Correct/Total | Accuracy |
|----------------------------------------------|---------------|----------|
* [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
* [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)

+
+## ✨Top contributors
+<a href="https://github.com/jingyaogong/minimind/graphs/contributors">
+<img src="https://contrib.rocks/image?repo=jingyaogong/minimind" />
+</a>
+
# 📌 Statement

This project does not assume responsibility for data security, public opinion risks, or any risks and liabilities arising from model misguidance, misuse, dissemination, or improper use related to open-source models and code.
+
+
+
+
+## License
+
+This repository is licensed under the [Apache-2.0 License](LICENSE).