jingyaogong committed on
Commit
50d126d
·
verified ·
1 Parent(s): c173da2

Upload 2 files

Files changed (2)
  1. README.md +115 -95
  2. README_en.md +19 -5
README.md CHANGED
@@ -1,6 +1,7 @@
1
  ![logo](./images/logo.png)
2
  <div align="center">
3
 
 
4
  [![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
5
  [![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
6
  [![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)
@@ -19,9 +20,8 @@
19
 
20
  </div>
21
 
22
-
23
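- * This open-source project aims to train **MiniMind**, a tiny language model of only 26M parameters, entirely from scratch.
24
- * **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT3, and aims to support fast inference and even training on a CPU.
25
  * **MiniMind** builds on the DeepSeek-V2 and Llama3 architectures. The project covers every stage, from data processing through pretrain, sft, and dpo, and includes a Mixture of Experts (MoE) model.
26
  * This is at once an open-source project, an introductory LLM tutorial, and an early-stage open-source model, and we hope it inspires further work.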
27
 
@@ -39,14 +39,15 @@
39
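  Therefore, the goal of this project is to lower the barrier to entry for working with LLMs as much as possible,
40
  by training an extremely lightweight language model from scratch.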
41
 
42
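- (As of 2024.8.27) The initial MiniMind release includes 3 model variants; the smallest needs only 26M (0.02B) parameters and still shows amazing conversational ability!
43
 
44
- | Model (Size) | Speed (Tokens/s) | Inference Memory | Training Memory (`batch_size=8`) |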
45
- |------------------------|---------------|--------|----------------------|
46
- | MiniMind-small-T (26M) | 91.9 | 0.5 GB | 3.6 GB |
47
- | MiniMind-small (56M) | 85.2 | 0.7 GB | 4.5 GB |
48
- | MiniMind (218M) | 57.6 | 2.1 GB | 10.4 GB |
49
- | MiniMind-MoE (166M) | 64.9 | 1.6 GB | 7.4 GB |
 
50
 
51
  > These measurements were taken on an RTX 3090 GPU with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
52
 
@@ -65,6 +66,8 @@
65
  👉**Recent Updates**
66
 
67
  <details close>
 
 
68
  <summary> <b>2024-08-27</b> </summary>
69
  - Project first open-sourced
70
  </details>
@@ -116,30 +119,30 @@ python 2-eval.py
116
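  * 2.6 `python 4-lora_sft.py` runs LoRA fine-tuning (optional).
117
  * 2.7 `python 5-dpo_train.py` runs DPO human-preference alignment (optional).
118
  * 3. Test the model's inference results
119
- * Download the weights from the 【Trained Model Weights】 section below into the `./out/` directory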
120
- ```text
121
- out
122
- ├── multi_chat
123
- │   ├── full_sft_1024.pth
124
- │   ├── full_sft_512.pth
125
- │   ├── full_sft_640_moe.pth
126
- │   └── full_sft_640.pth
127
- ├── single_chat
128
- │   ├── full_sft_1024.pth
129
- │   ├── full_sft_512.pth
130
- │   ├── full_sft_640_moe.pth
131
- │   └── full_sft_640.pth
132
- ├── full_sft_1024.pth
133
- ├── full_sft_512.pth
134
- ├── full_sft_640_moe.pth
135
- ├── full_sft_640.pth
136
- ├── pretrain_1024.pth
137
- ├── pretrain_640_moe.pth
138
- ├── pretrain_640.pth
139
- ```
140
- * `python 0-eval_pretrain.py` tests the pretrained model's text-continuation ability
141
- * `python 2-eval.py` tests the model's chat ability
142
- ![2-eval](./images/2-eval.png)
143
 
144
  🍭 【Tip】 Both pretrain and full_sft (full-parameter fine-tuning) support multi-GPU acceleration with DDP.
145
 
@@ -163,8 +166,8 @@ python 2-eval.py
163
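  Because the LLM is very small, the vocabulary must be kept small so that the model does not become top-heavy (that is, so that the embedding layer does not account for too large a share of the total parameters).
164
  Strong open-source models such as 01-AI's Yi, Qwen, ChatGLM, Mistral, and Llama3 use tokenizers with the following vocabulary sizes:
165
 
166
- | Tokenizer Model | Vocabulary Size | Source |
167
- |--------------------|---------|------------|
168
  | yi tokenizer | 64,000 | 01-AI (China) |
169
  | qwen2 tokenizer | 151,643 | Alibaba Cloud (China) |
170
  | glm tokenizer | 151,329 | Zhipu AI (China) |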
@@ -176,11 +179,13 @@ python 2-eval.py
176
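  MiniMind instead uses the mistral tokenizer to keep the overall parameter count light and avoid a top-heavy model, since the mistral vocabulary has only 32,000 entries.
177
  In practical tests MiniMind almost never failed to decode rare words, so the choice works well.
178
 
179
- > For comparison, an additional version with a custom tokenizer, **MiniMind(-T)**, was trained; its vocabulary is compressed to 6,400 entries, which further reduces the total LLM parameters to around 40M.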
180
 
181
  ---
182
 
183
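- - 📙【Pretrain data】: [seq-monkey general text dataset](https://github.com/mobvoi/seq-monkey-data/blob/main/docs/pretrain_open_corpus.md)
 
 
184
  It is aggregated and cleaned from a variety of public sources (web pages, encyclopedias, blogs, open-source code, books, and so on).
185
  The data is organized into a unified JSONL format and strictly filtered and deduplicated to ensure coverage, scale, reliability, and quality.
186
  It totals about 10B tokens and is suitable for pretraining Chinese large language models.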
@@ -253,7 +258,8 @@ MiniMind的整体结构一致,只是在RoPE计算、推理函数和FFN层的
253
  | minimind-small-T | 26M | 6400 | 8 | 512 | 8 | 16 | - | - |
254
  | minimind-small | 56M | 32000 | 8 | 640 | 8 | 16 | - | - |
255
  | minimind | 218M | 32000 | 16 | 1024 | 8 | 16 | - | - |
256
- | minimind-MoE | 166M | 32000 | 8 | 640 | 8 | 16 | 2+4 | 2 |
 
257
 
258
  For reference, the layer counts and dimensions of GPT3 are shown in the table below:
259
  ![gpt3_config.png](./images/gpt3_config.png)
@@ -273,6 +279,7 @@ CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
273
  | minimind-small | 56M | 32000 | 24 | ≈6 hour (1 epoch) | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |
274
  | minimind | 218M | 32000 | 16 | ≈15 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch) |
275
  | minimind-MoE | 166M | 32000 | 16 | ≈13 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch) |
 
276
 
277
  ---
278
 
@@ -324,6 +331,7 @@ CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
324
  | minimind-small | 56M | d_model=640<br/>n_layers=8 | [link](https://pan.baidu.com/s/1nJuOpnu5115FDuz6Ewbeqg?pwd=6666) | [link](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666) | [link](https://pan.baidu.com/s/1LzVxBpL0phtGUH267Undqw?pwd=6666) |
325
  | minimind | 218M | d_model=1024<br/>n_layers=16 | [link](https://pan.baidu.com/s/1jzA7uLEi-Jen2fW5olCmEg?pwd=6666) | [link](https://pan.baidu.com/s/1Hvt0Q_UB_uW2sWTw6w1zRQ?pwd=6666) | [link](https://pan.baidu.com/s/1fau9eat3lXilnrG3XNhG5Q?pwd=6666) |
326
  | minimind-MoE | 166M | d_model=1024<br/>n_layers=8<br/>share+route=2+4 | [link](https://pan.baidu.com/s/11CneDVTkw2Y6lNilQX5bWw?pwd=6666) | [link](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666) | [link](https://pan.baidu.com/s/1HC2KSM_-RHRtgv7ZDkKI9Q?pwd=6666) |
 
327
 
328
  ---
329
 
@@ -350,6 +358,8 @@ MobileLLM提出架构的深度比宽度更重要,「深而窄」的「瘦长
350
 
351
  # 📌 Eval
352
 
 
 
353
  [A] [minimind-small-T(0.02B)](https://pan.baidu.com/s/1_COe0FQRDmeapSsvArahCA?pwd=6666)<br/>
354
  [B] [minimind-small(0.05B)](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666)<br/>
355
  [C] [minimind-MoE(0.16B)](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666)<br/>
@@ -516,62 +526,62 @@ C-Eval评测代码见:`./eval_ceval.py`,
516
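  Instead, the predicted probabilities of the tokens for the four letters `A`, `B`, `C`, and `D` are compared directly; the letter with the highest probability is taken as the answer, and accuracy is computed against the reference answers.
517
  The minimind models were not trained on large datasets, nor fine-tuned on multiple-choice instructions, so the results should be treated as a reference only.
518
 
519
- * For example, the detailed results for minimind-small:
520
-
521
- | Category | Correct/Total | Accuracy |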
522
- |---------------------------------|----------------|------------|
523
- | probability_and_statistics_val | 3/18 | 16.67% |
524
- | law_val | 5/24 | 20.83% |
525
- | middle_school_biology_val | 4/21 | 19.05% |
526
- | high_school_chemistry_val | 7/19 | 36.84% |
527
- | high_school_physics_val | 5/19 | 26.32% |
528
- | legal_professional_val | 2/23 | 8.70% |
529
- | high_school_chinese_val | 4/19 | 21.05% |
530
- | high_school_history_val | 6/20 | 30.00% |
531
- | tax_accountant_val | 10/49 | 20.41% |
532
- | modern_chinese_history_val | 4/23 | 17.39% |
533
- | middle_school_physics_val | 4/19 | 21.05% |
534
- | middle_school_history_val | 4/22 | 18.18% |
535
- | basic_medicine_val | 1/19 | 5.26% |
536
- | operating_system_val | 3/19 | 15.79% |
537
- | logic_val | 4/22 | 18.18% |
538
- | electrical_engineer_val | 7/37 | 18.92% |
539
- | civil_servant_val | 11/47 | 23.40% |
540
- | chinese_language_and_literature_val | 5/23 | 21.74% |
541
- | college_programming_val | 10/37 | 27.03% |
542
- | accountant_val | 9/49 | 18.37% |
543
- | plant_protection_val | 7/22 | 31.82% |
544
- | middle_school_chemistry_val | 4/20 | 20.00% |
545
- | metrology_engineer_val | 3/24 | 12.50% |
546
- | veterinary_medicine_val | 6/23 | 26.09% |
547
- | marxism_val | 5/19 | 26.32% |
548
- | advanced_mathematics_val | 5/19 | 26.32% |
549
- | high_school_mathematics_val | 4/18 | 22.22% |
550
- | business_administration_val | 8/33 | 24.24% |
551
- | mao_zedong_thought_val | 8/24 | 33.33% |
552
- | ideological_and_moral_cultivation_val | 5/19 | 26.32% |
553
- | college_economics_val | 17/55 | 30.91% |
554
- | professional_tour_guide_val | 10/29 | 34.48% |
555
- | environmental_impact_assessment_engineer_val | 7/31 | 22.58% |
556
- | computer_architecture_val | 6/21 | 28.57% |
557
- | urban_and_rural_planner_val | 11/46 | 23.91% |
558
- | college_physics_val | 5/19 | 26.32% |
559
- | middle_school_mathematics_val | 3/19 | 15.79% |
560
- | high_school_politics_val | 4/19 | 21.05% |
561
- | physician_val | 13/49 | 26.53% |
562
- | college_chemistry_val | 3/24 | 12.50% |
563
- | high_school_biology_val | 5/19 | 26.32% |
564
- | high_school_geography_val | 4/19 | 21.05% |
565
- | middle_school_politics_val | 6/21 | 28.57% |
566
- | clinical_medicine_val | 6/22 | 27.27% |
567
- | computer_network_val | 2/19 | 10.53% |
568
- | sports_science_val | 2/19 | 10.53% |
569
- | art_studies_val | 14/33 | 42.42% |
570
- | teacher_qualification_val | 12/44 | 27.27% |
571
- | discrete_mathematics_val | 6/16 | 37.50% |
572
- | education_science_val | 7/29 | 24.14% |
573
- | fire_engineer_val | 9/31 | 29.03% |
574
- | middle_school_geography_val | 1/12 | 8.33% |
575
 
576
  ```text
577
  Total questions: 1346
@@ -620,6 +630,7 @@ minimind模型本身没有使用较大的数据集训练,也没有针对回答
620
 
621
  * [./export_model.py](./export_model.py) exports the model to the transformers format and can push it to huggingface
622
  *
 
623
  MiniMind collection on huggingface: [MiniMind](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)
624
 
625
  ---
@@ -684,7 +695,16 @@ MiniMind的huggingface集合地址:[MiniMind](https://huggingface.co/collectio
684
  * [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
685
  * [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)
686
 
 
 
 
 
 
 
687
  # 📌 Statement
688
 
689
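  This project does not assume responsibility for data security, public opinion risks, or any risks and liabilities arising from model misguidance, misuse, dissemination, or improper use related to open-source models and code.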
690
 
 
 
 
 
1
  ![logo](./images/logo.png)
2
  <div align="center">
3
 
4
+ ![visitors](https://visitor-badge.laobi.icu/badge?page_id=jingyaogong/minimind)
5
  [![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
6
  [![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
7
  [![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)
 
20
 
21
  </div>
22
 
23
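+ * This open-source project aims to train **MiniMind**, a tiny language model of only 26M parameters, entirely from scratch in as little as 3 hours.
24
+ * **MiniMind** is extremely lightweight, roughly $\frac{1}{7000}$ the size of GPT3, and aims to support fast inference and even training on an ordinary personal GPU.
 
25
  * **MiniMind** builds on the DeepSeek-V2 and Llama3 architectures. The project covers every stage, from data processing through pretrain, sft, and dpo, and includes a Mixture of Experts (MoE) model.
26
  * This is at once an open-source project, an introductory LLM tutorial, and an early-stage open-source model, and we hope it inspires further work.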
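
The $\frac{1}{7000}$ figure above can be sanity-checked with simple arithmetic. The sketch below assumes GPT3 refers to the 175B-parameter variant (an assumption; the README does not say which size it means) and uses the 26M count quoted for the smallest MiniMind model. The function name is illustrative only.

```python
# Assumption: "GPT3" means the 175-billion-parameter variant.
GPT3_PARAMS = 175e9
# 26M parameters, as quoted in the README for the smallest MiniMind model.
MINIMIND_SMALLEST_PARAMS = 26e6


def size_ratio(big: float, small: float) -> float:
    """Return how many times larger `big` is than `small`."""
    return big / small


ratio = size_ratio(GPT3_PARAMS, MINIMIND_SMALLEST_PARAMS)
print(f"GPT3 is roughly {ratio:,.0f}x larger than the 26M model")
```

Under that assumption the ratio comes out a little below 7000, which is consistent with the rounded fraction in the text.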
27
 
 
39
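  Therefore, the goal of this project is to lower the barrier to entry for working with LLMs as much as possible,
40
  by training an extremely lightweight language model from scratch.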
41
 
42
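+ (As of 2024.09.01) MiniMind includes 5 model variants; the smallest needs only 26M (0.02B) parameters and still shows amazing conversational ability!
43
 
44
+ | Model (Size) | Speed (Tokens/s) | Inference Memory | Training Memory (`batch_size=8`) | Release | Subjective Score (/100) |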
45
+ |------------------------|---------------|--------|----------------------|--------------------|------------|
46
+ | MiniMind-small-T (26M) | 91.9 | 0.5 GB | 3.6 GB | 2024.08.28 | 55' |
47
+ | MiniMind-small (56M) | 85.2 | 0.7 GB | 4.5 GB | 2024.08.28 | 55' |
48
+ | MiniMind (218M) | 57.6 | 2.1 GB | 10.4 GB | 2024.08.28 | 75' |
49
+ | MiniMind-MoE (166M) | 64.9 | 1.6 GB | 7.4 GB | 2024.08.28 | 40' |
50
+ | MiniMind-V1 (108M) | 78.3 | 1.0 GB | 6.4 GB | 2024.09.01 (new🎉) | 80' |
51
 
52
  > These measurements were taken on an RTX 3090 GPU with Torch 2.1.2, CUDA 12.2, and Flash Attention 2.
53
 
 
66
  👉**Recent Updates**
67
 
68
  <details close>
69
+ <summary> <b>2024-09-01 (new🎉)</b> </summary>
70
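+ - Added the MiniMind-V1 (108M) model, which uses minimind_tokenizer and is trained for 3 pretraining epochs + 10 SFT epochs; the more thorough training yields stronger performance.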
71
  <summary> <b>2024-08-27</b> </summary>
72
  - Project first open-sourced
73
  </details>
 
119
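  * 2.6 `python 4-lora_sft.py` runs LoRA fine-tuning (optional).
120
  * 2.7 `python 5-dpo_train.py` runs DPO human-preference alignment (optional).
121
  * 3. Test the model's inference results
122
+ * Download the weights from the 【Trained Model Weights】 section below into the `./out/` directory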
123
+ ```text
124
+ out
125
+ ├── multi_chat
126
+ │   ├── full_sft_1024.pth
127
+ │   ├── full_sft_512.pth
128
+ │   ├── full_sft_640_moe.pth
129
+ │   └── full_sft_640.pth
130
+ ├── single_chat
131
+ │   ├── full_sft_1024.pth
132
+ │   ├── full_sft_512.pth
133
+ │   ├── full_sft_640_moe.pth
134
+ │   └── full_sft_640.pth
135
+ ├── full_sft_1024.pth
136
+ ├── full_sft_512.pth
137
+ ├── full_sft_640_moe.pth
138
+ ├── full_sft_640.pth
139
+ ├── pretrain_1024.pth
140
+ ├── pretrain_640_moe.pth
141
+ ├── pretrain_640.pth
142
+ ```
143
+ * `python 0-eval_pretrain.py` tests the pretrained model's text-continuation ability
144
+ * `python 2-eval.py` tests the model's chat ability
145
+ ![2-eval](./images/2-eval.png)
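
Before running the two evaluation scripts, it can help to confirm that the downloaded weights match the tree shown above. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and the file list is simply copied from the tree.

```python
from pathlib import Path

# The four SFT checkpoint names that appear at each level of the tree above.
SFT_FILES = [
    "full_sft_1024.pth",
    "full_sft_512.pth",
    "full_sft_640_moe.pth",
    "full_sft_640.pth",
]

# Relative paths copied from the tree; adjust if only some models were downloaded.
EXPECTED = (
    [f"multi_chat/{name}" for name in SFT_FILES]
    + [f"single_chat/{name}" for name in SFT_FILES]
    + SFT_FILES
    + ["pretrain_1024.pth", "pretrain_640_moe.pth", "pretrain_640.pth"]
)


def missing_checkpoints(out_dir, expected=EXPECTED):
    """Return the expected relative paths that are not regular files under out_dir."""
    root = Path(out_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: report anything absent from ./out before evaluating.
    for name in missing_checkpoints("./out"):
        print("missing:", name)
```

An empty result means every file in the tree is present; any printed path points to a weight file that still needs to be downloaded.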
146
 
147
  🍭 【Tip】 Both pretrain and full_sft (full-parameter fine-tuning) support multi-GPU acceleration with DDP.
148
 
 
166
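  Because the LLM is very small, the vocabulary must be kept small so that the model does not become top-heavy (that is, so that the embedding layer does not account for too large a share of the total parameters).
167
  Strong open-source models such as 01-AI's Yi, Qwen, ChatGLM, Mistral, and Llama3 use tokenizers with the following vocabulary sizes:
168
 
169
+ | Tokenizer Model | Vocabulary Size | Source |
170
+ |--------------------|---------|------------|
171
  | yi tokenizer | 64,000 | 01-AI (China) |
172
  | qwen2 tokenizer | 151,643 | Alibaba Cloud (China) |
173
  | glm tokenizer | 151,329 | Zhipu AI (China) |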
 
179
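  MiniMind instead uses the mistral tokenizer to keep the overall parameter count light and avoid a top-heavy model, since the mistral vocabulary has only 32,000 entries.
180
  In practical tests MiniMind almost never failed to decode rare words, so the choice works well.
181
 
182
+ > For comparison, an additional version with a custom tokenizer, **MiniMind-small-T**, was trained; its vocabulary is compressed to 6,400 entries, which further reduces the total LLM parameters to around 26M.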
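
The "top-heavy" concern can be made concrete by estimating what fraction of a model's parameters sit in the token embedding. This is a rough sketch under stated assumptions: the embedding is counted once as `vocab_size * d_model` (a separate, untied output projection would double it), the vocabulary sizes and `d_model` values are read from the model configuration table in this README, and the totals are the rounded parameter counts quoted there.

```python
def embedding_share(vocab_size: int, d_model: int, total_params: float) -> float:
    """Fraction of total parameters held by one vocab_size x d_model embedding matrix."""
    return vocab_size * d_model / total_params


# (vocab_size, d_model, rounded total parameters) as listed in the README tables.
configs = {
    "minimind-small-T": (6400, 512, 26e6),
    "minimind-small": (32000, 640, 56e6),
}

for name, (vocab_size, d_model, total) in configs.items():
    share = embedding_share(vocab_size, d_model, total)
    print(f"{name}: embedding is about {share:.1%} of all parameters")
```

With these inputs the 32,000-entry vocabulary takes up more than a third of the 56M model, while the 6,400-entry vocabulary takes up roughly an eighth of the 26M model, which illustrates why a smaller vocabulary matters at this scale.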
183
 
184
  ---
185
 
186
+ -
187
+
188
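+ 📙【Pretrain data】: [seq-monkey general text dataset](https://github.com/mobvoi/seq-monkey-data/blob/main/docs/pretrain_open_corpus.md)
189
  It is aggregated and cleaned from a variety of public sources (web pages, encyclopedias, blogs, open-source code, books, and so on).
190
  The data is organized into a unified JSONL format and strictly filtered and deduplicated to ensure coverage, scale, reliability, and quality.
191
  It totals about 10B tokens and is suitable for pretraining Chinese large language models.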
 
258
  | minimind-small-T | 26M | 6400 | 8 | 512 | 8 | 16 | - | - |
259
  | minimind-small | 56M | 32000 | 8 | 640 | 8 | 16 | - | - |
260
  | minimind | 218M | 32000 | 16 | 1024 | 8 | 16 | - | - |
261
+ | minimind-MoE | 162M | 32000 | 8 | 640 | 8 | 16 | 2+4 | 2 |
262
+ | minimind-V1 | 108M | 6400 | 16 | 768 | 8 | 16 | - | - |
263
 
264
  For reference, the layer counts and dimensions of GPT3 are shown in the table below:
265
  ![gpt3_config.png](./images/gpt3_config.png)
 
279
  | minimind-small | 56M | 32000 | 24 | ≈6 hour (1 epoch) | ≈2 hour (1 epoch) | ≈0.5 hour (1 epoch) |
280
  | minimind | 218M | 32000 | 16 | ≈15 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch) |
281
  | minimind-MoE | 166M | 32000 | 16 | ≈13 hour (1 epoch) | ≈5 hour (1 epoch) | ≈1 hour (1 epoch) |
282
+ | minimind-V1 | 108M | 6400 | 16 | ≈8 hour (1 epoch) | ≈3 hour (1 epoch) | ≈1 hour (1 epoch) |
283
 
284
  ---
285
 
 
331
  | minimind-small | 56M | d_model=640<br/>n_layers=8 | [link](https://pan.baidu.com/s/1nJuOpnu5115FDuz6Ewbeqg?pwd=6666) | [link](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666) | [link](https://pan.baidu.com/s/1LzVxBpL0phtGUH267Undqw?pwd=6666) |
332
  | minimind | 218M | d_model=1024<br/>n_layers=16 | [link](https://pan.baidu.com/s/1jzA7uLEi-Jen2fW5olCmEg?pwd=6666) | [link](https://pan.baidu.com/s/1Hvt0Q_UB_uW2sWTw6w1zRQ?pwd=6666) | [link](https://pan.baidu.com/s/1fau9eat3lXilnrG3XNhG5Q?pwd=6666) |
333
  | minimind-MoE | 166M | d_model=1024<br/>n_layers=8<br/>share+route=2+4 | [link](https://pan.baidu.com/s/11CneDVTkw2Y6lNilQX5bWw?pwd=6666) | [link](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666) | [link](https://pan.baidu.com/s/1HC2KSM_-RHRtgv7ZDkKI9Q?pwd=6666) |
334
+ | minimind-V1 | 108M | d_model=768<br/>n_layers=16 | - | [link](https://pan.baidu.com/s/1p713loS7EfwHQf3G9eYI3Q?pwd=6666) | [link](https://pan.baidu.com/s/12iHGpAs6R0kqsOnGtgK6vQ?pwd=6666) |
335
 
336
  ---
337
 
 
358
 
359
  # 📌 Eval
360
 
361
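+ > 【Note】 The tests below were completed on 2024.8.28. Models released after that date (for example MiniMind-V1) will not be added to the tests unless there is a specific need.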
362
+
363
  [A] [minimind-small-T(0.02B)](https://pan.baidu.com/s/1_COe0FQRDmeapSsvArahCA?pwd=6666)<br/>
364
  [B] [minimind-small(0.05B)](https://pan.baidu.com/s/1lRX0IcpjNFSySioeCfifRQ?pwd=6666)<br/>
365
  [C] [minimind-MoE(0.16B)](https://pan.baidu.com/s/1fRq4MHZec3z-oLK6sCzj_A?pwd=6666)<br/>
 
526
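  Instead, the predicted probabilities of the tokens for the four letters `A`, `B`, `C`, and `D` are compared directly; the letter with the highest probability is taken as the answer, and accuracy is computed against the reference answers.
527
  The minimind models were not trained on large datasets, nor fine-tuned on multiple-choice instructions, so the results should be treated as a reference only.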
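
The scoring rule described above (compare the probabilities of the four letter tokens and take the largest) can be sketched as follows. This is an illustration only, not the code in `./eval_ceval.py`: the helper names `pick_choice` and `accuracy` are hypothetical, and the logits are made-up numbers standing in for the model's next-token scores for `A`, `B`, `C`, and `D`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def pick_choice(choice_logits: dict) -> str:
    """Return the letter whose token has the highest predicted probability."""
    letters = list(choice_logits)
    probs = softmax([choice_logits[letter] for letter in letters])
    best_index = max(range(len(letters)), key=lambda i: probs[i])
    return letters[best_index]


def accuracy(predictions, answers) -> float:
    """Share of predictions that match the reference answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Made-up logits for two questions, restricted to the four letter tokens.
questions = [
    {"A": 0.1, "B": 2.0, "C": -1.0, "D": 0.5},
    {"A": 1.2, "B": 0.3, "C": 0.9, "D": -0.4},
]
predictions = [pick_choice(q) for q in questions]
print(predictions, accuracy(predictions, ["B", "C"]))
```

Because softmax is monotonic, taking the largest logit among the four letters gives the same answer as taking the largest probability; the softmax step is kept only to mirror the wording of the description.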
528
 
529
+ > For example, the detailed results for minimind-small:
530
+
531
+ | Category | Correct/Total | Accuracy |
532
+ |----------------------------------------------|----------|--------|
533
+ | probability_and_statistics_val | 3/18 | 16.67% |
534
+ | law_val | 5/24 | 20.83% |
535
+ | middle_school_biology_val | 4/21 | 19.05% |
536
+ | high_school_chemistry_val | 7/19 | 36.84% |
537
+ | high_school_physics_val | 5/19 | 26.32% |
538
+ | legal_professional_val | 2/23 | 8.70% |
539
+ | high_school_chinese_val | 4/19 | 21.05% |
540
+ | high_school_history_val | 6/20 | 30.00% |
541
+ | tax_accountant_val | 10/49 | 20.41% |
542
+ | modern_chinese_history_val | 4/23 | 17.39% |
543
+ | middle_school_physics_val | 4/19 | 21.05% |
544
+ | middle_school_history_val | 4/22 | 18.18% |
545
+ | basic_medicine_val | 1/19 | 5.26% |
546
+ | operating_system_val | 3/19 | 15.79% |
547
+ | logic_val | 4/22 | 18.18% |
548
+ | electrical_engineer_val | 7/37 | 18.92% |
549
+ | civil_servant_val | 11/47 | 23.40% |
550
+ | chinese_language_and_literature_val | 5/23 | 21.74% |
551
+ | college_programming_val | 10/37 | 27.03% |
552
+ | accountant_val | 9/49 | 18.37% |
553
+ | plant_protection_val | 7/22 | 31.82% |
554
+ | middle_school_chemistry_val | 4/20 | 20.00% |
555
+ | metrology_engineer_val | 3/24 | 12.50% |
556
+ | veterinary_medicine_val | 6/23 | 26.09% |
557
+ | marxism_val | 5/19 | 26.32% |
558
+ | advanced_mathematics_val | 5/19 | 26.32% |
559
+ | high_school_mathematics_val | 4/18 | 22.22% |
560
+ | business_administration_val | 8/33 | 24.24% |
561
+ | mao_zedong_thought_val | 8/24 | 33.33% |
562
+ | ideological_and_moral_cultivation_val | 5/19 | 26.32% |
563
+ | college_economics_val | 17/55 | 30.91% |
564
+ | professional_tour_guide_val | 10/29 | 34.48% |
565
+ | environmental_impact_assessment_engineer_val | 7/31 | 22.58% |
566
+ | computer_architecture_val | 6/21 | 28.57% |
567
+ | urban_and_rural_planner_val | 11/46 | 23.91% |
568
+ | college_physics_val | 5/19 | 26.32% |
569
+ | middle_school_mathematics_val | 3/19 | 15.79% |
570
+ | high_school_politics_val | 4/19 | 21.05% |
571
+ | physician_val | 13/49 | 26.53% |
572
+ | college_chemistry_val | 3/24 | 12.50% |
573
+ | high_school_biology_val | 5/19 | 26.32% |
574
+ | high_school_geography_val | 4/19 | 21.05% |
575
+ | middle_school_politics_val | 6/21 | 28.57% |
576
+ | clinical_medicine_val | 6/22 | 27.27% |
577
+ | computer_network_val | 2/19 | 10.53% |
578
+ | sports_science_val | 2/19 | 10.53% |
579
+ | art_studies_val | 14/33 | 42.42% |
580
+ | teacher_qualification_val | 12/44 | 27.27% |
581
+ | discrete_mathematics_val | 6/16 | 37.50% |
582
+ | education_science_val | 7/29 | 24.14% |
583
+ | fire_engineer_val | 9/31 | 29.03% |
584
+ | middle_school_geography_val | 1/12 | 8.33% |
585
 
586
  ```text
587
  Total questions: 1346
 
630
 
631
  * [./export_model.py](./export_model.py) exports the model to the transformers format and can push it to huggingface
632
  *
633
+
634
  MiniMind collection on huggingface: [MiniMind](https://huggingface.co/collections/jingyaogong/minimind-66caf8d999f5c7fa64f399e5)
635
 
636
  ---
 
695
  * [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
696
  * [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)
697
 
698
+ ## ✨Top contributors
699
+
700
+ <a href="https://github.com/jingyaogong/minimind/graphs/contributors">
701
+ <img src="https://contrib.rocks/image?repo=jingyaogong/minimind" />
702
+ </a>
703
+
704
  # 📌 Statement
705
 
706
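  This project does not assume responsibility for data security, public opinion risks, or any risks and liabilities arising from model misguidance, misuse, dissemination, or improper use related to open-source models and code.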
707
 
708
+ ## License
709
+
710
+ This repository is licensed under the [Apache-2.0 License](LICENSE).
README_en.md CHANGED
@@ -1,6 +1,7 @@
1
  ![logo](./images/logo.png)
2
  <div align="center">
3
 
 
4
  [![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
5
  [![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
6
  [![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)
@@ -45,7 +46,7 @@ exacerbates the problem of finding quality content to understand LLMs, severely
45
  Therefore, the goal of this project is to lower the barrier to entry for working with LLMs as much as possible, by
46
  training an extremely lightweight language model from scratch.
47
 
48
- (As of August 27, 2024) The initial release of MiniMind includes three model variants, with the smallest being just
49
  26M (0.02B) and still exhibiting amazing conversational capabilities!
50
 
51
  | Model (Size) | Speed (Tokens/s) | Inference Memory | Training Memory (`batch_size=8`) |
@@ -73,7 +74,7 @@ We hope this open-source project helps LLM beginners get started quickly!
73
  👉**Recent Updates**
74
 
75
  <details close>
76
- <summary> <b>2024-08-27</b> </summary>
77
  - Project first open-sourced
78
  </details>
79
 
@@ -192,7 +193,7 @@ git clone https://github.com/jingyaogong/minimind.git
192
  sizes:
193
 
194
  | Tokenizer Model | Vocabulary Size | Source |
195
- |----------------------|------------------|-----------------------|
196
  | yi tokenizer | 64,000 | 01-AI (China) |
197
  | qwen2 tokenizer | 151,643 | Alibaba Cloud (China) |
198
  | glm tokenizer | 151,329 | Zhipu AI (China) |
@@ -206,7 +207,7 @@ git clone https://github.com/jingyaogong/minimind.git
206
  performance in practical tests, with almost no failures in decoding rare words.
207
 
208
  > For comparison purposes, an additional custom Tokenizer version **MiniMind(-T)** was trained, reducing the
209
- vocabulary size to 6,400, which further decreases the total model parameters to around 40M.
210
 
211
  ---
212
 
@@ -598,7 +599,7 @@ four tokens `A`, `B`, `C`, `D`, and choose the one with the highest probability
598
  against the standard answer. Note that minimind models were not trained on larger datasets or fine-tuned for question
599
  answering, so results should be considered as reference only.
600
 
601
- * For example, detailed results for minimind-small:
602
 
603
  | category | Correct/Total | Accuracy |
604
  |----------------------------------------------|---------------|----------|
@@ -769,6 +770,19 @@ Special thanks to the following open-source projects for their inspiration and d
769
  * [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
770
  * [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)
771
 
 
 
 
 
 
 
772
  # 📌 Statement
773
 
774
  This project does not assume responsibility for data security, public opinion risks, or any risks and liabilities arising from model misguidance, misuse, dissemination, or improper use related to open-source models and code.
 
 
 
 
 
 
 
 
1
  ![logo](./images/logo.png)
2
  <div align="center">
3
 
4
+ ![visitors](https://visitor-badge.laobi.icu/badge?page_id=jingyaogong/minimind)
5
  [![GitHub Repo stars](https://img.shields.io/github/stars/jingyaogong/minimind?style=social)](https://github.com/jingyaogong/minimind/stargazers)
6
  [![GitHub Code License](https://img.shields.io/github/license/jingyaogong/minimind)](LICENSE)
7
  [![GitHub last commit](https://img.shields.io/github/last-commit/jingyaogong/minimind)](https://github.com/jingyaogong/minimind/commits/master)
 
46
  Therefore, the goal of this project is to lower the barrier to entry for working with LLMs as much as possible, by
47
  training an extremely lightweight language model from scratch.
48
 
49
+ (As of August 28, 2024) The initial release of MiniMind includes four model variants, with the smallest being just
50
  26M (0.02B) and still exhibiting amazing conversational capabilities!
51
 
52
  | Model (Size) | Speed (Tokens/s) | Inference Memory | Training Memory (`batch_size=8`) |
 
74
  👉**Recent Updates**
75
 
76
  <details close>
77
+ <summary> <b>2024-08-28</b> </summary>
78
  - Project first open-sourced
79
  </details>
80
 
 
193
  sizes:
194
 
195
  | Tokenizer Model | Vocabulary Size | Source |
196
+ |----------------------|------------------|-----------------------|
197
  | yi tokenizer | 64,000 | 01-AI (China) |
198
  | qwen2 tokenizer | 151,643 | Alibaba Cloud (China) |
199
  | glm tokenizer | 151,329 | Zhipu AI (China) |
 
207
  performance in practical tests, with almost no failures in decoding rare words.
208
 
209
  > For comparison purposes, an additional custom Tokenizer version **MiniMind(-T)** was trained, reducing the
210
+ vocabulary size to 6,400, which further decreases the total model parameters to around 26M.
211
 
212
  ---
213
 
 
599
  against the standard answer. Note that minimind models were not trained on larger datasets or fine-tuned for question
600
  answering, so results should be considered as reference only.
601
 
602
+ > For example, detailed results for minimind-small:
603
 
604
  | category | Correct/Total | Accuracy |
605
  |----------------------------------------------|---------------|----------|
 
770
  * [ChatLM-mini-Chinese](https://github.com/charent/ChatLM-mini-Chinese)
771
  * [Zero-Chatgpt](https://github.com/AI-Study-Han/Zero-Chatgpt/tree/main)
772
 
773
+
774
+ ## ✨Top contributors
775
+ <a href="https://github.com/jingyaogong/minimind/graphs/contributors">
776
+ <img src="https://contrib.rocks/image?repo=jingyaogong/minimind" />
777
+ </a>
778
+
779
  # 📌 Statement
780
 
781
  This project does not assume responsibility for data security, public opinion risks, or any risks and liabilities arising from model misguidance, misuse, dissemination, or improper use related to open-source models and code.
782
+
783
+
784
+
785
+
786
+ ## License
787
+
788
+ This repository is licensed under the [Apache-2.0 License](LICENSE).