zhao1iang commited on
Commit
923a60f
1 Parent(s): 017b2ae

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -3
README.md CHANGED
@@ -13,7 +13,7 @@ license_link: >-
13
  <div align="center"><img src="misc/skywork_logo.jpeg" width="550"/></div>
14
 
15
  <p align="center">
16
- 👨‍💻 <a href="https://github.com/SkyworkAI/Skywork" target="_blank">Github</a> • 🤗 <a href="https://huggingface.co/Skywork" target="_blank">Hugging Face</a>• 🤖 <a href="https://modelscope.cn/organization/Skywork" target="_blank">ModelScope</a> • 💬 <a href="https://github.com/SkyworkAI/Skywork/blob/main/misc/wechat.png?raw=true" target="_blank">WeChat</a>• 📜<a href="https://arxiv.org/" target="_blank">Tech Report</a>
17
 
18
  </p>
19
 
@@ -44,9 +44,9 @@ license_link: >-
44
  **Skywork-13B-Base**: The model was trained on a high-quality cleaned dataset consisting of 3.2 trillion multilingual data (mainly Chinese and English) and code. It has demonstrated the best performance among models of similar scale in various evaluations and benchmark tests.
45
 
46
 
47
- 如果您希望了解更多的信息,如训练方案,评估方法,请参考我们的[技术报告](https://arxiv.org/skywork-tech-report),[Skymath](https://arxiv.org/abs/2310.16713)论文,[SkyworkMM](https://github.com/will-singularity/Skywork-MM/blob/main/skywork_mm.pdf)论文。
48
 
49
- If you are interested in more training and evaluation details, please refer to our [technical report](https://arxiv.org/skywork-tech-report), [Skymath]((https://arxiv.org/skywork-tech-report)) paper and [SkyworkMM](https://github.com/will-singularity/Skywork-MM/blob/main/skywork_mm.pdf) paper.
50
 
51
  ## 训练数据(Training Data)
52
  我们精心搭建了数据清洗流程对文本中的低质量数据、有害信息、敏感信息进行清洗过滤。我们的Skywork-13B-Base模型是在清洗后的3.2TB高质量中、英、代码数据上进行训练,其中英文占比52.2%,中文占比39.6%,代码占比8%,在兼顾中文和英文上的表现的同时,代码能力也能有保证。
 
13
  <div align="center"><img src="misc/skywork_logo.jpeg" width="550"/></div>
14
 
15
  <p align="center">
16
+ 👨‍💻 <a href="https://github.com/SkyworkAI/Skywork" target="_blank">Github</a> • 🤗 <a href="https://huggingface.co/Skywork" target="_blank">Hugging Face</a>• 🤖 <a href="https://modelscope.cn/organization/Skywork" target="_blank">ModelScope</a> • 💬 <a href="https://github.com/SkyworkAI/Skywork/blob/main/misc/wechat.png?raw=true" target="_blank">WeChat</a>• 📜<a href="https://github.com/SkyworkAI/Skywork/blob/main/Skywork_13b_tech_report.pdf" target="_blank">Tech Report</a>
17
 
18
  </p>
19
 
 
44
  **Skywork-13B-Base**: The model was trained on a high-quality cleaned dataset consisting of 3.2 trillion multilingual data (mainly Chinese and English) and code. It has demonstrated the best performance among models of similar scale in various evaluations and benchmark tests.
45
 
46
 
47
+ 如果您希望了解更多的信息,如训练方案,评估方法,请参考我们的[技术报告](https://github.com/SkyworkAI/Skywork/blob/main/Skywork_13b_tech_report.pdf),[Skymath](https://arxiv.org/abs/2310.16713)论文,[SkyworkMM](https://github.com/will-singularity/Skywork-MM/blob/main/skywork_mm.pdf)论文。
48
 
49
+ If you are interested in more training and evaluation details, please refer to our [technical report](https://github.com/SkyworkAI/Skywork/blob/main/Skywork_13b_tech_report.pdf), [Skymath]((https://arxiv.org/skywork-tech-report)) paper and [SkyworkMM](https://github.com/will-singularity/Skywork-MM/blob/main/skywork_mm.pdf) paper.
50
 
51
  ## 训练数据(Training Data)
52
  我们精心搭建了数据清洗流程对文本中的低质量数据、有害信息、敏感信息进行清洗过滤。我们的Skywork-13B-Base模型是在清洗后的3.2TB高质量中、英、代码数据上进行训练,其中英文占比52.2%,中文占比39.6%,代码占比8%,在兼顾中文和英文上的表现的同时,代码能力也能有保证。