SkyWork committed on
Commit
40ead69
1 Parent(s): 8cea4d7

Update README.md

1. README.md +46 -0
README.md CHANGED
@@ -1,3 +1,49 @@
# SkyText
SkyText is a Chinese GPT3 pre-trained large model released by Singularity-AI. It can perform tasks such as chatting, Q&A, and Chinese-English translation. The project is open source.

## Project Highlights

Technical advantage 1: more than 30 data cleaning processes

As NLP technology has developed, large pre-trained models have gradually become one of the core technologies of artificial intelligence. Pre-training such a model usually requires a large amount of text, and web text is naturally the most important source of corpus data. The quality of the training corpus directly affects the quality of the model. To train a model with strong capabilities, Singularity-AI applied more than 30 cleaning processes during data cleaning. This attention to detail is what produces a strong model.

Technical advantage 2: an optimized and innovative encoding method for Chinese

Large pre-trained models have long been dominated by the English-language community, yet the importance of Chinese pre-trained models is self-evident. Unlike English, Chinese has its own input method (pinyin text), so a Chinese pre-trained large model should clearly handle its input differently. Based on the characteristics of Chinese, Singularity-AI developed its own Chinese encoding method, which better matches Chinese language habits, and rebuilt a Chinese dictionary that is easier for the model to understand.


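To illustrate why the encoding of Chinese text deserves separate treatment, the sketch below compares how many units a sentence occupies at the character level and at the UTF-8 byte level. It does not show Singularity-AI's encoding method, which this README does not describe; the helper name `encoding_footprint` is made up for the example.

```python
def encoding_footprint(text: str) -> dict:
    """Count the units a string occupies at two granularities."""
    return {
        "characters": len(text),                  # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),  # raw bytes a byte-level tokenizer starts from
    }


chinese = "今天是个好天气"  # the sample prompt from this README
english = "hello"

print(encoding_footprint(chinese))  # {'characters': 7, 'utf8_bytes': 21}
print(encoding_footprint(english))  # {'characters': 5, 'utf8_bytes': 5}
```

Each of these Chinese characters takes three bytes in UTF-8, so a byte-level vocabulary learned mostly from English text tends to split Chinese into more pieces than a dictionary built for Chinese would.
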
# News of Singularity-AI
- [2022.12.15] [AIGC Press Conference of Singularity-AI](https://live.vhall.com/v3/lives/subscribe/697547540)

## Installation

```
Recommended:
transformers>=4.18.0
```

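One way to confirm that the installed `transformers` meets the recommended version is a small check like the one below. This is a minimal sketch using only the standard library; the helper names `version_tuple`, `meets_minimum`, and `check_transformers` are made up for this example, and the parsing deliberately ignores pre-release suffixes such as `dev0`.

```python
from importlib import metadata

MINIMUM = "4.18.0"  # version recommended in this README


def version_tuple(version: str) -> tuple:
    """Turn '4.18.0' into (4, 18, 0); non-numeric suffixes such as 'dev0' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '4.18' so they compare equal to '4.18.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare versions numerically, so that 4.9.2 is older than 4.18.0."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers() -> str:
    """Report whether the installed transformers package satisfies MINIMUM."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is OK"
    return f"transformers {installed} is older than {MINIMUM}"


print(check_transformers())
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"4.9.2"` would sort after `"4.18.0"`.
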
## Model Usage

```python
# -*- coding: utf-8 -*-
from transformers import GPT2LMHeadModel
from transformers import AutoTokenizer
from transformers import TextGenerationPipeline

# Load the model weights and tokenizer from the Hugging Face Hub.
model = GPT2LMHeadModel.from_pretrained("SkyWork/SkyTextTiny")
tokenizer = AutoTokenizer.from_pretrained("SkyWork/SkyTextTiny", trust_remote_code=True)
# device=0 runs on the first GPU; use device=-1 to run on CPU.
text_generator = TextGenerationPipeline(model, tokenizer, device=0)
input_str = "今天是个好天气"  # "The weather is nice today"
max_new_tokens = 20
print(text_generator(input_str, max_new_tokens=max_new_tokens, do_sample=True))
```

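The text-generation pipeline returns a list of dictionaries, each with a `generated_text` key that by default holds the prompt followed by the continuation. The sketch below shows one way to keep only the newly generated part. The helper name `continuations` is made up for this example, and `fake_outputs` is a hand-written stand-in with the same shape as the pipeline result, not real model output.

```python
def continuations(prompt: str, outputs: list) -> list:
    """Return only the newly generated text from text-generation pipeline outputs."""
    results = []
    for item in outputs:
        text = item["generated_text"]
        # By default the pipeline returns the prompt followed by the continuation.
        if text.startswith(prompt):
            text = text[len(prompt):]
        results.append(text)
    return results


# Stand-in for the value returned by text_generator(...); the continuation is invented.
fake_outputs = [{"generated_text": "今天是个好天气,我们去公园吧"}]
print(continuations("今天是个好天气", fake_outputs))
```

Passing `return_full_text=False` to the pipeline call should give a similar result without post-processing.
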
# License
[MIT License]


---

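# SkyTextTiny

SkyTextTiny is a Chinese GPT3 pre-trained model released by Singularity-AI (奇点智源), with roughly 3 billion parameters. It can perform different [tasks](https://openapi.singularity-ai.com/index.html#/examplesIndex) such as chatting, Q&A, and Chinese-English translation.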