SkyworkAIGC/SkyTextTiny

SkyWork committed on Dec 16, 2022

Commit 326f068 • 1 Parent(s): ad7d40d

Update README.md

Files changed (1)

README.md +49 -3

@@ -1,3 +1,49 @@
----
-license: mit
----

+# SkyTextJunior
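+SkyTextJunior is a Chinese GPT3 pre-trained model released by Singularity AI (奇点智源), with roughly 3 billion parameters. It can handle tasks such as chat, question answering, and Chinese-English translation; see the [task examples](https://openapi.singularity-ai.com/index.html#/examplesIndex).
+## Project Highlights
+1. Technical advantage 1: a data-cleaning pipeline with more than 30 steps
+   As NLP has advanced, large pre-trained models have become one of the core technologies of artificial intelligence. Such models typically need massive amounts of text for training, so web text has naturally become the most important corpus source, and the quality of that corpus directly affects model performance. To train a highly capable model, Singularity AI applied more than 30 cleaning steps during data cleaning. This attention to detail underpins the model's performance.
+2. Technical advantage 2: a novel encoding scheme optimized for Chinese
+   Large pre-trained models were long dominated by the English-language community, yet the importance of Chinese pre-trained models is self-evident. Unlike English, which uses an alphabetic script, Chinese calls for a different input encoding. Singularity AI designed an encoding scheme tailored to the characteristics of Chinese, better aligned with how the language is used, and rebuilt a Chinese vocabulary that is easier for the model to understand.
+# Singularity News
+- [2022.12.15] [Kunlun Tiangong (昆仑天工) AIGC launch event](https://live.vhall.com/v3/lives/subscribe/697547540)
+## Dependencies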
+```
+Recommended
+transformers>=4.16.0
+```
+## Model Usage
+```python
+# -*- coding: utf-8 -*-
+from transformers import AutoTokenizer, GPT2LMHeadModel, TextGenerationPipeline
+
+# Load the model weights and the tokenizer from the Hub.
+model = GPT2LMHeadModel.from_pretrained("SkyWork/SkyTextJunior")
+# trust_remote_code=True allows the repository's own tokenizer code to run.
+tokenizer = AutoTokenizer.from_pretrained("SkyWork/SkyTextJunior", trust_remote_code=True)
+# device=0 selects the first CUDA GPU; use device=-1 to run on CPU.
+text_generator = TextGenerationPipeline(model, tokenizer, device=0)
+
+input_str = "今天是个好天气"  # "The weather is nice today"
+max_new_tokens = 20
+# do_sample=True samples tokens, so the output varies between runs.
+print(text_generator(input_str, max_new_tokens=max_new_tokens, do_sample=True))
+```
+# License
+[MIT License]
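
The README recommends `transformers>=4.16.0` and its usage snippet hard-codes `device=0`, which assumes a CUDA GPU is present. The following is a minimal sketch, not part of the commit, of how both assumptions could be checked before loading the model. The helper names `parse_version`, `meets_minimum`, and `pick_device` are hypothetical, and the version comparison is deliberately simple: it looks only at the leading numeric components, so a pre-release such as `4.16.0rc1` is treated as `4.16.0`.

```python
from importlib import metadata

# Recommended minimum taken from the README's dependency note.
MIN_TRANSFORMERS = "4.16.0"


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare two version strings after padding both to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def pick_device() -> int:
    """Return 0 (first CUDA GPU) if one is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    if installed is None:
        print("transformers is not installed")
    else:
        print("transformers", installed, "ok:", meets_minimum(installed))
    print("pipeline device:", pick_device())
```

The value returned by `pick_device()` could be passed as the `device` argument of `TextGenerationPipeline` in place of the literal `0`, so the same snippet runs on machines without a GPU, only more slowly.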