Commit f34fb5c, committed by pom
1 Parent(s): 3c8b1bc
update readme

README.md CHANGED
@@ -9,20 +9,20 @@ inference: false
## Model Introduction

**XVERSE-7B** is a multilingual large language model independently developed by Shenzhen Yuanxiang Technology, with 7 billion parameters. Its key features are as follows:

- **Model Structure**: XVERSE-7B uses the mainstream Decoder-only standard Transformer architecture and supports an 8K context length, which can meet the needs of longer multi-round dialogue, knowledge question answering, and summarization, making the model applicable to a wider range of scenarios.
- **Training Data**: The model has been thoroughly trained on a diversified, high-quality dataset of 2.6 trillion tokens covering more than 40 languages, including Chinese, English, Russian, and Spanish. The sampling ratios of the different data types are finely tuned, so that performance in Chinese and English is excellent while other languages are still covered well.
- **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,534 has been trained on hundreds of gigabytes of language data. It supports multiple languages without requiring any additional vocabulary expansion (see the short sketch after this list).
- **Training Framework**: Several key technologies have also been independently developed, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data-computation-communication, and synergy between platforms and frameworks. These advancements enhance training efficiency and model stability. With these technologies, the peak computational power utilization rate on a thousand-card cluster can reach 58.5%, ranking at the forefront of the industry.
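
As a small illustration of the tokenizer described above, the sketch below loads it from the Hugging Face Hub and tokenizes a few short sentences in different languages. It assumes the `xverse/XVERSE-7B` checkpoint used in the inference example further down and network access to the Hub; the sample sentences are arbitrary.

```python
from transformers import AutoTokenizer

# Load the BPE tokenizer shipped with XVERSE-7B (vocabulary size 100,534).
tokenizer = AutoTokenizer.from_pretrained("xverse/XVERSE-7B")
print(tokenizer.vocab_size)  # expected to report the ~100K-entry vocabulary

# The same vocabulary covers several languages without any extension.
samples = [
    "今天天气很好。",                  # Chinese
    "The weather is nice today.",      # English
    "Сегодня хорошая погода.",         # Russian
    "Hoy hace buen tiempo.",           # Spanish
]
for text in samples:
    print(text, "->", tokenizer.tokenize(text))
```
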
## Evaluation Results

@@ -106,15 +106,15 @@ C-Eval Category Results

The XVERSE-7B model can be loaded for inference using the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("xverse/XVERSE-7B")
model = AutoModelForCausalLM.from_pretrained("xverse/XVERSE-7B", trust_remote_code=True, torch_dtype=torch.float16, device_map='auto')
model = model.eval()

# Prompt (in Chinese): "Attractions in Beijing: the Forbidden City, the Temple of Heaven,
# the Great Wall, etc.\nAttractions in Shenzhen:" -- the model is expected to continue the list.
inputs = tokenizer('北京的景点:故宫、天坛、万里长城等。\n深圳的景点:', return_tensors='pt').input_ids
inputs = inputs.cuda()

# Greedy generation of up to 64 new tokens with a mild repetition penalty.
generated_ids = model.generate(inputs, max_new_tokens=64, eos_token_id=tokenizer.eos_token_id, repetition_penalty=1.1)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```
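
If sampling is preferred over greedy decoding, the call above can be adjusted as in the following sketch. The `top_p` and `temperature` values are illustrative choices, not settings recommended by the model authors, and the `model`, `tokenizer`, and `inputs` objects are the ones created in the snippet above.

```python
# Minimal sampling-based variation (illustrative hyperparameters, not official recommendations).
generated_ids = model.generate(
    inputs,
    max_new_tokens=64,
    do_sample=True,          # sample from the distribution instead of greedy decoding
    top_p=0.85,              # nucleus sampling threshold (illustrative)
    temperature=0.8,         # softens the next-token distribution (illustrative)
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```
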

For more details, including the text generation demo and environment dependencies, please refer to our [Github](https://github.com/xverse-ai/XVERSE-7B).