Update README.md
README.md
CHANGED
@@ -7,12 +7,12 @@ pipeline_tag: text-generation
---

<h1 style='text-align: center '>BLOOM-zh</h1>
- <h2 style='text-align: center '><em>
<h3 style='text-align: center '>Model Card</h3>

Version 1.0 / 20.Feb.2023

- This model is a joint collaboration between CKIP lab at Acedemia Sinica ([

## Table of Contents
1. [Model Details](#model-details)
@@ -26,8 +26,8 @@ This model is a joint collaboration between CKIP lab at Acedemia Sinica ([websit
9. [Model Card Authors](#model-card-authors)

## Model Details
- BLOOM-zh is a
- BLOOM-zh is trained extendedly on


### Basics
@@ -50,7 +50,7 @@ BLOOM-zh is trained extendedly on larger amounts of Traditional Chinese text dat

**Send Questions to:** info@mtkresearch.com

- **Cite as:** MediaTek Research

**Organizations of contributors:**

@@ -63,117 +63,33 @@ BLOOM-zh is trained extendedly on larger amounts of Traditional Chinese text dat
### Technical Specifications
*This section provides information for people who work on model development.*

- <details>
- <summary>Click to expand</summary><br/>

- **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):

- * Decoder-only architecture
- * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
- * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
- * 1,065,314,304 parameters:
- * 385,351,680 embedding parameters
- * 24 layers, 16 attention heads
- * Hidden layers are 1536-dimensional
- * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
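These figures are mutually consistent. The short sketch below reproduces them from the hyperparameters listed above; it assumes BLOOM's padded embedding vocabulary of 250,880 rows, which is not stated in this card (the tokenizer itself has 250,680 entries):

```python
# Back-of-the-envelope check of the parameter counts listed above (illustrative only).
# Assumption not stated in this card: the embedding matrix has 250,880 rows
# (BLOOM's padded vocabulary), not the raw tokenizer vocabulary of 250,680.
hidden, layers, padded_vocab = 1536, 24, 250_880

embedding = padded_vocab * hidden              # 385,351,680 embedding parameters
per_layer = (
    2 * (2 * hidden)                           # two LayerNorms per block (weight + bias)
    + (hidden * 3 * hidden + 3 * hidden)       # fused QKV projection + bias
    + (hidden * hidden + hidden)               # attention output projection + bias
    + (hidden * 4 * hidden + 4 * hidden)       # MLP up-projection + bias
    + (4 * hidden * hidden + hidden)           # MLP down-projection + bias
)
extra_norms = 2 * (2 * hidden)                 # embedding LayerNorm + final LayerNorm

print(f"{embedding:,}")                                     # 385,351,680
print(f"{embedding + layers * per_layer + extra_norms:,}")  # 1,065,314,304
```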

- **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
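For reference, this is simply PyTorch's default token-level cross entropy; a minimal sketch with illustrative shapes:

```python
import torch

# Minimal sketch of the stated objective: cross entropy with mean reduction
# (PyTorch's default). The batch and vocabulary sizes here are illustrative.
loss_fn = torch.nn.CrossEntropyLoss(reduction="mean")
logits = torch.randn(8, 250_680)             # (tokens, vocabulary) scores
targets = torch.randint(0, 250_680, (8,))    # gold next-token ids
print(loss_fn(logits, targets).item())
```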

- **Compute infrastructure:**

- * Hardware: 2 A6000 48GB GPUs (1 node):
- * Software:
- * Bigscience Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
- * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
- * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
- * PyTorch (pytorch-1.12 w/ CUDA-11.3; see [Github link](https://github.com/pytorch/pytorch))
- * apex ([Github link](https://github.com/NVIDIA/apex))

- #### **Training**

- Details are provided in the [paper](https://arxiv.org/).

- - Dates: Feb. 2023

- #### **Tokenization**

- The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:

- - A byte-level Byte Pair Encoding (BPE) algorithm
- - A simple pre-tokenization rule, no normalization
- - A vocabulary size of 250,680

- It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
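Readers who want to check these properties directly can load the linked tokenizer as sketched below; this assumes the `transformers` library is installed and the Hugging Face Hub is reachable, and uses the repository linked above:

```python
from transformers import AutoTokenizer

# Load the BLOOM tokenizer from the repository linked above (downloads from the Hub).
tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")

print(tokenizer.vocab_size)        # the card states a vocabulary size of 250,680
ids = tokenizer("這是一個繁體中文的測試句子。")["input_ids"]
print(tokenizer.decode(ids))       # byte-level BPE round-trips the text exactly
```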

- </details>


### Environmental Impact

- <details>
- <summary>Click to expand</summary><br/>

- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#model-details).


- </details>
- <p> </p>

## Uses

*This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
It provides information for anyone considering using the model or who is affected by the model.*

- <details>
- <summary>Click to expand</summary><br/>

- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#uses).

</details>
<p> </p>

## Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

- <details>
- <summary>Click to expand</summary><br/>

- We trained the 1B1 parameter model on a total of 6 Billion tokens
- Details are provided in the [paper](https://arxiv.org/).

- </details>
- </details>
- <p> </p>

## Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*

- <details>
- <summary>Click to expand</summary><br/>

- </details>
- <p> </p>

### Factors
*This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
@@ -182,25 +98,16 @@ Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#risks-a

- The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions

- </details>
- <p> </p>

## Recommendations

*This section provides information on warnings and potential mitigations.*

- <details>
- <summary>Click to expand</summary><br/>

- Please refer to [Model card](https://huggingface.co/bigscience/bloom-1b1#recommendations).

- </details>
- <p> </p>


## Model Card Authors
*Ordered roughly chronologically and by amount of time spent.*

- Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yin-Hsiang Liao, Chin-Tung Lin,
<!-- # Bloom_eval -->

---

<h1 style='text-align: center '>BLOOM-zh</h1>
+ <h2 style='text-align: center '><em>Traditional Chinese-enhanced BLOOM language model</em></h2>
<h3 style='text-align: center '>Model Card</h3>

Version 1.0 / 20.Feb.2023

+ This model is a joint collaboration between CKIP lab at Academia Sinica ([link](https://ckip.iis.sinica.edu.tw/)), MediaTek Research ([連結](https://www.mtkresearch.com/), [连结](https://www.mtkresearch.com/zh-hans/), [link](https://www.mtkresearch.com/en/)), and National Academy for Educational Research ([link](https://www.naer.edu.tw/)).

## Table of Contents
1. [Model Details](#model-details)

9. [Model Card Authors](#model-card-authors)

## Model Details
+ BLOOM-zh is a language model with enhanced Traditional Chinese capability. It is derived from [BLOOMZ](https://huggingface.co/bigscience/bloomz).
+ BLOOM-zh is further trained on a large amount of Traditional Chinese text data.
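A minimal generation sketch for the released checkpoint is given below. It assumes the `transformers` library; the repository ID is a placeholder, since this card does not state where the weights are hosted:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; substitute the actual BLOOM-zh checkpoint location.
model_id = "ckip-joint/bloom-1b1-zh"  # hypothetical, not confirmed by this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy continuation of a Traditional Chinese prompt.
inputs = tokenizer("四月的台北天氣", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```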


### Basics

**Send Questions to:** info@mtkresearch.com

+ **Cite as:** MediaTek Research: Traditional Chinese-enhanced BLOOM language model. International, February 2023.

**Organizations of contributors:**

### Technical Specifications
*This section provides information for people who work on model development.*

+ For technical specifications, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#model-details).


### Environmental Impact

+ For environmental impact, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#model-details).

## Uses

*This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
It provides information for anyone considering using the model or who is affected by the model.*

+ For the uses of the model, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#uses).

</details>
<p> </p>

## Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

+ We trained the 1B1 parameter model on a total of 6 billion tokens of mostly high-quality Traditional Chinese text. Details are provided in the [paper](https://arxiv.org/).

## Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*

+ For risks and limitations, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#risks-and-limitations).

### Factors
*This section lists some different aspects of BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*

- The model is trained on web crawled data, news articles, novels, knowledge sources (encyclopedia, education sector) and instructions

## Recommendations

*This section provides information on warnings and potential mitigations.*

+ For recommendations, please refer to [BLOOM](https://huggingface.co/bigscience/bloom-1b1#recommendations).


## Model Card Authors
*Ordered roughly chronologically and by amount of time spent.*

+ Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, Wei-Yun Ma
<!-- # Bloom_eval -->