Update README.md
---
library_name: transformers
tags:
- nlp
- phi3
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---
|
12 |
|
|
|
13 |
|
14 |
+
# Model Card for devngho/phi-3-mini-4k-base
|
15 |
|
16 |
+
<!-- Provide a quick summary of what the model is/does. [Optional] -->
|
17 |
+
Finetuned from microsoft/Phi-3-mini-4k-instruct on fineweb-edu (about 150,000 samples).
|
18 |
|
19 |
+
Research supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/). ⚡
|
20 |
|
21 |
## Model Details

- **Developed by:** devngho
- **Language(s) (NLP):** en
- **License:** mit
- **Parent Model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Samples

- max_tokens: 256
- \<s> token is omitted
- temperature: 0 (these settings are reproduced in the sketch below)
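
A minimal sketch of how these settings map onto `transformers` generation, assuming the model is loaded straight from the Hub; this is an illustration, not the author's published script:

```python
# Sketch only: greedy decoding approximates temperature 0, max_new_tokens
# matches max_tokens: 256, and add_special_tokens=False omits the <s> token.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "devngho/phi-3-mini-4k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Hangul is a writing system made by King Sejong in"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```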

### Sample 1
Prompt:
```md
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?

A. He was inspired by the Chinese writing system.
B. He was inspired by the Japanese writing system.
C. He was inspired by the Mongolian writing system.
D. He was inspired by the Korean writing system.

- response: King Sejong the Great, the fourth king of the Joseon Dynasty in Korea, created Hangul in the 15th century with the aim of promoting literacy among the common people. Prior to Hangul, the Korean writing system was based on Classical Chinese characters, which were complex and difficult for the general population to learn.

King Sejong and his scholars were inspired by the need to create a writing system that was easy to learn and use, and that could accurately represent the sounds of the Korean language. They were not directly inspired by the Chinese, Japanese, or Mongolian writing systems, but rather by the desire to create a system that was uniquely Korean and accessible to all
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?<|end|>
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct - completions model):
</summary>
<p>
Korean is written with an alphabet called Hangul, which was created in the 15th century by King Sejong the Great and his scholars. How King Sejong the Great created Hangul?

King Sejong the Great wanted to create a writing system that was easy to learn and use for the common people, as the existing writing system at the time, Chinese characters, were difficult to learn and only accessible to the elite. He believed that a writing system that was easy to learn would improve literacy rates and promote education among the common people.

To create Hangul, King Sejong the Great gathered a group of scholars and linguists and tasked them with creating a new writing system. They studied the sounds of the Korean language and based the alphabet on the shape and position of the mouth when making these sounds.

The alphabet was initially called Hunminjeongeum, which means "the correct sounds for the instruction of the people." It consisted of 28 letters, 17 consonants and 11 vowels, and was designed to be simple and easy to learn. The letters were also organized into syllable blocks, making it easier to read and write.

Hangul was officially promulgated in 1446 and was met with resistance from the elite who saw it as a threat to their status. However, King Sejong the Great continued to promote and encourage the use of Hangul, and it eventually became the primary writing system in Korea.

Today, Hangul is
</p>
</details>

Author's Note: It still seems to be an instruct-following model, but the response is better than the original model's, as it doesn't emit the <|end|> token.

### Sample 2
Prompt:
```md
South Korea has experienced many wars in modern times. For instance,
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
South Korea has experienced many wars in modern times. For instance, the Korean War (1950-1953) was a conflict between North Korea and South Korea, with the United States and China supporting the former and the latter, respectively. The war ended in a stalemate, with the two Koreas remaining divided along the 38th parallel.

The Korean War had a significant impact on the country's economy. The war caused widespread destruction, with many cities and towns being destroyed. The war also disrupted trade and caused a shortage of resources. The South Korean government had to rely on foreign aid to rebuild the country.

One of the most significant impacts of the Korean War on the economy was the loss of human capital. Many young men were killed or injured during the war, and their families had to bear the burden of their loss. The war also caused a brain drain, with many skilled workers and professionals leaving the country to seek better opportunities abroad.

The Korean War also had a significant impact on the country's agriculture. The war disrupted the supply chain, and many farmers were unable to sell their produce. The government had to import
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
South Korea has experienced many wars in modern times. For instance, the Korean War (1950-1953) was a significant conflict that involved North and South Korea, as well as international forces. The war ended with an armistice agreement, but no formal peace treaty was signed, leaving the two Koreas technically still at war.

In addition to the Korean War, South Korea has faced other conflicts and tensions, such as the Korean DMZ Conflict (1966-1969) and the Jeju Uprising (1948). These events have shaped South Korea's history and continue to influence its relationships with neighboring countries.

Despite these conflicts, South Korea has made significant progress in terms of economic development, technological advancements, and cultural influence. The country has become a major player in the global economy, with a strong focus on industries such as electronics, automotive, and entertainment.

South Korea's success can be attributed to various factors, including its emphasis on education, innovation, and a strong work ethic. The country has also invested heavily in infrastructure and technology, which has
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct - completions model):
</summary>
<p>
South Korea has experienced many wars in modern times. For instance, the country was involved in the Korean War from 1950 to 1953, which was a conflict between North and South Korea. This war resulted in millions of casualties and the division of the Korean peninsula into two separate countries.

In addition, South Korea was also involved in the Vietnam War from 1964 to 1973, where it sent over 300,000 troops to support the United States and South Vietnam against North Vietnam and the Viet Cong. This war also resulted in thousands of casualties for South Korea.

In more recent times, South Korea has also faced tensions and conflicts with North Korea, including the Korean DMZ Conflict in 1966, the Korean Axe Murder Incident in 1976, and the Yeonpyeong Island shelling in 2010. These conflicts have resulted in casualties and heightened tensions between the two countries.

South Korea has also been involved in international conflicts, such as the Gulf War in 1991 and the War in Afghanistan from 2001 to 2014, where it sent troops to support the United States and its allies.

Overall, South Korea has a long history of experiencing wars and conflicts, both domestically and internationally. However, the country has also made efforts to promote peace and reconciliation, such as through the Sunshine Policy
</p>
</details>

Author's Note: These responses are very similar.

### Sample 3
Prompt:
```md
Hangul is a writing system made by King Sejong in
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
Hangul is a writing system made by King Sejong in 1443. It is the official writing system of Korea. Hangul is a phonetic alphabet, which means that the letters represent the sounds of the Korean language.

## The Creation of Hangul

King Sejong wanted to create a writing system that was easy to learn and use for the common people. He believed that the existing writing system, which was based on Chinese characters, was too difficult for the average person to learn.

## The Structure of Hangul

Hangul is made up of 14 consonants and 10 vowels. These letters are combined to form syllables, which are the building blocks of the Korean language. The consonants and vowels are arranged in a square-like structure, with each row and column representing a different sound.

## The Importance of Hangul

Hangul is an important part of Korean culture and identity. It is used in all aspects of life, from education to business to entertainment. The Korean government has made efforts to promote the use of Hangul and preserve its cultural significance.
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
Hangul is a writing system made by King Sejong in the 15th century.
<|assistant|>
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct - completions model):
</summary>
<p>
Hangul is a writing system made by King Sejong in 1443. It is the official writing system of Korea and is used to write the Korean language. Hangul is unique in that it is a phonetic alphabet, meaning that each letter represents a specific sound. This makes it relatively easy to learn and use compared to other writing systems.

Hangul is made up of 14 consonants and 10 vowels, which can be combined to form syllables. The letters are arranged in a block-like structure, with each block representing a syllable. The consonants are written in a vertical line on the left side, while the vowels are written in a horizontal line on the top or bottom of the consonant.

One of the main reasons King Sejong created Hangul was to increase literacy among the common people. Before Hangul, the Korean language was written using Chinese characters, which were difficult to learn and only accessible to the elite. Hangul made it easier for people to learn how to read and write, leading to a significant increase in literacy rates in Korea.

Hangul has also played a significant role in preserving the Korean language and culture. During the Japanese occupation of Korea in the early 20th century, the use of Hangul was banned, and the Korean language was suppressed. However, after Korea gained independence,
</p>
</details>

Author's Note: Unlike the other samples, this model's response resembles a general article or gpt-3.5-turbo-instruct's output rather than a chat-style reply, which is what I hoped for.

### Sample 4
Prompt:
```md
TPU(Tensor Processing Unit) was created by
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
TPU(Tensor Processing Unit) was created by Google to accelerate the training of deep neural networks. TPUs are designed to be highly efficient for the matrix operations that are common in deep learning. They are similar to GPUs but are optimized for TensorFlow, Google's machine learning framework.

Here's a simple example of how you can use TPUs in TensorFlow to train a neural network:

First, you need to install the TensorFlow package with TPU support:

```bash
pip install tensorflow-gpu
```

Then, you can set up your TPU environment:

```python
import tensorflow as tf

# Check if TPU is available
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
print('Running on TPU ', tpu.master())

# Initialize the TPU
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

# Create a strategy for TPU distribution
strategy = tf.distribute.experimental.TPUStrategy(
```
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
TPU(Tensor Processing Unit) was created by Google.

> TPU is a Google-designed accelerator that is optimized for
> training and inference of machine learning models.

I'm wondering if there is any open source implementation of TPU?

I've found this one: https://github.com/tensorflow/models/tree/master/official/transformer

But it seems to be a very small part of TPU.

- [Response]: Tensor Processing Units (TPUs) are custom-designed hardware accelerators developed by Google to efficiently run machine learning workloads, particularly those involving neural networks. While TPUs are proprietary and not open-source hardware, there are open-source software implementations that can run on TPUs or similar hardware architectures.

The TensorFlow ecosystem, which is Google's open-source machine learning framework, has made significant strides in providing support for TPUs. TensorFlow provides a TPU-optimized version of its runtime, which can be used to run TensorFlow models on TPUs.

Here are some open-source implementations and resources related to TPU
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct - completions model):
</summary>
<p>
TPU(Tensor Processing Unit) was created by Google for the purpose of accelerating machine learning workloads. It is a specialized chip designed specifically for deep learning tasks, such as training and inference of neural networks.

TPUs are highly efficient and can provide significant speedups compared to traditional CPUs and GPUs. They are optimized for matrix multiplication, which is a key operation in neural network computations. This allows them to process large amounts of data in parallel, making them ideal for training and running complex models.

One of the main advantages of TPUs is their scalability. They can be used individually or in clusters, allowing for even faster processing of large datasets. They also have a high memory bandwidth, which enables them to handle large models and datasets without running into memory constraints.

Another key feature of TPUs is their low power consumption. They are designed to be energy-efficient, making them a more environmentally friendly option compared to traditional hardware.

TPUs are primarily used in Google's cloud computing platform, Google Cloud, but they can also be accessed through Google Colab, a free online platform for running machine learning experiments. This makes TPUs accessible to a wider range of users, including researchers and developers who may not have access to specialized hardware.

Overall, TPUs have revolutionized the field of deep learning by providing a powerful and efficient hardware solution
</p>
</details>

Author's Note: The model provides a simple example of how to use TPUs in TensorFlow, which is a good response. The original model's response is Q&A style.

### Sample 5
Prompt:
```md
Hello!
```

<details>
<summary>
This model (devngho/phi-3-mini-4k-base):
</summary>
<p>
Hello! I'm your friendly AI chatbot, here to help you with your homework. Could you please tell me which subject and topic you need assistance with?

## Conversation
**User**: Hey there! I'm working on a project for my history class, but it's a bit of a mix. We're looking at the impact of technology on society, but it's got to tie in with a historical event. Any ideas?

**Chatbot**: That sounds like an interesting project! How about exploring the impact of the printing press on the Reformation? The printing press was a significant technological advancement that greatly influenced the spread of ideas during the Reformation.

**User**: That's a good start, but I was thinking more along the lines of something more recent, like the internet.

**Chatbot**: Understood. You could examine the role of the internet in the Arab Spring. The widespread use of social media and the internet played a crucial role in organizing protests and disseminating information during that period.

**User**: Yeah, that's more like it. But I'
</p>
</details>
<details>
<summary>
Original model (microsoft/Phi-3-mini-4k-instruct):
</summary>
<p>
Hello! I'd be happy to help you with your SQL queries. Could you please describe your data or provide the table structure you're working with?

<|assistant|>
</p>
</details>
<details>
<summary>
Other model (gpt-3.5-turbo-instruct - completions model):
</summary>
<p>
Hello! I am a 22 year old female who is looking for a room to rent in the city of Toronto. I am a recent university graduate and will be starting a full-time job in the downtown area in September. I am a clean, responsible, and friendly individual who enjoys cooking, reading, and exploring the city. I am looking for a room in a shared house or apartment with other young professionals or students. My budget is around $800-1000 per month. Please contact me if you have a room available. Thank you!
</p>
</details>

Author's Note: The model's response is still instruct-following; it just doesn't use the <|assistant|> token. In contrast, gpt-3.5-turbo-instruct's response reads like a normal introduction.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Since the base model is an instruct-following model, this model may sometimes generate chat-like responses (see the Samples above).

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) (about 150,000 samples)
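
The exact subset and filtering are not published; the following is a hypothetical sketch of pulling a similar slice with the `datasets` library (the `sample-10BT` config and the ordering are assumptions):

```python
# Hypothetical: the card only states "about 150,000 samples", so the
# config choice and any filtering below are illustrative guesses.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)
subset = ds.take(150_000)  # roughly the sample count stated above
```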

### Training Procedure

- batch_size: 64
- lr: 3e-5
- lr_scheduler: cosine
- torch_dtype: bfloat16
- warmup_ratio: 0.2
- optimizer: adamw
- seed: 42
- gradient_accumulation: 1
- gradient_checkpointing: true
- FSDPv2 (FSDP via SPMD); see the sketch below
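
These hyperparameters map roughly onto `transformers`' `TrainingArguments`. The sketch below is an assumption (the training script is not published), using the XLA FSDPv2 flags that `Trainer` exposes for SPMD on TPU; whether batch_size 64 is per-device or global is also an assumption:

```python
# Hedged sketch of the reported settings; not the author's actual script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi-3-mini-4k-base",
    per_device_train_batch_size=64,   # batch_size: 64 (per-device assumed)
    learning_rate=3e-5,               # lr: 3e-5
    lr_scheduler_type="cosine",       # lr_scheduler: cosine
    warmup_ratio=0.2,                 # warmup_ratio: 0.2
    optim="adamw_torch",              # optimizer: adamw
    seed=42,                          # seed: 42
    gradient_accumulation_steps=1,    # gradient_accumulation: 1
    gradient_checkpointing=True,      # gradient_checkpointing: true
    bf16=True,                        # torch_dtype: bfloat16
    fsdp="full_shard",                # FSDPv2 (FSDP via SPMD) on TPU
    fsdp_config={
        "xla": True,
        "xla_fsdp_v2": True,
        "xla_fsdp_grad_ckpt": True,
        "fsdp_transformer_layer_cls_to_wrap": ["Phi3DecoderLayer"],
    },
)
```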

### Compute Infrastructure

Google Cloud TPU

#### Hardware

TPU v4-32; training took ~12 hours.

Research supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/). ⚡

#### Software

`transformers~=4.41.2 torch~=2.3.0 torch_xla[tpu]~=2.3.0`

### Train Results

- train/loss: 2.22385830132309
|