XavierSpycy committed on
Commit
c7a4331
1 Parent(s): 30ac85c
Files changed (1)
  1. README.md +257 -257
README.md CHANGED
@@ -1,258 +1,258 @@
1
- ---
2
- license: apache-2.0
3
- language:
4
- - en
5
- - zh
6
- base_model: meta-llama/Meta-Llama-3-8B-Instruct
7
- tags:
8
- - text-generation
9
- - transformers
10
- - lora
11
- - llama.cpp
12
- - autoawq
13
- - auto-gptq
14
- datasets:
15
- - llamafactory/alpaca_zh
16
- - llamafactory/alpaca_gpt4_zh
17
- ---
18
-
19
- # Meta-Llama-3-8B-Instruct-zh-10k: A Llama🦙 which speaks Chinese / 一只说中文的羊驼🦙
20
-
21
- ## Model Details / 模型细节
22
- This model, <u>`Meta-Llama-3-8B-Instruct-zh-10k`</u>, was fine-tuned from [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) because the original model underperforms in Chinese. Using LoRA within [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), it was adapted to better handle Chinese through three epochs on three corpora: `alpaca_zh`, `alpaca_gpt4_zh`, and `oaast_sft_zh`, amounting to approximately 10,000 examples. This is reflected in the `10k` in its name.
23
-
24
- 由于原模型[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)在中文上表现欠佳,于是该模型 <u>`Meta-Llama-3-8B-Instruct-zh-10k`</u> 微调自此。在[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)工具下,利用LoRa 技术,通过`alpaca_zh`、`alpaca_gpt4_zh`和`oaast_sft_zh`三个语料库上、经过三个训练轮次,我们将该模型调整得更好地掌握了中文。三个语料库共计约10,000个样本,这也是其名字中的 `10k` 的由来。
25
-
26
- For efficient inference, the model was converted to the gguf format using [llama.cpp](https://github.com/ggerganov/llama.cpp) and underwent quantization, resulting in a compact model size of about 3.18 GB, suitable for distribution across various devices.
27
-
28
- 为了高效的推理,使用 [llama.cpp](https://github.com/ggerganov/llama.cpp),我们将该模型转化为了gguf格式并量化,从而得到了一个压缩到约 3.18 GB 大小的模型,适合分发在各类设备上。
29
-
30
- ### LoRa Hardware / LoRa 硬件
31
- - RTX 4090D x 1
32
-
33
- > [!NOTE]
34
- > The complete fine-tuning process took approximately 12 hours. / 完整微调过程花费约12小时。
35
-
36
- Additional fine-tuning configurations are available at [Hands-On LoRa](https://github.com/XavierSpycy/hands-on-lora) or [Llama3Ops](https://github.com/XavierSpycy/llama-ops).
37
-
38
- 更多微调配置可以在我的个人仓库 [Hands-On LoRa](https://github.com/XavierSpycy/hands-on-lora) 或 [Llama3Ops](https://github.com/XavierSpycy/llama-ops) 获得。
39
-
40
- ### Other Models / 其他模型
41
- - <u>LLaMA-Factory</u>
42
- - [Meta-Llama-3-8B-Instruct-zh-10k](https://huggingface.co/XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k)
43
-
44
- - <u>AutoAWQ</u>
45
- - [Meta-Llama-3-8B-Instruct-zh-10k-AWQ](https://huggingface.co/XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-AWQ)
46
-
47
- - <u>AutoGPTQ</u>
48
- - [Meta-Llama-3-8B-Instruct-zh-10k-GPTQ](https://huggingface.co/XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-GPTQ)
49
-
50
- ### Model Developer / 模型开发者
51
- - **Pretraining**: Meta
52
- - **Fine-tuning**: [XavierSpycy @ GitHub](https://github.com/XavierSpycy) | [XavierSpycy @ 🤗](https://huggingface.co/XavierSpycy)
53
-
54
- - **预训练**: Meta
55
- - **微调**: [XavierSpycy @ GitHub](https://github.com/XavierSpycy) | [XavierSpycy @ 🤗](https://huggingface.co/XavierSpycy)
56
-
57
-
58
- ### Usage / 用法
59
- This model can be utilized like the original <u>Meta-Llama3</u> but offers enhanced performance in Chinese.
60
-
61
- 我们能够像原版的<u>Meta-Llama3</u>一样使用该模型,而它提供了提升后的中文能力。
62
-
63
- #### 1. How to use in transformers
64
- ```python
65
- # !pip install accelerate
66
-
67
- import torch
68
- from transformers import AutoTokenizer, AutoModelForCausalLM
69
-
70
- model_id = "XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k"
71
-
72
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
73
- tokenizer = AutoTokenizer.from_pretrained(model_id)
74
-
75
- prompt = "你好,你是谁?"
76
-
77
- messages = [
78
- {"role": "system", "content": "你是一个乐于助人的助手。"},
79
- {"role": "user", "content": prompt}]
80
-
81
- input_ids = tokenizer.apply_chat_template(
82
- messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
83
-
84
- terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
85
-
86
- outputs = model.generate(
87
- input_ids,
88
- max_new_tokens=256,
89
- eos_token_id=terminators,
90
- do_sample=True,
91
- temperature=0.6,
92
- top_p=0.9)
93
-
94
- response = outputs[0][input_ids.shape[-1]:]
95
-
96
- print(tokenizer.decode(response, skip_special_tokens=True))
97
- # 我是一个人工智能助手,旨在帮助用户解决问题和完成任务。
98
- # 我是一个虚拟的人工智能助手,能够通过自然语言处理技术理解用户的需求并为用户提供帮助。
99
- ```
100
-
101
- #### 2. How to use in llama.cpp / 如何在llama.cpp中使用
102
-
103
-
104
- ```python
105
- # CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # -DLLAMA_CUDA=on" \
106
- # pip install llama-cpp-python \
107
- # --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
108
-
109
- # Please download the model weights first. / 请先下载模型权重。
110
-
111
- from llama_cpp import Llama
112
-
113
- llm = Llama(
114
- model_path="/mnt/sdrive/jiarui/Meta-Llama-3-8B-Instruct-zh-10k-GGUF/meta-llama-3-8b-instruct-zh-10k.Q8_0.gguf",
115
- n_gpu_layers=-1)
116
-
117
- # Alternatively / 或者
118
- # llm = Llama.from_pretrained(
119
- # repo_id="XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-GGUF",
120
- # filename="*Q8_0.gguf",
121
- # verbose=False
122
- # )
123
-
124
- output = llm(
125
- "Q: 你好,你是谁?A:", # Prompt
126
- max_tokens=256, # Generate up to 256 tokens, set to None to generate up to the end of the context window
127
- stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
128
- echo=True # Echo the prompt back in the output
129
- ) # Generate a completion, can also call create_completion
130
-
131
- print(output['choices'][0]['text'].split("A:")[1].strip())
132
-
133
- # 我是一个人工智能聊天机器人,我的名字叫做“智慧助手”,我由一群程序员设计和开发的。我的主要任务就是通过与您交流来帮助您解决问题,为您提供相关的建议和支持。
134
- ```
135
-
136
- #### 3. How to use with AutoAWQ / 如何与AutoAWQ一起使用
137
- ```python
138
- # !pip install autoawq
139
-
140
- import torch
141
- from transformers import AutoTokenizer, AutoModelForCausalLM
142
-
143
- model_id = "XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-AWQ"
144
-
145
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
146
- tokenizer = AutoTokenizer.from_pretrained(model_id)
147
-
148
- prompt = "你好,你是谁?"
149
-
150
- messages = [
151
- {"role": "system", "content": "你是一个乐于助人的助手。"},
152
- {"role": "user", "content": prompt}]
153
-
154
- input_ids = tokenizer.apply_chat_template(
155
- messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
156
-
157
- terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
158
-
159
- outputs = model.generate(
160
- input_ids,
161
- max_new_tokens=256,
162
- eos_token_id=terminators,
163
- do_sample=True,
164
- temperature=0.6,
165
- top_p=0.9)
166
-
167
- response = outputs[0][input_ids.shape[-1]:]
168
-
169
- print(tokenizer.decode(response, skip_special_tokens=True))
170
- # 你好!我是一个人工智能助手,我的目的是帮助人们解决问题,回答问题,提供信息和建议。
171
- ```
172
-
173
- #### 4. How to use with AutoGPTQ / 如何与AutoGPTQ一起使用
174
- ```python
175
- # !pip install auto-gptq --no-build-isolation
176
-
177
- import torch
178
- from transformers import AutoTokenizer, AutoModelForCausalLM
179
-
180
- model_id = "XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-GPTQ"
181
-
182
- model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
183
- tokenizer = AutoTokenizer.from_pretrained(model_id)
184
-
185
- prompt = "什么是机器学习?"
186
-
187
- messages = [
188
- {"role": "system", "content": "你是一个乐于助人的助手。"},
189
- {"role": "user", "content": prompt}]
190
-
191
- input_ids = tokenizer.apply_chat_template(
192
- messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
193
-
194
- terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
195
-
196
- outputs = model.generate(
197
- input_ids,
198
- max_new_tokens=256,
199
- eos_token_id=terminators,
200
- do_sample=True,
201
- temperature=0.6,
202
- top_p=0.9)
203
-
204
- response = outputs[0][input_ids.shape[-1]:]
205
-
206
- print(tokenizer.decode(response, skip_special_tokens=True))
207
- # 机器学习是人工智能(AI)的一个分支,它允许计算机从数据中学习并改善其性能。它是一种基于算法的方法,用于从数据中识别模式并进行预测。机器学习算法可以从数据中学习,例如文本、图像和音频,并从中获得知识和见解。
208
- ```
209
-
210
- Further details about the deployment are available in the GitHub repository [Llama3Ops: From LoRa to Deployment with Llama3](https://github.com/XavierSpycy/llama-ops).
211
-
212
- 更多关于部署的细节可以在我的个人仓库 [Llama3Ops: From LoRa to Deployment with Llama3](https://github.com/XavierSpycy/llama-ops) 获得。
213
-
214
- ## Ethical Considerations, Safety & Risks / 伦理考量、安全性和风险
215
- Please refer to [Meta Llama 3's Ethical Considerations](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct#ethical-considerations-and-limitations) for more information. Key points include bias monitoring, responsible usage guidelines, and transparency in model limitations.
216
-
217
- 请参考 [Meta Llama 3's Ethical Considerations](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct#ethical-considerations-and-limitations),以获取更多细节。关键点包括偏见监控、负责任的使用指南和模型限制的透明度。
218
-
219
- ## Limitations / 局限性
220
- - The comprehensive abilities of the model have not been fully tested.
221
-
222
- - While it performs smoothly in Chinese conversations, further benchmarks are required to evaluate its full capabilities. The quality and quantity of the Chinese corpora used may also limit model outputs.
223
-
224
- - Based on current observations, it performs adequately in common sense, logic, sentiment analysis, safety, writing, code, and function calls. However, there is room for improvement in role-playing, mathematics, and complex tasks where identical text carries different meanings.
225
-
226
- - Additionally, catastrophic forgetting in the fine-tuned model has not been evaluated.
227
-
228
- - 该模型的全面的能力尚未全部测试。
229
-
230
- - 尽管它在中文对话中表现流畅,但需要更多的测评以评估其完整的能力。中文语料库的质量和数量可能都会对模型输出有所制约。
231
-
232
- - 根据目前的观察,它在常识、逻辑、情绪分析、安全性、写作、代码和函数调用上基本达标,然而,在角色扮演、数学、复杂的同文异义等任务上有待提高。
233
-
234
- - 另外,微调模型中的灾难性遗忘尚未评估。
235
-
236
- ## Acknowledgements / 致谢
237
- We thank Meta for their open-source contributions, which have greatly benefited the developer community, and acknowledge the collaborative efforts of developers in enhancing this community.
238
-
239
- 我们感谢 Meta 的开源贡献,这极大地帮助了开发者社区,同时,也感谢致力于提升社区的开发者们的努力。
240
-
241
- ## References / 参考资料
242
-
243
- ```
244
- @article{llama3modelcard,
245
- title={Llama 3 Model Card},
246
- author={AI@Meta},
247
- year={2024},
248
- url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}}
249
-
250
- @inproceedings{zheng2024llamafactory,
251
- title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
252
- author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
253
- booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
254
- address={Bangkok, Thailand},
255
- publisher={Association for Computational Linguistics},
256
- year={2024},
257
- url={http://arxiv.org/abs/2403.13372}}
258
  ```
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - zh
6
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
7
+ tags:
8
+ - text-generation
9
+ - transformers
10
+ - lora
11
+ - llama.cpp
12
+ - autoawq
13
+ - auto-gptq
14
+ datasets:
15
+ - llamafactory/alpaca_zh
16
+ - llamafactory/alpaca_gpt4_zh
17
+ ---
18
+
19
+ # Meta-Llama-3-8B-Instruct-zh-10k: A Llama🦙 which speaks Chinese / 一只说中文的羊驼🦙
20
+
21
+ ## Model Details / 模型细节
22
+ This model, <u>`Meta-Llama-3-8B-Instruct-zh-10k`</u>, was fine-tuned from [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) because the original model underperforms in Chinese. Using LoRA within [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), it was adapted to better handle Chinese through three epochs on three corpora: `alpaca_zh`, `alpaca_gpt4_zh`, and `oaast_sft_zh`, amounting to approximately 10,000 examples. This is reflected in the `10k` in its name.
23
+
24
+ 由于原模型[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)在中文上表现欠佳,于是该模型 <u>`Meta-Llama-3-8B-Instruct-zh-10k`</u> 微调自此。在[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)工具下,利用LoRa 技术,通过`alpaca_zh`、`alpaca_gpt4_zh`和`oaast_sft_zh`三个语料库上、经过三个训练轮次,我们将该模型调整得更好地掌握了中文。三个语料库共计约10,000个样本,这也是其名字中的 `10k` 的由来。
25
+
26
+ For efficient inference, the model was converted to the gguf format using [llama.cpp](https://github.com/ggerganov/llama.cpp) and underwent quantization, resulting in a compact model size of about 3.18 GB, suitable for distribution across various devices.
27
+
28
+ 为了高效的推理,使用 [llama.cpp](https://github.com/ggerganov/llama.cpp),我们将该模型转化为了gguf格式并量化,从而得到了一个压缩到约 3.18 GB 大小的模型,适合分发在各类设备上。
29
+
30
+ ### LoRa Hardware / LoRa 硬件
31
+ - RTX 4090D x 1
32
+
33
+ > [!NOTE]
34
+ > The complete fine-tuning process took approximately 12 hours. / 完整微调过程花费约12小时。
35
+
36
+ Additional fine-tuning configurations are avaiable at [Hands-On LoRa](https://github.com/XavierSpycy/hands-on-lora) or [Llama3Ops](https://github.com/XavierSpycy/llama-ops).
37
+
38
+ 更多微调配置可以在我的个人仓库 [Hands-On LoRa](https://github.com/XavierSpycy/hands-on-lora) 或 [Llama3Ops](https://github.com/XavierSpycy/llama-ops) 获得。
39
+
40
+ ### Other Models / 其他模型
41
+ - <u>LLaMA-Factory</u>
42
+ - [Meta-Llama-3-8B-Instruct-zh-10k](https://huggingface.co/XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k)
43
+
44
+ - <u>AutoAWQ</u>
45
+ - [Meta-Llama-3-8B-Instruct-zh-10k-AWQ](https://huggingface.co/XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-AWQ)
46
+
47
+ - <u>AutoGPTQ</u>
48
+ - [Meta-Llama-3-8B-Instruct-zh-10k-GPTQ](https://huggingface.co/XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-GPTQ)
49
+
50
+ ### Model Developer / 模型开发者
51
+ - **Pretraining**: Meta
52
+ - **Fine-tuning**: [XavierSpycy @ GitHub](https://github.com/XavierSpycy) | [XavierSpycy @ 🤗](https://huggingface.co/XavierSpycy)
53
+
54
+ - **预训练**: Meta
55
+ - **微调**: [XavierSpycy @ GitHub](https://github.com/XavierSpycy) | [XavierSpycy @ 🤗](https://huggingface.co/XavierSpycy)
56
+
57
+
58
+ ### Usage / 用法
59
+ This model can be utilized like the original <u>Meta-Llama3</u> but offers enhanced performance in Chinese.
60
+
61
+ 我们能够像原版的<u>Meta-Llama3</u>一样使用该模型,而它提供了提升后的中文能力。
62
+
63
+ #### 1. How to use in transformers
64
+ ```python
65
+ # !pip install accelerate
66
+
67
+ import torch
68
+ from transformers import AutoTokenizer, AutoModelForCausalLM
69
+
70
+ model_id = "XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k"
71
+
72
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
73
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
74
+
75
+ prompt = "你好,你是谁?"
76
+
77
+ messages = [
78
+ {"role": "system", "content": "你是一个乐于助人的助手。"},
79
+ {"role": "user", "content": prompt}]
80
+
81
+ input_ids = tokenizer.apply_chat_template(
82
+ messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
83
+
84
+ terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
85
+
86
+ outputs = model.generate(
87
+ input_ids,
88
+ max_new_tokens=256,
89
+ eos_token_id=terminators,
90
+ do_sample=True,
91
+ temperature=0.6,
92
+ top_p=0.9)
93
+
94
+ response = outputs[0][input_ids.shape[-1]:]
95
+
96
+ print(tokenizer.decode(response, skip_special_tokens=True))
97
+ # 我是一个人工智能助手,旨在帮助用户解决问题和完成任务。
98
+ # 我是一个虚拟的人工智能助手,能够通过自然语言处理技术理解用户的需求并为用户提供帮助。
99
+ ```
100
+
101
+ #### 2. How to use in llama.cpp / 如何在llama.cpp中使用
102
+
103
+
104
+ ```python
105
+ # CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # -DLLAMA_CUDA=on" \
106
+ # pip install llama-cpp-python \
107
+ # --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
108
+
109
+ # Please download the model weights first. / 请先下载模型权重。
110
+
111
+ from llama_cpp import Llama
112
+
113
+ llm = Llama(
114
+ model_path="/path/to/your/model/Meta-Llama-3-8B-Instruct-zh-10k-GGUF/meta-llama-3-8b-instruct-zh-10k.Q8_0.gguf",
115
+ n_gpu_layers=-1)
116
+
117
+ # Alternatively / 或者
118
+ # llm = Llama.from_pretrained(
119
+ # repo_id="XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-GGUF",
120
+ # filename="*Q8_0.gguf",
121
+ # verbose=False
122
+ # )
123
+
124
+ output = llm(
125
+ "Q: 你好,你是谁?A:", # Prompt
126
+ max_tokens=256, # Generate up to 256 tokens, set to None to generate up to the end of the context window
127
+ stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
128
+ echo=True # Echo the prompt back in the output
129
+ ) # Generate a completion, can also call create_completion
130
+
131
+ print(output['choices'][0]['text'].split("A:")[1].strip())
132
+
133
+ # 我是一个人工智能聊天机器人,我的名字叫做“智慧助手”,我由一群程序员设计和开发的。我的主要任务就是通过与您交流来帮助您解决问题,为您提供相关的建议和支持。
134
+ ```
135
+
136
+ #### 3. How to use with AutoAWQ / 如何与AutoAWQ一起使用
137
+ ```python
138
+ # !pip install autoawq
139
+
140
+ import torch
141
+ from transformers import AutoTokenizer, AutoModelForCausalLM
142
+
143
+ model_id = "XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-AWQ"
144
+
145
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
146
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
147
+
148
+ prompt = "你好,你是谁?"
149
+
150
+ messages = [
151
+ {"role": "system", "content": "你是一个乐于助人的助手。"},
152
+ {"role": "user", "content": prompt}]
153
+
154
+ input_ids = tokenizer.apply_chat_template(
155
+ messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
156
+
157
+ terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
158
+
159
+ outputs = model.generate(
160
+ input_ids,
161
+ max_new_tokens=256,
162
+ eos_token_id=terminators,
163
+ do_sample=True,
164
+ temperature=0.6,
165
+ top_p=0.9)
166
+
167
+ response = outputs[0][input_ids.shape[-1]:]
168
+
169
+ print(tokenizer.decode(response, skip_special_tokens=True))
170
+ # 你好!我是一个人工智能助手,我的目的是帮助人们解决问题,回答问题,提供信息和建议。
171
+ ```
172
+
173
+ #### 4. How to use with AutoGPTQ / 如何与AutoGPTQ一起使用
174
+ ```python
175
+ # !pip install auto-gptq --no-build-isolation
176
+
177
+ import torch
178
+ from transformers import AutoTokenizer, AutoModelForCausalLM
179
+
180
+ model_id = "XavierSpycy/Meta-Llama-3-8B-Instruct-zh-10k-GPTQ"
181
+
182
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
183
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
184
+
185
+ prompt = "什么是机器学习?"
186
+
187
+ messages = [
188
+ {"role": "system", "content": "你是一个乐于助人的助手。"},
189
+ {"role": "user", "content": prompt}]
190
+
191
+ input_ids = tokenizer.apply_chat_template(
192
+ messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
193
+
194
+ terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
195
+
196
+ outputs = model.generate(
197
+ input_ids,
198
+ max_new_tokens=256,
199
+ eos_token_id=terminators,
200
+ do_sample=True,
201
+ temperature=0.6,
202
+ top_p=0.9)
203
+
204
+ response = outputs[0][input_ids.shape[-1]:]
205
+
206
+ print(tokenizer.decode(response, skip_special_tokens=True))
207
+ # 机器学习是人工智能(AI)的一个分支,它允许计算机从数据中学习并改善其性能。它是一种基于算法的方法,用于从数据中识别模式并进行预测。机器学习算法可以从数据中学习,例如文本、图像和音频,并从中获得知识和见解。
208
+ ```
209
+
210
+ Further details about the deployment are available in the GitHub repository [Llama3Ops: From LoRa to Deployment with Llama3](https://github.com/XavierSpycy/llama-ops).
211
+
212
+ 更多关于部署的细节可以在我的个人仓库 [Llama3Ops: From LoRa to Deployment with Llama3](https://github.com/XavierSpycy/llama-ops) 获得。
213
+
214
+ ## Ethical Considerations, Safety & Risks / 伦理考量、安全性和风险
215
+ Please refer to [Meta Llama 3's Ethical Considerations](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct#ethical-considerations-and-limitations) for more information. Key points include bias monitoring, responsible usage guidelines, and transparency in model limitations.
216
+
217
+ 请参考 [Meta Llama 3's Ethical Considerations](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct#ethical-considerations-and-limitations),以获取更多细节。关键点包括偏见监控、负责任的使用指南和模型限制的透明度。
218
+
219
+ ## Limitations / 局限性
220
+ - The comprehensive abilities of the model have not been fully tested.
221
+
222
+ - While it performs smoothly in Chinese conversations, further benchmarks are required to evaluate its full capabilities. The quality and quantity of the Chinese corpora used may also limit model outputs.
223
+
224
+ - Based on current observations, it performs adequately in common sense, logic, sentiment analysis, safety, writing, code, and function calls. However, there is room for improvement in role-playing, mathematics, and complex tasks where identical text carries different meanings.
225
+
226
+ - Additionally, catastrophic forgetting in the fine-tuned model has not been evaluated.
227
+
228
+ - 该模型的全面的能力尚未全部测试。
229
+
230
+ - 尽管它在中文对话中表现流畅,但需要更多的测评以评估其完整的能力。中文语料库的质量和数量可能都会对模型输出有所制约。
231
+
232
+ - 根据目前的观察,它在常识、逻辑、情绪分析、安全性、写作、代码和函数调用上基本达标,然而,在角色扮演、数学、复杂的同文异义等任务上有待提高。
233
+
234
+ - 另外,微调模型中的灾难性遗忘尚未评估。
235
+
236
+ ## Acknowledgements / 致谢
237
+ We thank Meta for their open-source contributions, which have greatly benefited the developer community, and acknowledge the collaborative efforts of developers in enhancing this community.
238
+
239
+ 我们感谢 Meta 的开源贡献,这极大地帮助了开发者社区,同时,也感谢致力于提升社区的开发者们的努力。
240
+
241
+ ## References / 参考资料
242
+
243
+ ```
244
+ @article{llama3modelcard,
245
+ title={Llama 3 Model Card},
246
+ author={AI@Meta},
247
+ year={2024},
248
+ url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}}
249
+
250
+ @inproceedings{zheng2024llamafactory,
251
+ title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
252
+ author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
253
+ booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
254
+ address={Bangkok, Thailand},
255
+ publisher={Association for Computational Linguistics},
256
+ year={2024},
257
+ url={http://arxiv.org/abs/2403.13372}}
258
  ```
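
Note on the llama.cpp example in this README: it prompts the model with a plain `Q: ... A:` string, while the transformers examples go through `tokenizer.apply_chat_template`. A minimal sketch of building a chat-formatted prompt by hand follows. It assumes the fine-tuned model keeps Meta's published Llama 3 Instruct layout (the `<|start_header_id|>`, `<|end_header_id|>`, and `<|eot_id|>` special tokens); `build_llama3_prompt` and its `add_bos` parameter are hypothetical names introduced here for illustration, not part of any library, and the output should be checked against the tokenizer's own chat template before relying on it.

```python
def build_llama3_prompt(messages, add_bos=True):
    """Render chat messages into the (assumed) Llama 3 Instruct prompt layout.

    `messages` is a list of {"role": ..., "content": ...} dicts, the same
    shape used in the transformers examples of the README.
    """
    parts = ["<|begin_of_text|>"] if add_bos else []
    for message in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Open an assistant turn so the model continues with its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "你是一个乐于助人的助手。"},
    {"role": "user", "content": "你好,你是谁?"},
]

prompt = build_llama3_prompt(messages)
print(prompt)
```

The resulting string could be passed to `llm(...)` in place of the `Q: ... A:` prompt, with `"<|eot_id|>"` in the `stop` list. Some runtimes prepend the beginning-of-text token themselves, in which case `add_bos=False` avoids duplicating it.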