bujido committed
Commit 8a9db37
1 Parent(s): 4aad6ad

Upload README.md

Files changed (1):
  1. README.md +19 -0
README.md CHANGED
@@ -1,6 +1,25 @@
 
 
 # Llama3-70B-Chinese-Chat-AWQ-32k
 
+ ## Model Description
+ This repository provides a 4-bit AWQ quantized version of [shenzhi-wang's full-parameter fine-tuned Llama3-70B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat).
+ The original model is based on Llama3-70B and was fine-tuned on Chinese chat tasks to improve its ability to handle Chinese dialogue.
+ Additionally, we include an optional configuration file that extends the context length from the original 8k to 32k, allowing the model to process longer text sequences in scenarios that need richer contextual information.
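A minimal loading sketch, assuming the `transformers` and `autoawq` packages are installed; the repo id below is a placeholder and should be replaced with the actual model path.

```python
# Minimal inference sketch for the 4-bit AWQ checkpoint.
# Assumes transformers + autoawq are installed; the repo id is hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bujido/Llama3-70B-Chinese-Chat-AWQ-32k"  # placeholder: use the actual repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # AWQ kernels run with fp16 activations
    device_map="auto",          # shard the 70B weights across available GPUs
)

messages = [{"role": "user", "content": "请用中文介绍一下你自己。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the checkpoint stores its quantization settings in `config.json`, as AWQ exports typically do, transformers should pick them up without extra load-time arguments.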
+
+ ### Quantization
+ We used 4-bit AWQ quantization to reduce the precision of the model's weights. In preliminary tests the quantized model's performance held up well, and it can run in environments with more limited resources.
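For reference, a 4-bit AWQ export of this kind is typically produced with the AutoAWQ library along the lines of the sketch below; the group size and other settings are assumptions, not necessarily the exact ones used for this checkpoint.

```python
# Sketch of producing a 4-bit AWQ export with AutoAWQ.
# The quantization settings here are assumed defaults, not confirmed for this repo.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "shenzhi-wang/Llama3-70B-Chinese-Chat"
quant_path = "Llama3-70B-Chinese-Chat-AWQ"

model = AutoAWQForCausalLM.from_pretrained(base_model, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# 4-bit weights with group-wise scaling (group size 128 is a common default)
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model.quantize(tokenizer, quant_config=quant_config)  # runs AWQ calibration
model.save_quantized(quant_path)                      # writes quantized weights + config
tokenizer.save_pretrained(quant_path)
```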
+
+ ### Context Extension
+ To support longer contexts, we added a configuration file named "config-32k.json". When you need to process text longer than the original context limit, you can enable the extension simply by swapping in this configuration file.
+ Please note that this is an experimental feature: longer context lengths may affect the model's performance, so test it against your actual usage scenarios.
+ (By default, the original "config.json" from Llama3 is used, which has an 8k context. To enable the 32k context length, replace "config.json" in the model files with "config-32k.json". The effect is not guaranteed; please test it yourself.)
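For example, if the model has been downloaded to a local directory, the swap can be done with a small script; the directory name and backup file name below are just placeholders.

```python
# Enable the 32k context by swapping in config-32k.json (keep a backup of the 8k config).
import shutil
from pathlib import Path

model_dir = Path("Llama3-70B-Chinese-Chat-AWQ-32k")  # local model directory (placeholder)

shutil.copy(model_dir / "config.json", model_dir / "config-8k.json.bak")  # back up the default 8k config
shutil.copy(model_dir / "config-32k.json", model_dir / "config.json")     # activate the 32k config
```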
+
+ ## Original Model Link
+ https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat
+ Thanks to the open-source community for its contributions to the Chinese adaptation of Llama3.
+
+ --------------------------------------
+
 ## Model Description
 This repository provides a 4-bit AWQ quantized version built on [shenzhi-wang's full-parameter fine-tuned Llama3-70B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat).
 The original model is based on the Llama3-70B model and was fine-tuned on Chinese chat tasks to improve its ability to handle Chinese dialogue.