
Llama3-70B-Chinese-Chat-AWQ-32k

Model Description

This repository provides a 4-bit AWQ quantized version of the full-parameter fine-tuned Llama3-70B-Chinese-Chat model by shenzhi-wang (https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat). The original model is based on Llama3-70B and has been fine-tuned to improve its handling of Chinese dialogue tasks. In addition, we include an optional configuration file that supports extending the context length from the original 8k to 32k, allowing the model to process longer text sequences in scenarios that require richer contextual information.

Quantization

We use 4-bit AWQ quantization to reduce the precision of the model's weights. In preliminary tests, the model's performance holds up relatively well, and the quantized model can run in environments with limited resources.
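
As a quick check that the quantized weights load correctly, here is a minimal sketch using the Hugging Face transformers library with autoawq installed; the model_id below is an assumption, so point it at this repo's actual id or your local download path.

```python
# Minimal loading sketch (assumptions: repo id / local path, and enough GPU memory for the 4-bit 70B weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Llama3-70B-Chinese-Chat-AWQ-32k"  # assumption: replace with the actual repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 activations alongside the 4-bit AWQ weights
    device_map="auto",           # spread layers across the available GPUs
)

messages = [{"role": "user", "content": "请用中文介绍一下你自己。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```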

Context Extension

To support longer contexts, we include an additional configuration file named "config-32k.json". By default, the original Llama3 "config.json" is used, which gives an 8k context. When you need to process text that exceeds this limit, enable the 32k context by replacing the "config.json" in the model files with "config-32k.json", as shown in the sketch below. Please note that this is an experimental feature: longer context lengths may affect the model's performance, and the effect is uncertain, so we recommend testing against your actual usage scenarios.
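
Here is a minimal sketch of the swap, assuming the model files have been downloaded to a local directory (the path below is a placeholder); after replacing the file, reload the model so the new context settings take effect.

```python
# Back up the default 8k config, then copy "config-32k.json" over "config.json" to enable the 32k context.
import shutil
from pathlib import Path

model_dir = Path("/path/to/Llama3-70B-Chinese-Chat-AWQ-32k")  # assumption: your local download location

shutil.copyfile(model_dir / "config.json", model_dir / "config-8k.json.bak")  # keep the original 8k config
shutil.copyfile(model_dir / "config-32k.json", model_dir / "config.json")     # switch to the 32k config
```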

Original Model Link

https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat

Thanks to the open-source community for its contributions to the Chinese adaptation of Llama3.


Model Description

This repository provides a 4-bit AWQ quantized version built on shenzhi-wang's full-parameter fine-tuned Llama3-70B-Chinese-Chat. The original model is based on Llama3-70B and was fine-tuned on Chinese chat tasks to improve its handling of Chinese dialogue. In addition, we include an optional configuration file that supports extending the context length from the original 8k to 32k, so the model can process longer text sequences in scenarios that require richer contextual information.

Quantization

We use 4-bit AWQ quantization to reduce the precision of the model's weights. In preliminary use, the model's performance holds up fairly well. The quantized model can run in resource-constrained environments.

Context Extension

To support longer contexts, we include a configuration file named config-32k.json. When the text you need to process exceeds the original context limit, you can enable this feature by simply replacing the configuration file. Please note that this is an experimental feature: longer context lengths may affect the model's performance, so we recommend testing against your actual usage scenarios. (By default, the original Llama3 "config.json" is used, which has an 8k context. To enable the 32k context length, replace the "config.json" in the model files with "config-32k.json". The effect is uncertain; please test it yourself.)

Original Model Link

https://huggingface.co/shenzhi-wang/Llama3-70B-Chinese-Chat

Thanks to the open-source community for its contributions to the Chinese adaptation of Llama3.
