量化需要使用A100才能完成实验。

原来的大模型：chenshake/Llama-2-7b-chat-hf

转换过程：quantize_llama-2-7b-chat_with_autogptq

目的用来学习。量化后，模型从13G，变成4g左右。

推理的时候，就不需要A100，使用T4就可以。

推理测试

Downloads last month: 1

Safetensors

Model size

1.13B params

Tensor type

I32

FP16

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.