Edit model card

量化需要使用A100才能完成实验。

原来的大模型:chenshake/Llama-2-7b-chat-hf

转换过程:quantize_llama-2-7b-chat_with_autogptq

目的用来学习。量化后,模型从13G,变成4g左右。

推理的时候,就不需要A100,使用T4就可以。

推理测试

Downloads last month
1
Safetensors
Model size
1.13B params
Tensor type
I32
·
FP16
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.