Updates
Added the argument "use_safetensors=False" to the from_quantized() call. This argument defaulted to False in earlier versions of AutoGPTQ. If loading the model directly from Hugging Face fails, try cloning the repository with git clone instead. (Dec 15, 2023)
Description
This repo contains the int4 (GPTQ) quantized model for AceGPT-7B-Chat.
The int4 version shows some performance degradation. For a better user experience, please use the fp16 version. For details, see AceGPT-7B-Chat and AceGPT-13B-Chat.
How to use this GPTQ model from Python code
Install the necessary packages
Requires: Transformers 4.32.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
pip3 install "transformers>=4.32.0" "optimum>=1.12.0" # See requirements.py for verified versions.
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ # Use cu117 if on CUDA 11.7
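After installing, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch using only the Python standard library. The helper names (version_tuple, meets_minimum, check_requirements) are hypothetical, and the comparison is deliberately simple: it looks only at the leading numeric part of each version string and ignores pre-release tags.

```python
import re
from importlib import metadata

# Minimum versions stated above; keys are the PyPI distribution names.
MINIMUM_VERSIONS = {
    "transformers": "4.32.0",
    "optimum": "1.12.0",
    "auto-gptq": "0.4.2",
}


def version_tuple(version):
    """Turn '0.4.2+cu118' or '4.33.0.dev0' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: {installed} ({status})")
```

Run as a script, this prints one line per package. A package reported as missing or too old can be reinstalled with the pip3 commands above.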
You can then launch a simple Gradio web demo with web_quant.py
python web_quant.py --model-name ${model-path}
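Besides the web demo, the quantized model can also be loaded directly from Python. The sketch below is one possible way to do this with AutoGPTQ, not the code used in web_quant.py. It makes several assumptions: model_path is a local clone of this repo (or its Hugging Face repo ID), a CUDA device is available as "cuda:0", and the prompt is passed through as plain text, since the chat template expected by AceGPT is not described here. The function names are hypothetical.

```python
def quantized_load_kwargs(device="cuda:0"):
    """Keyword arguments for from_quantized(); use_safetensors is set explicitly."""
    # use_safetensors=False matches the update note at the top of this card.
    return {"device": device, "use_safetensors": False}


def load_and_generate(model_path, prompt, max_new_tokens=64, device="cuda:0"):
    """Load the GPTQ model from model_path and return the decoded generation."""
    # Imported lazily so this file can be imported without the GPU stack installed.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoGPTQForCausalLM.from_quantized(
        model_path, **quantized_load_kwargs(device)
    )

    # Tokenize the raw prompt (assumption: no special chat template is applied).
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, load_and_generate("/path/to/local/clone", "your prompt") returns the generated text as a string, where the path is a placeholder for wherever the model files live.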
You can get more details at https://github.com/FreedomIntelligence/AceGPT/tree/main