# EleutherAI-gpt-neox-20b-ov-int8
This is the EleutherAI/gpt-neox-20b model converted to the OpenVINO format for accelerated inference. The model weights are compressed to INT8 using NNCF weight compression.
Use optimum-intel to run inference with this model (see the documentation).