EleutherAI-gpt-neox-20b-ov-int8

This is the EleutherAI/gpt-neox-20b model converted to the OpenVINO IR format for accelerated inference. The model weights are compressed to INT8 using weight-only quantization with NNCF.

Use optimum-intel for inference (see the optimum-intel documentation).
