# EleutherAI-gpt-neox-20b-ov-int8
This is the EleutherAI/gpt-neox-20b model converted to the OpenVINO format for accelerated inference. The model weights are compressed to INT8 using NNCF weight compression.
Use optimum-intel to run inference with this model (see the documentation).