This is a GPTQ-quantized version of airo-llongma-2-13B-16k.

To run this model, set compress_pos_emb to 4 so that the correct RoPE scaling is applied. The maximum context length (max_ctx_len) is 16384 tokens.
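As a rough illustration of what compress_pos_emb = 4 does, here is a minimal sketch. It assumes the setting acts as a linear position-interpolation factor (text-generation-webui describes it as the context length divided by the model's original context length). Under that assumption, each token position is divided by 4 before the rotary angles are computed, so 16384 positions fold into Llama 2's native 4096-position range. The head dimension of 128 and the RoPE base of 10000 are assumed Llama 2 13B defaults, and `rope_angles` is a hypothetical helper for illustration, not ExLlama's or any loader's actual code.

```python
# Values from this model card.
COMPRESS_POS_EMB = 4
MAX_CTX_LEN = 16384

# Assumed Llama 2 13B defaults (not stated in this card).
NATIVE_CTX_LEN = 4096
HEAD_DIM = 128
ROPE_BASE = 10000.0


def rope_angles(position: int, head_dim: int = HEAD_DIM, base: float = ROPE_BASE,
                compress: float = COMPRESS_POS_EMB) -> list[float]:
    """Hypothetical helper: rotary angles for one position under linear interpolation.

    The position is divided by the compression factor, then multiplied by the
    standard RoPE inverse frequencies base ** (-2i / head_dim).
    """
    scaled_pos = position / compress
    return [scaled_pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# The largest position index in a full 16k context, after compression,
# stays inside the range the base model was trained on.
max_scaled = (MAX_CTX_LEN - 1) / COMPRESS_POS_EMB
print(f"max scaled position: {max_scaled} (native range: 0..{NATIVE_CTX_LEN - 1})")
```

With a factor of 4, token 8 receives the same rotary angles that token 2 would receive in the unscaled model. This is why the setting must match the factor the model was fine-tuned with: a different value would shift every position away from what the weights expect.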

Branches:
