See: https://huggingface.co/01-ai/Yi-34B-200K

Yi-34B-200K quantized to 3.9 bits per weight (bpw), which should allow roughly 50K tokens of context on 24GB GPUs. Ask if you need another size.
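As a rough sanity check on that context figure, here is a back-of-the-envelope estimate. The architecture numbers (60 layers, 8 KV heads of dim 128) come from the upstream Yi-34B config; the parameter count and overhead figure are approximations:

```python
# Rough VRAM estimate: 3.9bpw weights plus an 8-bit KV cache on a 24GB card.
# Layer/head values are from the upstream Yi-34B config; overhead is a guess.
params = 34.4e9              # approximate parameter count
bpw = 3.9                    # quantized bits per weight
weights_gb = params * bpw / 8 / 1e9

layers = 60                  # num_hidden_layers
kv_heads = 8                 # num_key_value_heads (GQA)
head_dim = 128               # per-head dimension
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 1  # K and V, 1 byte each at 8-bit

budget_gb = 24 - weights_gb - 1.5    # ~1.5 GB assumed for activations/overhead
max_ctx = budget_gb * 1e9 / kv_bytes_per_token
print(f"weights ~{weights_gb:.1f} GB, cache ~{kv_bytes_per_token/1024:.0f} KB/token, ctx ~{max_ctx/1000:.0f}K tokens")
# -> weights ~16.8 GB, cache ~120 KB/token, ctx ~47K tokens, in line with the ~50K claim
```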

Quantized with an 8K-row calibration dataset drawn from a mix of wikitext, prompt formatting examples, and my own RP stories.
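For reference, the exllamav2 conversion step looks roughly like the sketch below. All paths and the calibration filename are placeholders, and the flag names should be checked against your installed exllamav2 version:

```python
# Approximate reproduction of the quantization step via exllamav2's convert.py.
# Paths and the parquet filename are placeholders, not the actual ones used.
import subprocess

subprocess.run([
    "python", "convert.py",
    "-i", "/models/Yi-34B-200K",            # original FP16 weights (placeholder path)
    "-o", "/tmp/exl2-work",                 # scratch/working directory
    "-cf", "/models/Yi-34B-200K-exl2-3.9",  # compiled output directory (placeholder path)
    "-b", "3.9",                            # target bits per weight
    "-c", "calibration.parquet",            # the 8K-row wikitext/prompt/RP mix (assumed filename)
], check=True)
```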

Use with --trust-remote-code in text-generation-webui. Load with the ExLlamav2_HF loader, enable the 8-bit cache, and disable the fast tokenizer option. The TFS preset seems to work well with Yi.
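If you are scripting against exllamav2 directly instead of going through text-generation-webui, loading with an 8-bit cache looks roughly like this. The model path, context length, and sampler values are placeholders; class names follow the exllamav2 Python API as of late 2023:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Yi-34B-200K-exl2-3.9"  # placeholder path
config.prepare()
config.max_seq_len = 49152            # ~48K context; reduce if you run out of VRAM

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)      # 8-bit KV cache halves cache VRAM
model.load_autosplit(cache)                        # fill the GPU as the cache allows

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8            # illustrative only; tune to taste
print(generator.generate_simple("Once upon a time,", settings, num_tokens=64))
```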

License: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
