See: https://huggingface.co/01-ai/Yi-34B-200K
Yi-34B-200K quantized to 3.9bpw, which should allow for ~50K context on 24GB GPUs. Ask if you need another size.
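For the curious, the ~50K figure holds up to a back-of-the-envelope estimate. This is a rough sketch, assuming Yi-34B's published config (60 layers, GQA with 8 KV heads of head dim 128, ~34.4B parameters) and ignoring activations and quantization overhead:

```python
# Rough VRAM estimate for ~50K context on a 24GB GPU.
# Assumes Yi-34B's config (60 layers, 8 KV heads, head dim 128)
# and the 8-bit (1 byte per element) KV cache recommended below.

params = 34.4e9
weights_gb = params * 3.9 / 8 / 1e9            # 3.9 bits per weight
kv_per_token = 2 * 60 * 8 * 128 * 1            # K and V, all layers, 1 byte each
kv_gb = kv_per_token * 50_000 / 1e9

print(f"weights        ~ {weights_gb:.1f} GB")  # ~16.8 GB
print(f"50K KV cache   ~ {kv_gb:.1f} GB")       # ~6.1 GB
print(f"total          ~ {weights_gb + kv_gb:.1f} GB")  # ~22.9 GB on a 24GB card
```

The remaining headroom goes to activations and quantization scales, which is why the fit is snug.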
Quantized with 8K rows of calibration data drawn from a mix of wikitext, prompt formatting, and my own RP stories.
Use with --trust-remote-code in text-generation-webui. Load with the ExLlamav2_HF loader, enable the 8-bit cache, and disable the fast_tokenizer
option. The TFS preset seems to work well with Yi.
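Outside the web UI, the quant can also be driven from the exllamav2 Python API directly. A minimal sketch, assuming exllamav2 is installed and the quant is downloaded locally; the model path and sampler values are placeholders, not tuned settings:

```python
# Minimal sketch: loading this exl2 quant with the exllamav2 Python API.
# Assumes `pip install exllamav2` and a local copy of the quantized model.

from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,  # 8-bit cache roughly halves KV-cache VRAM
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Yi-34B-200K-exl2-3.9bpw"  # placeholder path
config.prepare()
config.max_seq_len = 51200  # ~50K context; lower this if you run out of VRAM

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # allocate as layers load
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # example value, not a tuned preset

print(generator.generate_simple("Once upon a time,", settings, 200))
```

Note that the TFS preset comes from text-generation-webui's HF-style sampling path (hence the ExLlamav2_HF loader recommendation above); the native exllamav2 sampler used here exposes its own, separate set of knobs.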
License: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE