---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: ExLlamaV2
tags:
- safetensors
- mixtral
base_model: R136a1/BeyondInfinity-4x7B
---
# R136a1/BeyondInfinity-4x7B quantized for 12GB VRAM

BF16 model by R136a1

A good character chat model for 12GB VRAM, for use with exllamav2.

Quants are stored under different branches.
On Windows, with average VRAM usage:

- 3.2bpw fits in 12GB VRAM with 4096 tokens of FP16-precision context, or 8192 tokens with Q4-precision context (I'm not sure if the model supports anything higher)

On Linux, or on Windows when logged out or with all background tasks closed:

- 3.4bpw fits in 12GB with 4096 tokens of FP16-precision context; nothing higher tested
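For tabbyAPI, the context length and cache precision trade-off above corresponds to the model-load settings in its `config.yml`. A hedged sketch (key names are from memory and may differ between tabbyAPI versions, so check the sample config shipped with your install):

```yaml
model:
  model_dir: models
  model_name: BeyondInfinity-4x7B-exl2-12gb  # folder name of the downloaded quant
  max_seq_len: 8192                          # use 4096 if sticking with the FP16 cache
  cache_mode: Q4                             # Q4 cache is what allows 8192 context in 12GB
```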
Places you can use the model:

- tabbyAPI with SillyTavern or any other OpenAI API-compatible interface
- Aphrodite Engine
- ExUI
- oobabooga's Text Generation WebUI
  - When using the downloader, make sure to format the model name like this: Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb:QuantBranch
- KoboldAI
Uses the Alpaca chat format.
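The Alpaca format wraps each turn in `### Instruction:` / `### Response:` markers after a short preamble. A minimal sketch of building a single-turn prompt (the preamble text here is the common Alpaca default, not something this model card specifies; adjust it to your frontend's template settings):

```python
def alpaca_prompt(instruction: str, response: str = "") -> str:
    """Build a single-turn Alpaca-style prompt string.

    Leaving `response` empty produces a prompt ready for the model
    to complete.
    """
    # Common Alpaca preamble; an assumption, not mandated by the model card.
    system = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
    )
    return (
        f"{system}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{response}"
    )

print(alpaca_prompt("Introduce yourself in one sentence."))
```

Frontends like SillyTavern and Text Generation WebUI have this as a built-in instruct template, so you normally just select "Alpaca" rather than building prompts by hand.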
# How to download

## oobabooga's downloader

Use something like oobabooga's download-model.py script to download over HTTP with Python requests.
Install requirements:

```shell
pip install requests tqdm
```

Example for downloading the 3.4bpw quant:

```shell
python download-model.py Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb:3.4bpw
```
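If you'd rather not fetch download-model.py, individual files from a branch can be pulled through Hugging Face's `resolve` URLs with only the standard library. A minimal sketch, assuming the standard `resolve/<branch>/<file>` URL scheme (the file name in the example is illustrative; check the branch's file list first):

```python
import urllib.request
from pathlib import Path


def hf_resolve_url(repo: str, branch: str, filename: str) -> str:
    """URL of a single file in a given branch of a Hugging Face repo."""
    return f"https://huggingface.co/{repo}/resolve/{branch}/{filename}"


def fetch(repo: str, branch: str, filename: str, out_dir: str = ".") -> Path:
    """Stream one file to disk in 1 MiB chunks.

    No resume support; the downloaders described above handle that better.
    """
    dest = Path(out_dir) / filename
    with urllib.request.urlopen(hf_resolve_url(repo, branch, filename)) as r, \
            open(dest, "wb") as f:
        while chunk := r.read(1 << 20):
            f.write(chunk)
    return dest


# Example (not run here):
# fetch("Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb", "3.4bpw", "config.json")
```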
## huggingface-cli

You may also use huggingface-cli. To install it, install the huggingface-hub package:

```shell
pip install huggingface-hub
```

Example for 3.4bpw:

```shell
huggingface-cli download Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb --local-dir BeyondInfinity-4x7B-exl2-3.4bpw --revision 3.4bpw
```
## Git LFS (not recommended)

I recommend the HTTP downloaders above over git: they can resume failed downloads and are much easier to work with.

Make sure you have git and git LFS installed.

Example for downloading 3.4bpw with git:

Make sure LFS file skipping is disabled:
```shell
# windows
set GIT_LFS_SKIP_SMUDGE=0
# linux
export GIT_LFS_SKIP_SMUDGE=0
```
Clone the repo branch:

```shell
git clone https://huggingface.co/Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb -b 3.4bpw
```