R136a1/BeyondInfinity-4x7B quanted for 12GB VRAM

BF16 model by R136a1

Good character chat model for 12GB VRAM, for use with exllamav2.
Quants are under different branches; pick the branch that matches the bpw you want.

On Windows with average VRAM usage from other apps:
3.2bpw fits in 12GB VRAM with 4096 context at FP16 cache precision, and 8192 at Q4 cache precision (I'm not sure if the model supports any higher)

On Linux, or on Windows while logged out or with all background tasks closed:
3.4bpw fits in 12GB with 4096 context at FP16 cache precision; nothing higher tested
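
For reference, here's a minimal Python sketch of loading one of these quants with the exllamav2 library and picking between the FP16 and Q4 cache (the model path and generation settings are placeholders, and the exact API can vary between exllamav2 versions):

# Sketch only: assumes the 3.4bpw branch was downloaded to ./BeyondInfinity-4x7B-exl2-3.4bpw
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "BeyondInfinity-4x7B-exl2-3.4bpw"
config.prepare()
config.max_seq_len = 4096  # with the Q4 cache (and 3.2bpw) 8192 should also fit in 12GB

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # FP16 cache; swap in ExLlamaV2Cache_Q4 for Q4 precision
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# Alpaca-style prompt, see the chat format note below
prompt = "### Instruction:\nWrite a short greeting in character.\n\n### Response:\n"
print(generator.generate_simple(prompt, settings, 200))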

Places you can use the model:

Uses the Alpaca chat format
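
For reference, the Alpaca template generally looks like this (the preamble line is optional and its wording varies between finetunes):

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{your message}

### Response:
{model's reply}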

How to download:

oobabooga's downloader

Use something like oobabooga's download-model.py script (from the text-generation-webui repo) to download over HTTP with Python requests.
Install requirements:

pip install requests tqdm

Example for downloading 3.4bpw:

python download-model.py Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb:3.4bpw

huggingface-cli

You may also use huggingface-cli
To install it, install the huggingface_hub Python package:

pip install huggingface-hub

Example for 3.4bpw:

huggingface-cli download Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb --local-dir BeyondInfinity-4x7B-exl2-3.4bpw --revision 3.4bpw
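
If you'd rather do it from Python than the CLI, a minimal sketch using huggingface_hub's snapshot_download (the local_dir name is just an example):

from huggingface_hub import snapshot_download

# Download the 3.4bpw branch into a local folder
snapshot_download(
    repo_id="Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb",
    revision="3.4bpw",  # branch that holds the 3.4bpw quant
    local_dir="BeyondInfinity-4x7B-exl2-3.4bpw",
)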

Git LFS (not recommended)

I recommend the HTTP downloaders above over git; they can resume failed downloads and are much easier to work with.
Make sure you have git and Git LFS installed.
Example for 3.4bpw download with git:

Make sure LFS file skipping is disabled:

# windows
set GIT_LFS_SKIP_SMUDGE=0
# linux
export GIT_LFS_SKIP_SMUDGE=0

Clone the repo branch:

git clone https://huggingface.co/Anthonyg5005/BeyondInfinity-4x7B-exl2-12gb -b 3.4bpw