
# llama-65b-4bit

This works with my branch of GPTQ-for-LLaMa: https://github.com/catid/GPTQ-for-LLaMa-65B-2GPU

To test it out on two RTX 4090 GPUs and 64 GB of system RAM (a large swap file might also work, but I haven't tested that):

```shell
# Install git and git-lfs
sudo apt install git git-lfs

# Clone the code
git clone https://github.com/catid/GPTQ-for-LLaMa-65B-2GPU
cd GPTQ-for-LLaMa-65B-2GPU

# Clone the model weights
git lfs install
git clone https://huggingface.co/catid/llama-65b-4bit

# Set up a conda environment
conda create -n gptq python=3.10
conda activate gptq

# Install script dependencies
pip install -r requirements.txt

# Work around a protobuf error
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

# Run a test
python llama_inference.py llama-65b-4bit --load llama-65b-4bit/llama65b-4bit-128g.safetensors --groupsize 128 --wbits 4 --text "I woke up with a dent in my forehead.  " --max_length 128 --min_length 32
```
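As a rough sanity check on the two-GPU hardware requirement, the weight footprint can be estimated from the parameter count and group size. The 65B figure comes from the model name and the group size from the flags above; the 4 bytes of per-group scale/zero-point overhead is an assumption about the packed format, not a measured number:

```python
# Back-of-the-envelope estimate of the quantized weight footprint.
params = 65e9               # from the model name (65B); an approximation
bits_per_weight = 4         # --wbits 4
group_size = 128            # --groupsize 128

weight_bytes = params * bits_per_weight / 8       # packed 4-bit weights
# assume each group stores roughly a fp16 scale + fp16 zero-point (~4 bytes)
group_overhead = params / group_size * 4

total_gb = (weight_bytes + group_overhead) / 1e9
print(f"~{total_gb:.1f} GB of quantized weights")
```

At roughly 34–35 GB of weights (before activations and KV cache), the model does not fit on one 24 GB RTX 4090, which is why the repo splits it across two GPUs.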
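For context on the `--wbits 4 --groupsize 128` flags: each group of 128 consecutive weights is stored as 4-bit integers that share one scale and zero-point. The sketch below illustrates plain group-wise round-to-nearest quantization only; the actual GPTQ algorithm additionally chooses the rounding to minimize layer output error:

```python
import numpy as np

def quantize_group(w, bits=4):
    """Quantize one group of weights to unsigned `bits`-bit integers
    with a shared scale and zero-point (min/max affine quantization)."""
    qmax = 2**bits - 1
    zero = w.min()
    scale = (w.max() - zero) / qmax
    q = np.clip(np.round((w - zero) / scale), 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Reconstruct approximate float weights from the packed integers."""
    return q.astype(np.float32) * scale + zero

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)   # one group of 128 weights
q, scale, zero = quantize_group(w)
w_hat = dequantize_group(q, scale, zero)

err = np.abs(w - w_hat).max()
print(f"max reconstruction error {err:.4f} vs quantization step {scale:.4f}")
```

The reconstruction error per weight is bounded by half a quantization step; smaller groups give finer scales (and lower error) at the cost of more per-group overhead, and `128` is the trade-off this checkpoint was quantized with.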

License: BSD-3-Clause
