
Platypus2-70B-instruct-4bit-gptq

Platypus2-70B-instruct-4bit-gptq is a quantized version of garage-bAInd/Platypus2-70B-instruct produced with GPTQ quantization. The quantized model is only about 35 GB, compared with roughly 127 GB for the original garage-bAInd/Platypus2-70B-instruct, and can run on a single A6000 GPU.
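
As a rough sanity check on those size figures, 4-bit weights take about a quarter of the space of fp16 weights (a back-of-envelope approximation that ignores embeddings, group metadata, and other overhead):

# Approximate weight storage for a 70B-parameter model
params = 70e9
fp16_size_gb = params * 2 / 1e9    # ~140 GB at 16 bits per weight
int4_size_gb = params * 0.5 / 1e9  # ~35 GB at 4 bits per weight
print(fp16_size_gb, int4_size_gb)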

Model Details

  • Quantized by: Mohamad Alhajar
  • Model type: quantized version of Platypus2-70B-instruct using 4-bit GPTQ quantization
  • Language(s): English

Prompt Template

### Instruction:

<prompt> (without the <>)

### Response:
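
For example, the template can be filled in Python like this (a minimal sketch; the full example further below uses the same pattern):

question = "Who was the first person to walk on the moon?"
prompt = f"### Instruction:\n\n{question}\n\n### Response:"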

Training Dataset

Platypus2-70B-instruct-4bit-gptq was quantized with GPTQ using the Alpaca dataset yahma/alpaca-cleaned as calibration data.

Training Procedure

garage-bAInd/Platypus2-70B-instruct was quantized using GPTQ on two L40 48 GB GPUs.
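
A minimal sketch of how such a GPTQ quantization run can be performed with auto_gptq is shown below. The exact script, calibration sample count, and group size used for this checkpoint are not documented here, so treat those specific values as assumptions:

from datasets import load_dataset
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "garage-bAInd/Platypus2-70B-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# 4-bit GPTQ settings; group_size=128 is a common default, assumed here
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# A small calibration set drawn from yahma/alpaca-cleaned (sample count assumed)
calib = load_dataset("yahma/alpaca-cleaned", split="train[:128]")
examples = [tokenizer(row["instruction"] + "\n" + row["output"], return_tensors="pt") for row in calib]
examples = [{"input_ids": e["input_ids"], "attention_mask": e["attention_mask"]} for e in examples]

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("Platypus2-70B-instruct-4bit-gptq", use_safetensors=True)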

How to Get Started with the Model

First, install auto_gptq:

pip install auto_gptq

Then use the code sample below to interact with the model.

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM
 
model_id = "malhajar/Platypus2-70B-instruct-4bit-gptq"
model = AutoGPTQForCausalLM.from_quantized(model_id,
        inject_fused_attention=False,
        use_safetensors=True,
        trust_remote_code=False,
        use_triton=False,
        quantize_config=None)

tokenizer = AutoTokenizer.from_pretrained(model_id)

question: "Who was the first person to walk on the moon?"
# For generating a response
prompt = '''
### Instruction:
{question} 

### Response:'''
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids)
response = tokenizer.decode(output[0])

print(response)
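
Alternatively, recent transformers releases can load GPTQ checkpoints directly when optimum and auto-gptq are installed (the exact version requirements are an assumption; check your environment):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "malhajar/Platypus2-70B-instruct-4bit-gptq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" spreads the quantized weights across available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")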

Citations

@misc{platypus2023,
    title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs},
    author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz},
    year={2023},
    eprint={2308.07317},
    archivePrefix={arXiv}
}
@misc{touvron2023llama,
    title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
    author={Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and others},
    year={2023},
    eprint={2307.09288},
    archivePrefix={arXiv}
}
@misc{frantar2023gptq,
      title={GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers}, 
      author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh},
      year={2023},
      eprint={2210.17323},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}