alokabhishek committed
Commit cdf78df
1 Parent(s): 4ea86da
Updated Readme
README.md CHANGED
@@ -26,14 +26,14 @@ This repo contains 8-bit quantized (using bitsandbytes) model of Meta's meta-lla
 - Original model: [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
 
-### About
+### About 8 bit quantization using bitsandbytes
 
-QLoRA: Efficient Finetuning of Quantized LLMs: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
+- QLoRA: Efficient Finetuning of Quantized LLMs: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
 
-Hugging Face Blog post on
+- Hugging Face Blog post on 8-bit quantization using bitsandbytes: [A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes](https://huggingface.co/blog/hf-bitsandbytes-integration)
 
-bitsandbytes github repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)
+- bitsandbytes github repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)
 
 # How to Get Started with the Model
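The hunk leaves the "How to Get Started with the Model" section itself unchanged. For context, a minimal sketch of what the new "About" list describes (loading the chat model in 8-bit through the Transformers bitsandbytes integration linked above) could look like the following; the base model id `meta-llama/Llama-2-7b-chat-hf`, the prompt, and the generation settings are illustrative assumptions, not content taken from this commit.

```python
# Sketch: load Llama-2-7b-chat-hf in 8-bit with bitsandbytes via Transformers.
# Assumes transformers, accelerate, and bitsandbytes are installed and that you
# have access to the gated meta-llama repo (or substitute this repo's id).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative; not taken from the diff

# LLM.int8() 8-bit weight loading, as described in the linked Hugging Face blog post
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place the int8 weights on the available GPU(s)
)

# Llama-2 chat prompt format
prompt = "[INST] Explain 8-bit quantization in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `load_in_8bit=True` the linear-layer weights are stored in int8, roughly halving memory relative to fp16, which is the trade-off the linked blog post walks through.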