---
license: other
license_name: gemma
license_link: https://ai.google.dev/gemma/prohibited_use_policy
---
# Gemma-7B-it in 8-bit with bitsandbytes

This repository contains [Gemma-7B-it](https://huggingface.co/google/gemma-7b-it) quantized to 8-bit with bitsandbytes.
The original model card and license for Gemma-7B-it can be found [here](https://huggingface.co/google/gemma-7b-it#gemma-model-card).
This is the instruction-tuned variant of Gemma-7B, not the base model.
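
As a rough illustration of what storing weights in 8 bits means, the sketch below rounds a few made-up float values to int8 codes with a single absmax scale and then restores them. It is a simplified, assumed example in plain Python, not the bitsandbytes implementation, which among other things scales per vector and handles outlier features separately. The function names and sample values are hypothetical.

```python
def quantize_absmax_int8(values):
    """Map floats to integer codes in [-127, 127] using one absmax scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127 if absmax > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.03, 1.0]  # made-up values, for illustration only
codes, scale = quantize_absmax_int8(weights)
restored = dequantize(codes, scale)

print(codes)     # integer codes that fit in one byte each
print(restored)  # close to the original weights, up to rounding error
```

Each restored value differs from the original by at most half of `scale`, which is the rounding error this kind of scheme trades for the smaller memory footprint.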

## Usage

Please see the original Gemma-7B-it [model card](https://huggingface.co/google/gemma-7b-it#usage-and-limitations) for intended uses and limitations.

You can use this model as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 8-bit checkpoint (requires the bitsandbytes and accelerate packages).
model = AutoModelForCausalLM.from_pretrained("merve/gemma-7b-it-8bit")
# The tokenizer comes from the original Gemma-7B-it repository.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

chat = [
    {"role": "user", "content": "Write a hello world program"},
]
# Render the conversation with Gemma's chat template and open the model's turn.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# The template already prepends <bos>, so do not add special tokens again.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
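
To make the `apply_chat_template` step in the example above less opaque, here is a minimal sketch of the turn-based prompt layout that Gemma's instruction-tuned models expect. `build_gemma_prompt` is a hypothetical helper written for illustration. It approximates the template shipped with the tokenizer, which remains the authoritative source and should be used for real inference.

```python
def build_gemma_prompt(chat):
    """Format a list of {"role", "content"} dicts in Gemma's turn layout."""
    parts = ["<bos>"]
    for message in chat:
        # Gemma names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    # Open a model turn so that generation continues as the assistant.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = build_gemma_prompt(chat)
print(prompt)
```

Printing the string this way is mainly useful for debugging, for example to check where a turn starts and ends before tokenizing.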