---
license: other
license_name: gemma
license_link: https://ai.google.dev/gemma/prohibited_use_policy
---

# Gemma-7B in 8-bit with bitsandbytes

This repository contains [Gemma-7B-it](https://huggingface.co/google/gemma-7b-it) quantized to 8-bit with bitsandbytes. The original model card and license for Gemma-7B-it can be found [here](https://huggingface.co/google/gemma-7b-it#gemma-model-card). Note that this is the instruction-tuned (`-it`) variant, not the base model.

## Usage

Please see the original Gemma-7B-it [model card](https://huggingface.co/google/gemma-7b-it#usage-and-limitations) for intended uses and limitations. Loading the quantized checkpoint requires the `transformers`, `accelerate`, and `bitsandbytes` packages.

You can use the model as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 8-bit checkpoint; the tokenizer comes from the original model repo.
model = AutoModelForCausalLM.from_pretrained("merve/gemma-7b-it-8bit")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

chat = [
    {"role": "user", "content": "Write a hello world program"},
]

# Render the conversation with Gemma's chat template and open the model's turn.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# The rendered prompt already starts with <bos>, so do not add special tokens again.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```
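
To make the prompt format concrete, here is a minimal pure-Python sketch of what `apply_chat_template` is expected to return for the chat above. `render_gemma_prompt` is a hypothetical helper written for illustration, not part of `transformers`. It approximates the turn markers described in the original Gemma model card and is not a substitute for the tokenizer's own template.

```python
def render_gemma_prompt(chat, add_generation_prompt=True):
    """Hypothetical helper: hand-rolled approximation of Gemma's chat template."""
    parts = ["<bos>"]
    for turn in chat:
        # Gemma labels the reply role "model" rather than "assistant".
        role = "model" if turn["role"] == "assistant" else turn["role"]
        content = turn["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open the model's turn so generation continues from here.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


chat = [{"role": "user", "content": "Write a hello world program"}]
prompt = render_gemma_prompt(chat)
print(repr(prompt))
```

Because the rendered string already begins with `<bos>`, encoding it with special tokens enabled would likely prepend a second one, so it is worth checking the first few token IDs before generating.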