merve/gemma-7b-it-8bit

merve HF staff committed on Feb 22

Commit 6a78b45 • 1 Parent(s): edd4be3

Update README.md

Files changed (1)

README.md +30 -0


@@ -3,3 +3,33 @@ license: other
 license_name: gemma
 license_link: https://ai.google.dev/gemma/prohibited_use_policy
 ---

+# Gemma-7B-it in 8-bit with bitsandbytes
+This is the repository for [Gemma-7B-it](https://huggingface.co/google/gemma-7b-it) quantized to 8-bit using bitsandbytes.
+The original model card and license for Gemma-7B-it can be found [here](https://huggingface.co/google/gemma-7b-it#gemma-model-card).
+This is the instruction-tuned variant (`-it`), not the base model.
+## Usage
+Please see the original Gemma-7B-it [model card](https://huggingface.co/google/gemma-7b-it#usage-and-limitations) for intended uses and limitations.
+You can use this model as follows:
+```python
+# Loading the 8-bit checkpoint requires the `bitsandbytes` and `accelerate` packages.
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("merve/gemma-7b-it-8bit")
+# The tokenizer is loaded from the original model repository.
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
+
+chat = [
+    {"role": "user", "content": "Write a hello world program"},
+]
+# Format the conversation with Gemma's chat template and open the model's turn.
+prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+# The templated prompt already starts with <bos>, so don't add special tokens again.
+inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
+outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
+print(tokenizer.decode(outputs[0]))
+```
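
The README snippet builds its prompt with `tokenizer.apply_chat_template`. To show roughly what that string looks like without downloading the tokenizer, here is a minimal pure-Python sketch. It assumes the turn format described in the original Gemma model card (`<start_of_turn>` / `<end_of_turn>` markers, with the assistant role named `model`) and assumes the template emits a leading `<bos>`. The helper `format_gemma_prompt` is hypothetical and is not part of `transformers`; the real template may differ in details.

```python
def format_gemma_prompt(chat, add_generation_prompt=True):
    """Hypothetical helper: approximate Gemma's chat turn format as a plain string."""
    # Assumption: the template starts the prompt with the BOS token.
    parts = ["<bos>"]
    for message in chat:
        # Gemma's turn markers use "model" for the assistant role.
        role = "model" if message["role"] == "assistant" else message["role"]
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


chat = [{"role": "user", "content": "Write a hello world program"}]
prompt = format_gemma_prompt(chat)
print(prompt)
```

Under these assumptions the formatted string already begins with `<bos>`, so tokenizing it with `add_special_tokens=False` avoids a duplicate BOS token.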