pszemraj committed
Commit 9fec9a3 · Parent: 8da60ba

Update README.md

Files changed (1): README.md (+47 -2)

> persimmon-8b went to the vocab lipo clinic

A slimmed-down version of [persimmon-8b-base](https://huggingface.co/adept/persimmon-8b-base) that removes the ~70,000 unused entries in the model vocabulary and tokenizer (see the safetensors layer overview). It should be _slightly_ faster.

Credit: [fine-tune-fuyu](https://github.com/phillip-kravtsov/fine-tune-fuyu) (`scripts/surgery.py` was adapted for persimmon)
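
For intuition, the surgery boils down to dropping the unused rows from the embedding matrix (and the LM head with it) and slimming the tokenizer to match. A hedged sketch of the idea, assuming the unused entries sit at the tail of the vocab and using a hypothetical `used_vocab_size` (the real count and logic live in `surgery.py`):

```python
# minimal sketch of the vocab-slimming idea -- NOT the actual surgery.py logic
from transformers import AutoModelForCausalLM

used_vocab_size = 71_000  # hypothetical placeholder; surgery.py derives the real count

model = AutoModelForCausalLM.from_pretrained("adept/persimmon-8b-base")
print(model.get_input_embeddings().weight.shape)  # (full padded vocab, hidden_size)

# truncate the input embeddings (and the tied LM head) to the kept entries
model.resize_token_embeddings(used_vocab_size)

model.save_pretrained("perSLIMmon-8b-base")
```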

## inference

install the required packages:

```sh
pip install -U transformers accelerate bitsandbytes sentencepiece
```

load in 4-bit & run inference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/perSLIMmon-8b-base")
model = AutoModelForCausalLM.from_pretrained(
    "pszemraj/perSLIMmon-8b-base",
    load_in_4bit=True,  # GPU required
    torch_dtype="auto",
    device_map="auto",
)

inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(
    model.device
)
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    epsilon_cutoff=1e-5,
    repetition_penalty=1.05,
    renormalize_logits=True,
    do_sample=True,
)  # adapt inference params as needed

print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

inference is decently fast on a Colab T4:

```
CPU times: user 6.01 s, sys: 138 ms, total: 6.15 s
Wall time: 6.23 s
```
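
(the timing block has the shape of IPython's `%%time` cell magic; assuming a Colab/Jupyter runtime, a hypothetical cell to reproduce the measurement:)

```python
%%time
# cell magic -- must be the first line of its own notebook cell
tokens = model.generate(**inputs, max_new_tokens=64, do_sample=True)
```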