ethzanalytics/open_llama_13b-sharded-8bit

pszemraj committed on Jul 2, 2023

Commit 7e7aa2a • 1 Parent(s): 5372efc

Update README.md

Files changed (1)

README.md +23 -1


@@ -11,4 +11,26 @@ inference: False
 # open_llama_13b-sharded-8bit
-This is [open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b) sharded into 2 GB shards, and in 8-bit precision using `bitsandbytes==0.38.0`. Please refer to the original model card for details.

+This is [open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b) sharded into 2 GB shards, and in 8-bit precision using `bitsandbytes==0.38.0`. Please refer to the original model card for details.
+## loading
+```sh
+pip install -U -q sentencepiece transformers accelerate bitsandbytes
+```
+Load the model and tokenizer:
+```python
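+import torch
+from transformers import LlamaTokenizer, LlamaForCausalLM
+# Hub repo id of this checkpoint; `model_name` must be defined before loading
+model_name = "ethzanalytics/open_llama_13b-sharded-8bit"
+tokenizer = LlamaTokenizer.from_pretrained(model_name, use_fast=False)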
+model = LlamaForCausalLM.from_pretrained(
+    model_name,
+    load_in_8bit=True,
+    device_map="auto",
+)
+```
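
The README change above stops once the model and tokenizer are loaded. As a minimal sketch of a possible next step, the helper below runs a single generation call. `generate_text` is a hypothetical name that does not appear in the commit, and the sketch assumes `model` and `tokenizer` are the objects created by the loading snippet.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64):
    """Generate a continuation of `prompt` (hypothetical helper, not part of the README).

    Assumes `model` and `tokenizer` were created as in the loading snippet.
    """
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Move the input ids to the device that holds the model.
    input_ids = inputs["input_ids"].to(model.device)
    # Generate up to `max_new_tokens` tokens after the prompt.
    output_ids = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call, once `model` and `tokenizer` have been loaded:
# print(generate_text(model, tokenizer, "Q: What is the largest animal?\nA:"))
```

The prompt in the commented call is only an illustration. The default of 64 new tokens is an arbitrary choice for this sketch, not a value recommended by the model card.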