pszemraj committed
Commit 7e7aa2a
1 Parent(s): 5372efc

Update README.md

Files changed (1)
  1. README.md +23 -1
README.md CHANGED
@@ -11,4 +11,26 @@ inference: False

  # open_llama_13b-sharded-8bit

- This is [open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b) sharded into 2 GB shards, and in 8-bit precision using `bitsandbytes==0.38.0`. Please refer to the original model card for details.
+ This is [open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b) sharded into 2 GB shards and converted to 8-bit precision with `bitsandbytes==0.38.0`. Please refer to the original model card for details.
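+
+ As a rough sketch of how a checkpoint like this can be produced (not necessarily the exact commands used for this repo, and it assumes a `transformers`/`bitsandbytes` combination that supports 8-bit serialization): load the base model in 8-bit and re-save it with a 2 GB maximum shard size.
+
+ ```python
+ from transformers import LlamaTokenizer, LlamaForCausalLM
+
+ base = "openlm-research/open_llama_13b"  # original checkpoint
+
+ model = LlamaForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")
+ tokenizer = LlamaTokenizer.from_pretrained(base, use_fast=False)
+
+ # re-save with 2 GB shards (requires 8-bit serialization support)
+ model.save_pretrained("open_llama_13b-sharded-8bit", max_shard_size="2GB")
+ tokenizer.save_pretrained("open_llama_13b-sharded-8bit")
+ ```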
+
+
+
+ ## loading
+
+ ```sh
+ pip install -U -q sentencepiece transformers accelerate bitsandbytes
+ ```
+
+ Load the model and tokenizer:
+
+ ```python
+ import torch
+ from transformers import LlamaTokenizer, LlamaForCausalLM
+
+ # repo id for this sharded 8-bit checkpoint
+ model_name = "pszemraj/open_llama_13b-sharded-8bit"
+
+ tokenizer = LlamaTokenizer.from_pretrained(model_name, use_fast=False)
+ model = LlamaForCausalLM.from_pretrained(
+     model_name,
+     load_in_8bit=True,
+     device_map="auto",
+ )
+ ```
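+
+ Once loaded, a minimal generation sketch (the prompt and generation settings below are only illustrative):
+
+ ```python
+ prompt = "The capital of France is"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+ with torch.inference_mode():
+     outputs = model.generate(**inputs, max_new_tokens=32)
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```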