juewang committed
Commit 80d8529
1 Parent(s): 2720a05

Update README.md

Files changed (1):
  1. README.md +52 -6
README.md CHANGED
@@ -18,7 +18,10 @@ Together partnered with LAION and Ontocord.ai, who both helped curate the dataset
 You can read more about this process and the availability of this dataset in LAION’s blog post [here](https://laion.ai/blog/oig-dataset/).
 
 In addition to the aforementioned fine-tuning, Pythia-Chat-Base-7B-v0.16 has also undergone further fine-tuning via a small amount of feedback data.
-This allows the model to better adapt to human preferences in the conversations.
+This process allows the model to better adapt to human preferences in conversations.
+
+One of the notable features of Pythia-Chat-Base-7B-v0.16 is its ability to **run inference on a 12GB GPU**, thanks to int8 quantization.
+This makes the model not only highly accurate and efficient but also accessible to a wider range of users and hardware configurations.
 
 ## Model Details
 - **Developed by**: Together Computer.
@@ -30,18 +33,61 @@ This allows the model to better adapt to human preferences in the conversations.
 
 # Quick Start
 
+## GPU Inference
+
+This requires a GPU with 16GB memory.
 ```python
-from transformers import pipeline
-pipe = pipeline(model='togethercomputer/Pythia-Chat-Base-7B-v0.16')
-pipe('''<human>: Hello!\n<bot>:''')
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# init
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16", torch_dtype=torch.float16)
+model = model.to('cuda:0')
+
+# infer
+inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
+output_str = tokenizer.decode(outputs[0])
+print(output_str)
 ```
-or
+
+## GPU Inference in Int8
+
+This requires a GPU with 12GB memory.
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# init
 tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
-model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16", device_map="auto", load_in_8bit=True)
+
+# infer
+inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
+output_str = tokenizer.decode(outputs[0])
+print(output_str)
 ```
 
+
+## CPU Inference
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# init
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16", torch_dtype=torch.bfloat16)
+
+# infer
+inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors='pt').to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=0.8)
+output_str = tokenizer.decode(outputs[0])
+print(output_str)
+```
+
+
 ## Strengths of the model
 
 There are several tasks that OpenChatKit excels at out of the box. This includes:
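The Quick Start snippets in this commit send a single-turn prompt and print the raw decoded sequence, which still includes the prompt itself. For multi-turn chat, the `<human>:`/`<bot>:` markers are repeated for each turn, and generation is usually cut at the next `<human>:` marker. Below is a minimal sketch of that pattern; the `chat` helper and the truncation rule are illustrative assumptions, not part of the commit:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Pythia-Chat-Base-7B-v0.16")
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Pythia-Chat-Base-7B-v0.16", torch_dtype=torch.float16
).to("cuda:0")

def chat(history):
    # history: list of (human_text, bot_text) turns; bot_text of the last turn is None.
    # Build the prompt in the <human>/<bot> format used by the examples above.
    prompt = ""
    for human, bot in history:
        prompt += f"<human>: {human}\n<bot>:"
        if bot is not None:
            prompt += f" {bot}\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
    # Decode only the newly generated tokens, then stop at the next turn marker.
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
    return completion.split("<human>:")[0].strip()

print(chat([("Hello!", None)]))
```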
 
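On the memory figures above: 7B parameters at two bytes each in fp16 come to roughly 14GB of weights alone, hence the 16GB requirement, while int8 halves that to roughly 7GB, which fits in 12GB. A further environment note, stated as an assumption about typical `transformers` setups rather than as part of the commit: `load_in_8bit=True` is backed by the `bitsandbytes` package and `device_map="auto"` by `accelerate`, so the int8 path needs both installed (`pip install accelerate bitsandbytes`). A quick sanity check:

```python
# Verify the extra dependencies the 8-bit example relies on:
# bitsandbytes backs load_in_8bit=True, accelerate backs device_map="auto".
import importlib.util

for pkg in ("accelerate", "bitsandbytes"):
    if importlib.util.find_spec(pkg) is None:
        raise ImportError(f"`{pkg}` is required for 8-bit inference; try `pip install {pkg}`")
```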