mrm8488 committed
Commit: ab9c04b
1 Parent(s): a846ec3

Update README.md

Files changed (1)
  1. README.md +93 -0
README.md CHANGED
@@ -53,6 +53,99 @@ The models are made available under a non-commercial CC BY-NC 4.0 license. More
 
 
 The GALACTICA models are trained on 106 billion tokens of open-access scientific text and data. This includes papers, textbooks, scientific websites, encyclopedias, reference material, knowledge bases, and more. We tokenize different modalities to provide a natural language interface for different tasks. See the README.md for more information, and the paper for full details on the training data.
 
+ ## How to use
+
+ Below are some example scripts showing how to use the model with the `transformers` library:
+
+ ## Using the PyTorch model
+
+ ### Running the model on a CPU
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ from transformers import AutoTokenizer, OPTForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
+ model = OPTForCausalLM.from_pretrained("facebook/galactica-125m")
+
+ # Prompt ending in [START_REF] asks the model to predict a citation
+ input_text = "The Transformer architecture [START_REF]"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
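+
+ As a minimal, illustrative sketch (not an official snippet for this checkpoint), the same generation can also be run through the high-level `transformers` text-generation `pipeline`, which handles tokenization and decoding for you:
+
+ ```python
+ from transformers import pipeline
+
+ # Wrap the checkpoint in a text-generation pipeline (loads model and tokenizer)
+ generator = pipeline("text-generation", model="facebook/galactica-125m")
+
+ # max_new_tokens bounds the length of the generated continuation
+ result = generator("The Transformer architecture [START_REF]", max_new_tokens=20)
+ print(result[0]["generated_text"])
+ ```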
+
+ ### Running the model on a GPU
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate
+ from transformers import AutoTokenizer, OPTForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
+ model = OPTForCausalLM.from_pretrained("facebook/galactica-125m", device_map="auto")
+
+ input_text = "The Transformer architecture [START_REF]"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
+
+ ### Running the model on a GPU using different precisions
+
+ #### FP16
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate
+ import torch
+ from transformers import AutoTokenizer, OPTForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
+ model = OPTForCausalLM.from_pretrained("facebook/galactica-125m", device_map="auto", torch_dtype=torch.float16)
+
+ input_text = "The Transformer architecture [START_REF]"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
+
+ #### INT8
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install bitsandbytes accelerate
+ from transformers import AutoTokenizer, OPTForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
+ model = OPTForCausalLM.from_pretrained("facebook/galactica-125m", device_map="auto", load_in_8bit=True)
+
+ input_text = "The Transformer architecture [START_REF]"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
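+
+ The examples above rely on `generate`'s defaults, which return only a short continuation. As a rough sketch (reusing the FP16 setup from the section above), longer or sampled outputs can be requested through standard `generate` arguments such as `max_new_tokens` and `do_sample`:
+
+ ```python
+ # pip install accelerate
+ import torch
+ from transformers import AutoTokenizer, OPTForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
+ model = OPTForCausalLM.from_pretrained("facebook/galactica-125m", device_map="auto", torch_dtype=torch.float16)
+
+ input_ids = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt").input_ids.to("cuda")
+
+ # Sample up to 100 new tokens instead of using the short greedy default
+ outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.7)
+ print(tokenizer.decode(outputs[0]))
+ ```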
+
 ## Performance and Limitations
 
 The model outperforms several existing language models on a range of knowledge probes, reasoning, and knowledge-intensive scientific tasks. This also extends to general NLP tasks, where GALACTICA outperforms other open source general language models. That being said, we note a number of limitations in this section.