lumaticai committed
Commit 637bce2 · verified · 1 Parent(s): 1e7f4fe

Update README.md

Files changed (1)
  1. README.md +116 -115

README.md CHANGED
@@ -52,7 +52,7 @@ We are continuously working on training and developing this model and improve it
  - **Shared by [Optional]:** LumaticAI
  - **Model type:** Language model
  - **Language(s) (NLP):** en, bn
- - **License:** apache-2.0
+ - **License:** mit
  - **Parent Model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0


@@ -82,6 +82,120 @@ We are continuously working on training and developing this model and improve it
  Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.


+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ### Pipeline
+
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from transformers import pipeline
+
+ def formatted_prompt(question) -> str:
+     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
+
+ hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
+ pipe = pipeline(
+     "text-generation",
+     model=hub_model_name,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ from time import perf_counter
+ start_time = perf_counter()
+
+ prompt = formatted_prompt('হ্যালো')
+ sequences = pipe(
+     prompt,
+     do_sample=True,
+     temperature=0.1,
+     top_p=0.9,
+     num_return_sequences=1,
+     eos_token_id=tokenizer.eos_token_id,
+     max_new_tokens=256
+ )
+ for seq in sequences:
+     print(f"Result: {seq['generated_text']}")
+
+ output_time = perf_counter() - start_time
+ print(f"Time taken for inference: {round(output_time, 2)} seconds")
+ ```
+
+ ### Streaming Response (ChatGPT-, Bard-like)
+
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+ def formatted_prompt(question) -> str:
+     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
+
+ hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
+ model = AutoModelForCausalLM.from_pretrained(hub_model_name)
+
+ prompt = formatted_prompt('prompt here')
+ inputs = tokenizer([prompt], return_tensors="pt")
+ streamer = TextStreamer(tokenizer)
+ _ = model.generate(**inputs, eos_token_id=[tokenizer.eos_token_id], streamer=streamer, max_new_tokens=256)
+ ```
+
+ ### Using Generation Config
+
+ ```
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+ from time import perf_counter
+
+ def formatted_prompt(question) -> str:
+     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
+
+ hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
+
+ tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
+ model = AutoModelForCausalLM.from_pretrained(hub_model_name)
+
+ prompt = formatted_prompt('হ্যালো')
+
+ # Check for GPU availability
+ if torch.cuda.is_available():
+     device = "cuda"
+ else:
+     device = "cpu"
+
+ # Move the model and inputs to the GPU (if available)
+ model.to(device)
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+ generation_config = GenerationConfig(
+     penalty_alpha=0.6,
+     do_sample=True,
+     top_k=5,
+     temperature=0.5,
+     repetition_penalty=1.2,
+     max_new_tokens=256,
+     pad_token_id=tokenizer.eos_token_id
+ )
+
+ start_time = perf_counter()
+ outputs = model.generate(**inputs, generation_config=generation_config)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ output_time = perf_counter() - start_time
+ print(f"Time taken for inference: {round(output_time, 2)} seconds")
+ ```
+
+ </details>
+
+
  # Training Details

  ## Training Data
@@ -215,117 +329,4 @@ lumatic-ai

  # Model Card Contact

- email : contact@lumaticai.com
-
- # How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- <details>
- <summary> Click to expand </summary>
-
- ### Pipeline
-
- ```
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from transformers import pipeline
-
- def formatted_prompt(question)-> str:
-     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
-
- hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
-
- tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
- pipe = pipeline(
-     "text-generation",
-     model=hub_model_name,
-     torch_dtype=torch.float16,
-     device_map="auto",
- )
-
- from time import perf_counter
- start_time = perf_counter()
-
- prompt = formatted_prompt('হ্যালো')
- sequences = pipe(
-     prompt,
-     do_sample=True,
-     temperature=0.1,
-     top_p=0.9,
-     num_return_sequences=1,
-     eos_token_id=tokenizer.eos_token_id,
-     max_new_tokens=256
- )
- for seq in sequences:
-     print(f"Result: {seq['generated_text']}")
-
- output_time = perf_counter() - start_time
- print(f"Time taken for inference: {round(output_time,2)} seconds")
- ```
-
- ### Streaming Response (ChatGPT, Bard like)
-
- ```
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
-
- def formatted_prompt(question)-> str:
-     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
-
- hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
-
- tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
- model = AutoModelForCausalLM.from_pretrained(hub_model_name)
-
- prompt = formatted_prompt('prompt here')
- inputs = tokenizer([prompt], return_tensors="pt")
- streamer = TextStreamer(tokenizer)
- _ = model.generate(**inputs, eos_token_id=[tokenizer.eos_token_id],streamer=streamer, max_new_tokens=256)
- ```
-
- ### Using Generation Config
-
- ```
- import torch
- from transformers import GenerationConfig
- from time import perf_counter
-
- def formatted_prompt(question)-> str:
-     return f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
-
- hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"
-
- tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
- model = AutoModelForCausalLM.from_pretrained(hub_model_name)
-
- prompt = formatted_prompt('হ্যালো')
-
- # Check for GPU availability
- if torch.cuda.is_available():
-     device = "cuda"
- else:
-     device = "cpu"
-
- # Move model and inputs to the GPU (if available)
- model.to(device)
- inputs = tokenizer(prompt, return_tensors="pt").to(device)
-
- generation_config = GenerationConfig(
-     penalty_alpha=0.6,
-     do_sample=True,
-     top_k=5,
-     temperature=0.5,
-     repetition_penalty=1.2,
-     max_new_tokens=256,
-     pad_token_id=tokenizer.eos_token_id
- )
-
- start_time = perf_counter()
- outputs = model.generate(**inputs, generation_config=generation_config)
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- output_time = perf_counter() - start_time
- print(f"Time taken for inference: {round(output_time, 2)} seconds")
- ```
-
- </details>
+ email : contact@lumaticai.com
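The quick-start snippets above hard-code the ChatML-style markers inside `formatted_prompt`. As a supplementary sketch (not taken from the model card), the same prompt can usually be built with `tokenizer.apply_chat_template`, assuming the BongLlama tokenizer ships a chat template inherited from TinyLlama-1.1B-Chat-v1.0; if it does not, keep using the helper above. The sampling settings here are illustrative, not values recommended by the card, and `apply_chat_template` requires a reasonably recent transformers release.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
model = AutoModelForCausalLM.from_pretrained(
    hub_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Assumption: the tokenizer defines a chat template (inherited from
# TinyLlama-1.1B-Chat-v1.0). If tokenizer.chat_template is None, fall back
# to the formatted_prompt() helper shown in the card.
messages = [{"role": "user", "content": "হ্যালো"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.5)

# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```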
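The "Streaming Response" example prints tokens to stdout through `TextStreamer`. When the chunks need to be consumed programmatically (for example in a web UI), `transformers.TextIteratorStreamer` yields text from a background generation thread instead. A minimal sketch under the same assumptions as above (model name and ChatML-style prompt copied from the card; the threading details are illustrative):

```
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

hub_model_name = "lumatic-ai/BongLlama-1.1B-Chat-alpha-v0"

tokenizer = AutoTokenizer.from_pretrained(hub_model_name)
model = AutoModelForCausalLM.from_pretrained(hub_model_name)

question = "হ্যালো"
prompt = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant:"
inputs = tokenizer([prompt], return_tensors="pt")

# skip_prompt=True so only newly generated text is yielded to the loop below
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a worker thread and consume chunks here
thread = Thread(target=model.generate, kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256))
thread.start()
for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```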